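Abstract: |
The mitochondrial genomes of crustaceans carry important genetic information about the evolutionary history of species, and making effective use of the gene sequences and gene-order information preserved in these genomes is a key direction in crustacean mitochondrial genome research. To further explore stable and reliable phylogenetic relationships among crustaceans, this paper uses the classification capability of the support vector machine (SVM) to accurately classify and predict gene regions versus intergenic regions, and coding regions versus non-coding regions, in crustacean mitochondrial genomes. To improve the generalization ability of the classifier, cross-validation and the particle swarm optimization (PSO) algorithm were used to select the SVM training parameters. Using MATLAB simulation, the gene and intergenic regions of the mitochondrial genome sequences of 10 crustacean species, and the coding and non-coding regions of the mitochondrial genome sequences of 5 crustacean species, were classified with good accuracy. The simulation results show that the method is feasible and effective and can be applied successfully to the study and analysis of crustacean mitochondrial genome sequences. |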
Keywords: crustacean; mitochondria; gene analysis; support vector machine; cross-validation; particle swarm optimization |
DOI:10.11759/hykx20130731001 |
Classification number: |
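Funding: Open Project of the Jiangsu Marine Resources Development Research Institute (JSIMR09C07); Open Project of the Jiangsu Key Construction Laboratory of Marine Biotechnology (2009HS12)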
|
An SVM-based analysis method for crustacean mitochondrial genes
|
Abstract: |
The crustacean mitochondrial genome contains important genetic information about the course of species evolution, so making effective use of the gene sequences and gene-order information preserved in the genome is a priority direction in crustacean mitochondrial genome research. To further explore stable and reliable phylogenetic relationships among crustaceans, the classification function of the support vector machine (SVM) was used to accurately classify and predict gene regions versus intergenic regions, and coding regions versus non-coding regions, in the crustacean mitochondrial genome. In addition, to improve the generalization ability of the classifier, cross-validation and the particle swarm optimization (PSO) algorithm were used to optimize the SVM training parameters. Through simulation analysis in MATLAB, good classification accuracy was obtained between the gene regions and intergenic regions of 10 crustacean species, and good results were also obtained between the coding and non-coding regions of 5 crustacean species. The simulation results show that the method is feasible and effective and can be applied well to the investigation and analysis of crustacean mitochondrial genomes. |
Key words: crustacean; mitochondria; gene analysis; support vector machine; cross-validation; particle swarm optimization |
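The abstract states that SVM training parameters were chosen with cross-validation and particle swarm optimization but does not give the procedure. As a hedged illustration only, the Python sketch below shows the general idea: PSO searches over (log2 C, log2 gamma) of an RBF-kernel SVM and uses k-fold cross-validation accuracy as the fitness. Everything here is an assumption, not the paper's implementation. A kernelized Pegasos sub-gradient solver (no bias term) stands in for whatever SVM trainer was used in the MATLAB experiments. Toy 2-D Gaussian data replaces features extracted from mitochondrial sequences. All function names (`train_svm`, `cv_accuracy`, `pso`, `make_toy_data`), search bounds, and PSO constants are hypothetical.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel: K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_svm(X, y, C, gamma, epochs=10, seed=0):
    """Kernelized Pegasos: a stochastic sub-gradient solver for the
    hinge-loss SVM without a bias term. Labels must be +1 / -1.
    Returns alpha, the number of margin violations per training sample."""
    rng = random.Random(seed)
    n = len(X)
    lam = 1.0 / (n * C)  # regularization strength derived from C
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for t in range(1, epochs * n + 1):
        i = rng.randrange(n)
        margin = y[i] * sum(alpha[j] * y[j] * K[j][i] for j in range(n)) / (lam * t)
        if margin < 1:  # sample i violates the margin: add it to the kernel expansion
            alpha[i] += 1
    return alpha


def predict(alpha, Xtr, ytr, x, gamma):
    """Sign of sum_j alpha_j y_j K(x_j, x); the positive scale factor is dropped."""
    s = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, ytr, Xtr) if a)
    return 1 if s >= 0 else -1


def cv_accuracy(X, y, C, gamma, k=3):
    """k-fold cross-validation accuracy (fold f holds samples with index % k == f)."""
    correct = 0
    for f in range(k):
        train = [i for i in range(len(X)) if i % k != f]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        alpha = train_svm(Xtr, ytr, C, gamma)
        correct += sum(predict(alpha, Xtr, ytr, X[i], gamma) == y[i]
                       for i in range(len(X)) if i % k == f)
    return correct / len(X)


def pso(fitness, bounds, n_particles=6, iters=8, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Maximize `fitness` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = fitness(pos[i])
            if val > pbest_val[i]:  # update personal best, then global best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def make_toy_data(n_per_class=15, seed=42):
    """Hypothetical stand-in data: two 2-D Gaussian clusters labelled +1 / -1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(1.5, 0.5), rng.gauss(1.5, 0.5)]); y.append(1)
        X.append([rng.gauss(-1.5, 0.5), rng.gauss(-1.5, 0.5)]); y.append(-1)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Search (log2 C, log2 gamma); fitness = 3-fold CV accuracy.
    best, acc = pso(lambda p: cv_accuracy(X, y, 2 ** p[0], 2 ** p[1]),
                    bounds=[(-5, 10), (-8, 3)])
    print("log2 C = %.2f, log2 gamma = %.2f, CV accuracy = %.2f" % (best[0], best[1], acc))
```

Running the script prints the best (log2 C, log2 gamma) pair found and its cross-validation accuracy. Searching on a log2 scale is a common convention for RBF-SVM parameters. In practice a mature solver (for example LIBSVM or MATLAB's `fitcsvm`) would replace the toy Pegasos trainer, while the same "CV accuracy as PSO fitness" wrapper would stay unchanged.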