Application of Classification Prior Feature Selection Algorithm in Screening Metabonomic Data Variables of Hyperlipidemia
  
Keywords: variable screening; unsupervised discriminative projection; classified prior information; non-linear; high-dimensional and small samples; metabonomics
  
Author and Institution
WANG Ya-ni, DU Li-jing, GUO Tuo, XIAO Xue
1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, China
2. School of Pharmacy, Shanghai Jiao Tong University, Shanghai, China
3. Institute of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou, China
Abstract:
Partial least squares discriminant analysis (PLS-DA) is currently a common method for biomarker screening in metabolomics research. However, because it is a typical linear algorithm, it is often not ideal for finding biomarkers in biomedicine, whose research objects are complex and non-linear. This paper therefore proposes a support vector machine approach based on unsupervised discriminative projection feature selection (UDPFS-SVM). The method has two steps. The first step obtains a low-dimensional discriminant projection matrix: UDPFS-SVM introduces category prior information, then adds regularization and constraints such as penalty functions to obtain a discriminant projection matrix, which is subsequently filtered by weights to yield a low-dimensional discriminant projection matrix. The second step builds a support vector machine classification model on this projection matrix to find biomarkers. Notably, UDPFS-SVM can adaptively adjust the low-dimensional sparse projection matrix. It can also perform fuzzy learning and sparse learning together and make reasonable use of the dependency relationships between variables, so it handles non-linear research objects well. In this paper, variables in the metabolomic data of hyperlipidemic rats were screened using UDPFS-SVM and PLS-DA, and the biomarkers obtained were evaluated by variance analysis, ROC curves, and linear discriminant analysis (LDA). Each of the two methods identified eight biomarkers. Variance analysis showed that UDPFS-SVM yielded more significant biomarkers than PLS-DA, and that the significant difference values obtained by UDPFS-SVM were all larger than those obtained by PLS-DA. The ROC curve results showed that the ROC value of UDPFS-SVM was 1.00, which is 0.05 higher than that of PLS-DA. The LDA results showed that the biomarkers obtained by UDPFS-SVM better eliminated intra-group metabolic differences in the hyperlipidemic samples and more clearly differentiated inter-group metabolic differences. In summary, UDPFS-SVM is superior to PLS-DA in the discovery of biomarkers for hyperlipidemia. It is therefore a relatively ideal marker screening method for biomedicine, a complex non-linear research subject, where it improves the accuracy of marker screening. The method offers a new way to discover biomarkers in the era of precision medicine.
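
The weight-filtering step and the ROC evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the UDPFS-SVM algorithm itself: it assumes, as an illustration only, that variables are ranked by the L2 norm of their rows in a projection matrix (the abstract does not state the weighting rule), and it omits the projection learning and the SVM step. The function names, the matrix `W`, and all numbers are made-up toy values, not data from the paper.

```python
import math


def rank_variables_by_row_norm(projection, top_k):
    """Return indices of the top_k variables with the largest row L2 norm.

    projection: list of rows, one row per variable (a hypothetical d x m
    projection matrix). Ranking by row norm is an assumption of this sketch.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in projection]
    order = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return order[:top_k]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy projection matrix: 4 variables projected onto 2 dimensions.
W = [[0.9, 0.1], [0.0, 0.05], [0.3, 0.4], [0.01, 0.0]]
selected = rank_variables_by_row_norm(W, top_k=2)

# Toy intensities of one candidate variable in control (0) and model (1) samples.
intensity = [1.0, 1.2, 0.9, 2.1, 2.4, 1.1]
group = [0, 0, 0, 1, 1, 1]
auc = roc_auc(intensity, group)
```

In a fuller pipeline, the variables in `selected` would feed a classifier such as a support vector machine, and each candidate biomarker could be scored with `roc_auc` in the same way.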