Sparse group selection and analysis of function-related residue for protein-state recognition

Fangyun Bai, Kin Ming Puk, Jin Liu, Hongyu Zhou, Peng Tao, Wenyong Zhou, Shouyi Wang

Research output: Contribution to journal › Article › peer-review


Machine learning methods have helped to advance a wide range of scientific and technological fields in recent years, including computational chemistry. Because chemical systems can be complex and high-dimensional, feature selection is critical but challenging when developing reliable machine-learning-based prediction models, especially for proteins as bio-macromolecules. In this study, we applied the sparse group lasso (SGL) method as a general feature selection method to develop a classification model for an allosteric protein in different functional states. This results in a much improved model with comparable accuracy (Acc) and only 28 selected features, compared with 289 selected features in a previous study. The Acc reaches 91.50% with 1936 selected features, which is far higher than that of the baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational work on the model protein, and they demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods to complex chemical systems.
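
As a rough illustration of how a sparse group lasso penalty can zero out whole groups of features while also thinning features inside the surviving groups, the sketch below implements the proximal (shrinkage) step for the commonly used SGL penalty, lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2). It is not taken from the article: the function names, the parameters `lam`, `alpha`, and `step`, and the toy grouping are all assumptions made for illustration, and the authors' actual formulation and solver may differ.

```python
import math


def soft_threshold(x, t):
    """Lasso shrinkage: move x toward zero by t, clipping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal step for a sparse group lasso penalty (illustrative sketch).

    beta   : list of coefficients
    groups : list of index lists, one per non-overlapping group
             (hypothetically, features grouped by secondary structure)
    lam    : overall penalty strength (assumed name)
    alpha  : mix between the lasso part (alpha) and the group part (1 - alpha)
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # Within-group sparsity: elementwise soft-thresholding.
        shrunk = [soft_threshold(beta[i], step * lam * alpha) for i in idx]
        # Group-level sparsity: shrink the whole group toward zero,
        # dropping it entirely when its norm falls below the threshold.
        norm = math.sqrt(sum(v * v for v in shrunk))
        thresh = step * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        for i, v in zip(idx, shrunk):
            out[i] = scale * v
    return out


# Toy example: the first group carries signal, the second is mostly noise.
coeffs = sgl_prox([3.0, 4.0, 0.1, -0.1], [[0, 1], [2, 3]], lam=1.0, alpha=0.5)
selected = [i for i, c in enumerate(coeffs) if c != 0]
```

In this toy run the second group is removed as a unit while the first group is kept with shrunken coefficients, which is the behavior that makes group-wise selection easier to interpret than selecting features one at a time.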

Original language: English
Pages (from-to): 1342-1354
Number of pages: 13
Journal: Journal of Computational Chemistry
Issue number: 20
State: Published - 30 Jul 2022


  • classification
  • feature selection
  • function-related residues
  • protein states
  • sparse group lasso

