TY - JOUR
T1 - Exhaustive prediction of disease susceptibility to coding base changes in the human genome
AU - Kulkarni, Vinayak
AU - Errami, Mounir
AU - Barber, Robert
AU - Garner, Harold R.
N1 - Funding Information:
We would like to thank Dr. John W. Fondon III, Dr. Guanghua Xiao, Dr. Cristi Galindo, M. Mark Burkart and Dr. Wayne Fisher for discussion about this manuscript, and Linda Gunn for administrative assistance. This work was supported by the P.O'B. Montgomery Distinguished Chair, the Hudson Foundation and the National Institutes of Health/National Cancer Institute SPORE grant (50CA70907).
PY - 2008/8/12
Y1 - 2008/8/12
AB - Background: Single Nucleotide Polymorphisms (SNPs) are the most abundant form of genomic variation and can cause phenotypic differences between individuals, including diseases. Bases are subject to various levels of selection pressure, reflected in their inter-species conservation. Results: We propose a method that is not dependent on transcription information to score each coding base in the human genome, reflecting the disease probability associated with its mutation. Twelve factors likely to be associated with disease alleles were chosen as the input for a support vector machine prediction algorithm. The analysis yielded 83% sensitivity and 84% specificity in segregating disease-like alleles, as found in the Human Gene Mutation Database, from non-disease-like alleles, as found in the Database of Single Nucleotide Polymorphisms. This algorithm was subsequently applied to each base within all known human genes, exhaustively confirming that interspecies conservation is the strongest factor for disease association. For each gene, the length-normalized average disease potential score was calculated. Out of the 30 genes with the highest scores, 21 are directly associated with a disease. In contrast, out of the 30 genes with the lowest scores, only one is associated with a disease as found in published literature. The results strongly suggest that the highest-scoring genes are enriched for those that might contribute to disease, if mutated. Conclusion: This method provides valuable information to researchers to identify sensitive positions in genes that have a high disease probability, enabling them to optimize experimental designs and interpret data emerging from genetic and epidemiological studies.
UR - http://www.scopus.com/inward/record.url?scp=49649089565&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-9-S9-S3
DO - 10.1186/1471-2105-9-S9-S3
M3 - Article
C2 - 18793467
AN - SCOPUS:49649089565
SN - 1471-2105
VL - 9
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 9
M1 - S3
ER -