Functional Knowledge Transfer for High-accuracy Prediction of Under-studied Biological Processes
Permanent link
https://hdl.handle.net/10037/6037
Date
2013
Type
Journal article
Peer reviewed
Author
Park, Christopher Y.; Wong, Aaron K.; Greene, Casey S.; Rowland, Jessica; Guan, Yuanfang; Bongo, Lars Ailo; Burdine, Rebecca D.; Troyanskaya, Olga
Abstract
A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics
techniques (e.g. machine learning) that predict protein function have been developed to address this question. These
methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional
genes participating in processes that are not already well studied. Many of these processes are well studied in some
organism, but not necessarily in an investigator’s organism of interest. Sequence-based search methods (e.g. BLAST) have
been used to transfer such annotation information between organisms. We demonstrate that functional genomics can
complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method
transfers annotations only when functionally appropriate as determined by genomic data and can be used with any
prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to
enable accurate function prediction. We show that diverse state-of-the-art machine learning algorithms leveraging functional
knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for
processes with little experimental knowledge in an organism. We also show that our method compares favorably to
annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes for
11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to
processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform in vivo
experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward
cell migration during heart development. FKT is immediately applicable to many bioinformatics techniques and will help
biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of
study.
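
The abstract describes the approach only at a high level: annotations are transferred between organisms when genomic data indicate it is functionally appropriate, and the transferred knowledge is then combined with organism-specific data by any prediction algorithm. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the similarity dictionary, and the threshold value are all hypothetical, and the way the paper actually decides whether a transfer is functionally appropriate is not given in this record.

```python
def transfer_annotations(source_annotations, gene_pairs,
                         functional_similarity, threshold=0.5):
    """Return target-organism genes that inherit a process annotation.

    source_annotations: set of source-organism genes annotated to the process.
    gene_pairs: iterable of (source_gene, target_gene) candidate pairs,
        for example pairs related by sequence similarity (assumption).
    functional_similarity: dict mapping (source_gene, target_gene) to a
        score in [0, 1] derived from genomic data (hypothetical input).
    threshold: arbitrary illustrative cutoff, not a value from the paper.
    """
    transferred = set()
    for source_gene, target_gene in gene_pairs:
        if source_gene not in source_annotations:
            continue  # nothing to transfer from an unannotated gene
        score = functional_similarity.get((source_gene, target_gene), 0.0)
        if score >= threshold:
            transferred.add(target_gene)
    return transferred


def build_training_labels(target_genes, known_positives, transferred):
    """Merge organism-specific and transferred annotations into 0/1 labels."""
    positives = set(known_positives) | set(transferred)
    return {gene: 1 if gene in positives else 0 for gene in target_genes}


# Toy usage with placeholder gene identifiers (not real data).
pairs = [("src_a", "tgt_a"), ("src_b", "tgt_b"), ("src_c", "tgt_c")]
similarity = {("src_a", "tgt_a"): 0.9, ("src_b", "tgt_b"): 0.2}
transferred = transfer_annotations({"src_a", "src_b"}, pairs, similarity)
labels = build_training_labels(
    ["tgt_a", "tgt_b", "tgt_c", "tgt_d"], {"tgt_d"}, transferred
)
```

The resulting label dictionary could serve as the training set for whichever classifier is used, such as the SVM mentioned in the abstract, alongside features from the target organism's own high-throughput data.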
Publisher
Public Library of Science (PLoS)
Citation
PLoS Computational Biology (2013), vol. 9(3): e1002957.