Researchers have gained new insights into the causes and characteristics of diseases by harnessing machine learning to analyse molecular patterns across hundreds of diseases simultaneously.
Demonstrating a new tool now available to researchers worldwide, the team of computer scientists and biologists has already uncovered and experimentally confirmed previously unknown contributions of four genes to a rare form of cancer that primarily affects babies and young children.
The team introduced the system and demonstrated its abilities in a paper published in the journal Cell Systems.
While previous approaches focused on genes associated with specific diseases or types of cancer, the new technique uses machine learning to find unique patterns of gene activity by looking at more than 300 different diseases simultaneously, including cancers, heart disease, metabolic disorders and many others.
In doing so, it reveals distinctions between diseases and tissue types, including fine-tuned differences between related diseases that were not possible to discern with other techniques.
The researchers believe that, with further development, the tool will be useful to clinicians in diagnosing disease, tailoring and tracking the effectiveness of therapies, and finding new treatment approaches.
The system, called Unveiling RNA Sample Annotation for Human Diseases, or URSA(HD), incorporates information about the activity of genes from publicly available records of about 8,000 biopsies taken from healthy and diseased tissues of thousands of patients.
Going forward, researchers may submit new samples to the tool via a web interface and receive an analysis of possible associations with diseases and tissue types.
"The real innovation is comparing all samples to every other sample," said Chandra Theesfeld, one of the lead researchers.
Theesfeld likened the idea to the human ability to recognise nuanced differences between behaviours after having seen a wide variety of examples.
Watching soccer players, for example, might reveal the characteristics of a kicking action, but watching soccer players and ballet dancers at the same time reveals details and context for a similar action with a very different style and purpose.
"Studying them together provides a way to distinguish unique aspects," said Theesfeld. "That viewpoint provides an unbiased way to learn new things about the disease that aren't possible to find with the one-disease-at-a-time approach and potentially identify new targets for therapies or even discover new aspects of the disease that weren't appreciated."
In making its comparisons, the algorithm gives more weight to differences in gene activity that uniquely define the distinct tissues and diseases.
It de-emphasises information about gene activity common to related diseases, much of which already is well studied.
In the soccer-dancing analogy, it's like setting aside the large-scale action of lifting a leg in a kick and finding many details, such as the angle of a foot, that taken together constitute a signature set of characteristics that reliably identifies one action or the other.
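The weighting idea can be illustrated with a toy sketch. This is not the URSA(HD) algorithm, whose details are not given here; it simply scores each gene by how far one disease's average activity sits from the average of the other diseases, so that genes behaving the same everywhere score near zero. The gene names, disease names, and activity values are all invented for illustration.

```python
from statistics import mean


def mean_profile(samples):
    """Average activity per gene across a list of samples."""
    return [mean(values) for values in zip(*samples)]


def signature_weights(profiles, target):
    """Score each gene by the gap between the target disease's mean
    activity and the mean activity across all other diseases.

    Genes with activity shared across diseases get a weight near zero;
    genes that set the target apart get a large weight.
    (Hypothetical scoring rule, chosen only to illustrate the idea.)
    """
    target_profile = mean_profile(profiles[target])
    other_means = [
        mean_profile(samples)
        for name, samples in profiles.items()
        if name != target
    ]
    background = mean_profile(other_means)
    return [abs(t - b) for t, b in zip(target_profile, background)]


# Made-up data: three genes measured in two samples per disease.
genes = ["GENE_A", "GENE_B", "GENE_C"]
profiles = {
    "disease_x": [[5.0, 1.0, 9.0], [5.0, 1.0, 7.0]],
    "disease_y": [[5.0, 1.0, 2.0], [5.0, 3.0, 2.0]],
    "disease_z": [[5.0, 2.0, 2.0], [5.0, 2.0, 2.0]],
}

weights = signature_weights(profiles, "disease_x")
ranked = sorted(zip(genes, weights), key=lambda gw: gw[1], reverse=True)
for gene, weight in ranked:
    print(gene, weight)
```

In this toy data, GENE_A is equally active in every disease and so contributes nothing to the signature, while GENE_C, which is elevated only in disease_x, dominates it.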
"Our method is driven by the disease information in the patient sample, so it's not biased toward the popular disease genes that always get studied," Theesfeld added. "We can track patterns of changes in data without knowing exactly what each change means."
Theesfeld noted that 90 percent of studies of genes look at just 10 percent of human genes.
By contrast, URSA(HD) looks at the entire human genome and creates a genome-wide model, or signature, for each disease.
This approach could be particularly powerful for rare diseases, for which the researchers can now create a model with just a few samples.
In the case of neuroblastoma, a rare cancer that primarily affects babies and young children, the researchers found four genes that particularly contributed to the disease and for which there was no previous information in the scientific literature.
To confirm the findings, Theesfeld performed laboratory tests on human cells, manipulating the activity of the four genes and observing the effects on cancer-related processes in the cells.
Rather than looking at DNA itself, URSA(HD) looks at RNA, which is the product that cells create as they transcribe the information in DNA into working molecules that build and run cells and transmit signals from cell to cell.
In this way, the system looks beyond mutations (scrambling in the genes themselves) and instead focuses on the downstream transcription products, which can be dysregulated in ways that cause problems even if the original gene is normal.
The research is part of longstanding work in Olga Troyanskaya's lab at Princeton to integrate massive collections of dissimilar datasets, extracting the information needed to make precise biological predictions and to direct laboratory experiments that accelerate discovery.
This work brings together computing and biology to develop foundational tools and insights with the potential to have a broad impact on health and humanity.
"Interdisciplinary approaches that merge sophisticated data science with deep knowledge of biology are key to deciphering biomedical puzzles necessary to realise the promise of precision medicine," said Troyanskaya.
Source: Princeton University, School of Engineering and Applied Science