Multi-omics Data Integration Group | Icahn School of Medicine

The Center for Disease Neurogenomics (CDN) is generating an unprecedented amount of data across a broad spectrum of biological layers. The Multi-omics Data Integration Group, under the direction of Jaroslav Bendl, PhD, provides the expertise and methods to integrate disparate sources and types of data to create a comprehensive picture of molecular systems associated with neuropsychiatric and neurodegenerative diseases.

Integrating Multi-Layered Data From Disparate Sources and Methods

Our group is developing computational methods and databases to annotate and interpret human genomics, epigenomics, proteomics, and transcriptomics data to better understand neuropsychiatric and neurodegenerative diseases. Our team has expertise in combining individual omics data, in a sequential or simultaneous manner, to decipher molecular changes associated with disease pathophysiology. Notably, as a member of the AMP-AD and PsychENCODE consortia, our group has led the computational analysis of epigenetic changes in the two largest cohorts of postmortem brains associated with Alzheimer’s disease, bipolar disorder, and schizophrenia. The volume of data we work with is extraordinary; however, access to the Mount Sinai supercomputer, Minerva, allows us to process it with minimal delay.

Ensuring the Reliability of Data

We receive genetics and genomics data generated by a wide variety of molecular assays from both within and outside the Center, including members of other brain research consortia. To evaluate data quality and identify the sources of technical bias across molecular samples, we employ state-of-the-art techniques that can detect data abnormalities and apply various compensatory mechanisms. To increase the credibility of outputs produced by our pipelines, we compare them against similar published data sets and literature in an automated manner. Through the combination of all of these streams of data, we can identify those events that are triggering or contributing to the final phenotype or manifestation of the disease.

Our development of computational pipelines is often synchronized with other consortia members and follows the established standards in the field. This way, we can easily repurpose the existing code solutions for processing similar molecular assays and quickly integrate them into our discovery workflows.

Through our multi-pronged approach, we strive to make sure our results are reproducible and reliable. With the trustworthy integration of data from multiple sources, our goal is to capture a comprehensive picture of the molecular processes happening within diseased and control human brain cells.

Making Data Accessible Through Visualization Portals and Databases

In the spirit of collaboration with the broader scientific community, we create visualization portals and databases that provide an interactive view of multi-omics data sets. We generate data for different brain regions, cell types, and disease conditions. Our visualization portals offer multiple ways to graphically overlay various data sets. For instance, they enable a quick view of which genes are expressed in specific cell types, and a comparison between disease cases and healthy controls. By making our data more accessible to the community researching neuropsychiatric and neurodegenerative diseases, we hope to accelerate understanding of the mechanisms of disease and, eventually, avenues of treatment.