Mount Sinai Center for Bioinformatics


At the Mount Sinai Center for Bioinformatics, we develop algorithms, pipelines, web-based software systems, and databases that enable experimental biologists to better analyze their data by unravelling the regulatory networks within mammalian cells. We use a variety of mathematical and computational methods, such as machine learning and dimensionality reduction, to organize data for further discovery and for making predictions.

We have received two U54 center grants from the National Institutes of Health that fund our research:

Our scientific goals for this project are to develop ways to:

  • Evaluate the quality of relevant computational and experimental methods by benchmarking
  • Develop methods to correct for inherent biases within different types of omics experiments
  • Determine how to integrate data from various datasets collected by different laboratories to increase knowledge extraction and address concerns regarding reproducibility
  • Understand how different layers of human cellular regulatory networks correlate and interact

Our technical goals for this grant are to:

  • Develop new standards for data annotation for large-scale experiments
  • Collect and prepare the largest possible set of annotated molecular cellular signatures for data reuse
  • Generate methods to connect cellular and organismal phenotypes with molecular cellular signatures
  • Create web-based data visualization methods to interact with large genomics and proteomics datasets
  • Produce educational materials and events for training the next generation of data scientists in molecular genomic and proteomic biomedicine

Learn more about the grant

The goal of this project is to collect, process, and serve attributes about druggable targets from four protein families: protein kinases, G-protein-coupled receptors, nuclear receptors, and ion channels. We focus on genes and proteins that have not been studied extensively but have the potential to become useful drug targets.

We also collect, process, and serve data for all other mammalian genes and proteins; drugs, small molecules, and other perturbagens; phenotypes, diseases, and side effects; and clinical genomics datasets from cohorts of patients. These processed data enable us to identify links between and across these various sources of information, and to encourage new discoveries. We develop and apply clustering and classification algorithms as well as workflow analyses to help predict the applicability of targeting these under-studied proteins for translational applications in personalized medicine.

Learn more about the grant

Visit the Harmonizome, a web-based system that serves all the data processed for this project