Mount Sinai Center for Bioinformatics


We collaborate with researchers both within the Icahn School of Medicine at Mount Sinai and elsewhere by analyzing their data with tools and pipelines that we have developed. In particular, the Center addresses the strong need for analysis, visualization, and mining of data from omics studies such as transcriptomics, epigenomics, proteomics, and metabolomics for drug discovery.

Software Tools

We have developed several powerful and popular web-based software tools that can be used to discover new knowledge from data, and predict small molecules as novel leads, for a variety of projects involving different data types.

Gene-List Enrichment Analysis

Enrichr is an integrative web-based and mobile gene-list enrichment analysis tool. It includes more than 128 gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library Data-Driven Documents (D3). Enrichr is open source and freely available online. Users can easily embed it into any tool that performs gene-list analysis.
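As a rough illustration of what gene-list enrichment analysis computes, the sketch below ranks terms from a toy gene-set library by a right-tailed hypergeometric p-value (equivalent to a one-sided Fisher exact test). This is a generic textbook formulation under stated assumptions, not Enrichr's own code or its alternative ranking approach; the function names, the toy library, and the `background_size` parameter are hypothetical.

```python
from math import comb


def enrichment_p_value(gene_list, gene_set, background_size):
    """Return (overlap, p) for a right-tailed hypergeometric test.

    Assumes every gene in gene_list and gene_set belongs to a background
    of background_size genes (an assumption of this sketch).
    """
    query = set(gene_list)
    term_genes = set(gene_set)
    overlap = len(query & term_genes)
    n, k_total = len(query), len(term_genes)
    # Probability of seeing at least `overlap` shared genes by chance.
    favourable = sum(
        comb(k_total, k) * comb(background_size - k_total, n - k)
        for k in range(overlap, min(n, k_total) + 1)
    )
    return overlap, favourable / comb(background_size, n)


def rank_terms(gene_list, library, background_size):
    """Rank library terms by ascending p-value (most enriched first)."""
    results = []
    for term, genes in library.items():
        overlap, p = enrichment_p_value(gene_list, genes, background_size)
        results.append((term, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Toy gene-set library with made-up term names and gene symbols.
toy_library = {
    "term_1": ["A", "B", "X"],
    "term_2": ["Y", "Z", "W"],
}
ranked = rank_terms(["A", "B", "C"], toy_library, background_size=10)
```

In practice the p-values would also be corrected for multiple testing across all terms in a library; that step is omitted here for brevity.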

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool

Biological Knowledge Engine

The Harmonizome is a biological knowledge engine built on top of information about genes and proteins from 114 datasets. To create the Harmonizome, we distilled information from the original datasets into attribute tables that define significant associations between genes and attributes, where attributes could be genes, proteins, cell lines, tissues, experimental perturbations, diseases, phenotypes, or drugs, depending on the dataset. Gene and protein identifiers were mapped to NCBI Entrez Gene Symbols, and attributes were mapped to appropriate ontologies. We also computed gene-gene and attribute-attribute similarity networks from the attribute tables. These attribute tables and similarity networks can be integrated to perform many types of computational analyses for knowledge discovery and hypothesis generation.
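To make the idea of deriving a gene-gene similarity network from an attribute table concrete, here is a minimal sketch. It assumes cosine similarity as the measure and a sparse dictionary as the table layout; the text above does not specify either, and the gene and attribute names are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse attribute vectors (dicts)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gene_similarity_network(attribute_table, threshold=0.0):
    """Return {(gene1, gene2): similarity} for pairs scoring above threshold."""
    edges = {}
    for gene1, gene2 in combinations(sorted(attribute_table), 2):
        score = cosine_similarity(attribute_table[gene1], attribute_table[gene2])
        if score > threshold:
            edges[(gene1, gene2)] = score
    return edges


# Toy attribute table: gene -> {attribute: association value}.
toy_table = {
    "GENE1": {"liver": 1, "kidney": 1},
    "GENE2": {"liver": 1},
    "GENE3": {"brain": 1},
}
network = gene_similarity_network(toy_table)
```

An attribute-attribute network can be computed with the same functions by transposing the table so that attributes become the keys.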

The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins

All RNA-seq and ChIP-seq Signature Search Space

ARCHS4 provides access to gene counts from the HiSeq 2000, HiSeq 2500, and NextSeq 500 platforms for human and mouse experiments from GEO and SRA. The website enables downloading of the data in H5 format for programmatic access, as well as a 3-dimensional view of the sample and gene spaces. Search features allow users to browse the data by metadata annotation, submit their own up and down gene sets, and explore matching samples enriched for annotated gene sets. Selected sample sets can be downloaded into a tab-separated text file through auto-generated R scripts for further analysis. Reads are aligned with Kallisto using a custom cloud computing platform. Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38 mouse reference genome.

Massive Mining of Publicly Available RNA-seq Data from Human and Mouse

L1000 Characteristic Direction Signature Search Engine

L1000CDS2 finds consensus signatures that match a user’s input gene lists or input signatures. The underlying dataset is the LINCS L1000 small-molecule expression profiles generated at the Broad Institute by the Connectivity Map team. We calculated the differentially expressed genes of these profiles using our multivariate method called the Characteristic Direction.
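The sketch below illustrates, in a simplified form, how input up and down gene lists might be matched against a collection of stored signatures. The overlap-fraction score, the function names, and the toy signatures are assumptions made for illustration only; the text above does not describe the actual scoring used by the search engine, and the Characteristic Direction computation itself is not shown.

```python
def overlap_score(up, down, signature_up, signature_down):
    """Fraction of input genes that change in the same direction as a signature."""
    up, down = set(up), set(down)
    total = len(up) + len(down)
    if total == 0:
        return 0.0
    matched = len(up & set(signature_up)) + len(down & set(signature_down))
    return matched / total


def search_signatures(up, down, signatures, top_n=5):
    """Return the top_n (name, score) pairs, best match first."""
    scored = [
        (name, overlap_score(up, down, sig["up"], sig["down"]))
        for name, sig in signatures.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy signature collection with invented perturbation and gene names.
toy_signatures = {
    "compound_A": {"up": ["G1", "G2"], "down": ["G3"]},
    "compound_B": {"up": ["G3"], "down": ["G1"]},
}
hits = search_signatures(["G1", "G2"], ["G3"], toy_signatures)
```

Swapping the up and down lists before scoring would instead favor signatures that reverse the input, which is one common way such searches are used to look for candidate small molecules.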

L1000CDS2: LINCS L1000 characteristic direction signatures search engine