Avi Ma'ayan, PhD
- PROFESSOR | Pharmacological Sciences
Research Topics: Addiction, Aging, Bioinformatics, Biostatistics, Cancer, Computational Biology, Drug Design and Discovery, Gene Expressions, Gene Regulation, Genetics, Genomics, Kidney, Mass Spectrometry, Mathematical Modeling of Biomedical Systems, Mathematical and Computational Biology, Personalized Medicine, Pharmacogenomics, Pharmacology, Protein Complexes, Protein Kinases, Proteomics, Reprogramming, Signal Transduction, Stem Cells, Systems Biology, Systems Pharmacology, Technology & Innovation, Theoretical Biology, Transcription Factors, Viruses and Virology
Dr. Ma'ayan is the Director of the Mount Sinai Center for Bioinformatics and a Professor in the Department of Pharmacological Sciences. Dr. Ma'ayan is also Principal Investigator of the NIH-funded BD2K-LINCS Data Coordination and Integration Center and Mount Sinai Knowledge Management Center for Illuminating the Druggable Genome. The Ma'ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. His research team applies machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. The Ma'ayan Laboratory develops software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.
Recently Released Software Tools Developed by the Ma'ayan Laboratory:
- Datasets2Tools: Repository and search engine for bioinformatics datasets, tools and canned analyses
- L1000FWD: Large-scale visualization of drug-induced transcriptomic signatures
- ARCHS4: All RNA-seq and ChIP-seq signature search space
- Enrichr: Gene-list enrichment analysis tool
- GEO2Enrichr: Browser extension to extract and analyze gene sets from GEO
- CREEDS: Crowd extracted expression of differential signatures
- GEN3VA: Gene expression and enrichment vector analyzer
- Harmonizome: A biological knowledge engine
- SEP-L1000: Side effect prediction based on L1000 data
- L1000CDS2: L1000 Characteristic Direction signature search engine
- Principal Angle Enrichment Analysis (PAEA): Dimensionally reduced multivariate gene set enrichment analysis tool
For a complete list of our software tools, databases and datasets, please visit our Resources page.
- Mount Sinai Health System Top 10 Researchers
- Mount Sinai Researchers Receive NIH Grant to Develop New Ways to Share and Reuse Research Data
- Students Harness Big Data to Help Solve Medical Challenges
- The FAIR Data-Sharing Movement: BD2K Centers Make Headway
- Gene Expression's Big Rethink
- The Druggable Genome is No Castle in the Air
- PODCAST: The Druggable Genome in Stereo
- BD2K Highlights
- Big (Data) Changes
- Big Data Highlight: The Harmonizome
- Establishment of the Mount Sinai Center for Bioinformatics
- Crowdsourcing for Scientific Discovery
- Genetics: Big Hopes for Big Data
- NIH Launches a United Ecosystem for Big Data
Multi-Disciplinary Training Areas: Biophysics and Systems Pharmacology [BSP], Genetics and Genomic Sciences [GGS]
BSc, Fairleigh Dickinson University
MS, Fairleigh Dickinson University
PhD, Mount Sinai School of Medicine
Postdoctoral Fellowship, Mount Sinai School of Medicine
Irma T. Hirschl Career Scientist Award
Dr. Harold and Golden Lamport Research Award
Graduate School of Biological Sciences Award for Research Achievement
Doctoral Dissertation Award in the Graduate School of Biological Sciences
Systems Biology, Systems Pharmacology, Biomedical Big Data, Bioinformatics, Computational Biology, Data-Mining, Software Engineering, Network Analysis
Program Manager: Sherry Jenkins, MS
Research Faculty: Alexander Lachmann, PhD; Zichen Wang, PhD
Postdoctoral Fellow: Kathleen Jagodnik, PhD
Graduate Students: Alexandra Keenan, MS; Julia Zhao, ScM
Bioinformaticians and Software/Database Developers: Daniel Clarke, MS; Maxim Kuleshov, MS; Brian Schilder, MPhil; Moshe Silverstein, MA; Denis Torre, BS; Megan Wojciechowicz, MS
2017 Undergraduate Research Trainees: Patrycja Krawczuk, Marina Latif, Joyce Lee, Ariel Leong, Damon Pham, Christopher Tseng, Lily Wang, Charlotte Zuber
Summary of Research Studies:
Advances in high-throughput experimental molecular biology are allowing us to elucidate the molecular mechanisms of mammalian cell regulation in ever-increasing detail. However, the potential gains from these advances are often not fully realized because high-throughput techniques frequently produce more data than we can currently organize, model and visualize adequately. A particular challenge arises when attempting to integrate several high-dimensional datasets from multiple types of high- and low-throughput experimental techniques applied to study mammalian cells.
To organize, visualize, analyze and model data from such sources, we develop computational approaches that help experimental systems biologists form rational hypotheses for further experimentation. We analyze high-dimensional data collected for projects integrating results from multiple layers of regulation (genomics, transcriptomics and proteomics). In addition to our research efforts, we also develop software so that our methodologies can reach and impact the Big Data biomedical research community. Below are some of the software tools we have developed:
1) Enrichr is a gene set enrichment analysis tool that includes one of the largest collections of annotated gene sets: over 229,000 gene sets organized into over 123 gene set libraries. Enrichr provides visualization of enrichment results as bar graphs, tables, canvases and networks. Enrichment is computed by three different methods and users can save and share their lists and results with a single click. Articles describing the initial and updated versions of the software were published in BMC Bioinformatics and Nucleic Acids Research. PMID: 23586463 and PMID: 27141961
3) L1000CDS2 and Drug Pair Seeker (DPS) are two tools that use the Connectivity Map gene expression datasets, including the new version that utilizes the L1000 technology, to predict single drugs and pairs of drugs that can either mimic or reverse gene expression, given signatures of differentially expressed genes. Both tools use novel algorithms developed by the Ma'ayan Laboratory to prioritize drugs and small molecules. A detailed description of Drug Pair Seeker and its application to kidney disease can be found in a publication in the journal JASN. PMID: 23559582. An article describing L1000CDS2 was published in npj Systems Biology and Applications. doi:10.1038/npjsba.2016.15
4) The ChIP-X Enrichment Analysis (ChEA) database contains manually extracted datasets of transcription-factor/target-gene interactions from over 100 experiments such as ChIP-chip, ChIP-seq and ChIP-PET applied to mammalian cells. We use the database as the prior biological knowledge gene-list library when performing gene-list enrichment analysis on mRNA expression data. The system is delivered as web-based interactive software. With this software, users can input lists of mammalian genes for which the program computes over-representation of transcription factor targets from the ChEA database. An article describing the system has been published in the journal Bioinformatics. PMID: 20709693
5) Kinase Enrichment Analysis (KEA) is a web-based tool with an underlying database providing users with the ability to link lists of mammalian proteins/genes with the kinases that phosphorylate them. The system draws from several available kinase–substrate databases to compute kinase enrichment probability based on the distribution of kinase–substrate proportions in the background kinase–substrate database compared with kinases found to be associated with an input list of genes/proteins. An article describing the system has been published in the journal Bioinformatics. PMID: 19176546
6) Expression2Kinases (X2K) is a software tool that integrates and upgrades the functionality of ChEA, Genes2Networks, KEA and Lists2Networks into one platform and computational pipeline. Given a list of differentially expressed genes, the software identifies upstream transcription factors using the software and database ChEA; X2K then connects the top identified transcription factors with Genes2Networks using databases of known protein-protein interactions; the resultant subnetwork is then entered into KEA for kinase enrichment analysis. X2K also includes all the functions for enrichment analysis available within Lists2Networks. An article describing the system has been published in the journal Bioinformatics. PMID: 22080467
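The over-representation step that ChEA and KEA are described as performing on an input gene list can be illustrated with a small sketch. It assumes a right-tailed hypergeometric test, which is a common choice for gene-list enrichment but is not confirmed here as the exact statistic used by either tool. The function names, the toy library and the background size are all hypothetical.

```python
from math import comb


def overrepresentation_pvalue(input_genes, target_genes, background_size):
    """Right-tailed hypergeometric p-value for the overlap of two gene sets.

    Assumes both sets are drawn from one background of `background_size` genes.
    """
    input_genes, target_genes = set(input_genes), set(target_genes)
    n = len(input_genes)                   # size of the input list
    K = len(target_genes)                  # annotated targets of one regulator
    k = len(input_genes & target_genes)    # observed overlap
    N = background_size
    # P(X >= k) for X ~ Hypergeometric(N, K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def rank_regulators(input_genes, library, background_size):
    """Rank regulators (hypothetical library: name -> target set) by p-value."""
    scored = [
        (name, overrepresentation_pvalue(input_genes, targets, background_size))
        for name, targets in library.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy example with made-up gene symbols and a background of 10 genes.
toy_library = {"TF1": {"A", "B", "C"}, "TF2": {"X", "Y", "Z"}}
ranking = rank_regulators({"A", "B", "D"}, toy_library, background_size=10)
```

In this toy case the regulator whose targets overlap the input list most strongly (TF1) is ranked first. A real analysis would also correct the p-values for multiple testing across the library.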
We apply these and other computational methods for the analysis of data from a variety of projects with our collaborators. The results from our analyses produce concrete suggestions and predictions for further functional experiments. The predictions are tested by our collaborators and our analysis methods are delivered as software tools and databases for the systems biology research community.
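As one illustration of the kind of prediction involved, the idea of finding drugs that mimic or reverse an input signature, described for L1000CDS2 and Drug Pair Seeker above, can be sketched with a simple set-overlap score. This is only an assumption-laden toy: it does not reproduce the characteristic direction method or the drug-pair algorithm of those tools, and the function names, drug names and gene symbols are hypothetical.

```python
def signature_score(query_up, query_down, drug_up, drug_down):
    """Score in [-1, 1]: positive means the drug mimics the query signature,
    negative means it reverses it (simple overlap counting, an assumption)."""
    query_up, query_down = set(query_up), set(query_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    # Genes moving in the same direction in query and drug signatures.
    concordant = len(query_up & drug_up) + len(query_down & drug_down)
    # Genes moving in opposite directions.
    discordant = len(query_up & drug_down) + len(query_down & drug_up)
    total = len(query_up) + len(query_down)
    return (concordant - discordant) / total if total else 0.0


def rank_drugs(query_up, query_down, drug_signatures, mode="reverse"):
    """Rank drugs; drug_signatures maps a name to an (up_set, down_set) pair."""
    scored = [
        (drug, signature_score(query_up, query_down, up, down))
        for drug, (up, down) in drug_signatures.items()
    ]
    # Reversers have the most negative scores, mimickers the most positive.
    return sorted(scored, key=lambda item: item[1], reverse=(mode == "mimic"))


# Toy data: drugA flips every query gene, drugB partially matches the query.
toy_signatures = {
    "drugA": ({"G3", "G4"}, {"G1", "G2"}),
    "drugB": ({"G1"}, {"G3"}),
}
reversers = rank_drugs({"G1", "G2"}, {"G3", "G4"}, toy_signatures, mode="reverse")
mimickers = rank_drugs({"G1", "G2"}, {"G3", "G4"}, toy_signatures, mode="mimic")
```

Here drugA ranks first as a reverser and drugB ranks first as a mimicker. Real signature search would work with weighted expression vectors rather than unweighted gene sets.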
For more information, please visit the Ma'ayan Laboratory website.
Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma'ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 2018 Apr; 9(1): 1366.
Torre D, Krawczuk P, Jagodnik KM, Lachmann A, Wang Z, Wang L, Kuleshov MV, Ma'ayan A. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Scientific Data 2018 Feb; 5(180023).
Koplev S, Lin K, Dohlman AB, Ma'ayan A. Integration of pan-cancer transcriptomics with RPPA proteomics reveals mechanisms of epithelial-mesenchymal transition. PLoS Computational Biology 2018 Jan; 14(1): e1005911.
Keenan AB, Jenkins SL, Jagodnik KM, Koplev S, He E, Torre D, Wang Z, Dohlman AB, Silverstein MC, Lachmann A, Kuleshov MV, Ma'ayan A, et al. The Library of Integrated Network-Based Cellular Signatures NIH program: System-level cataloging of human cells response to perturbations. Cell Systems 2018 Jan; 6(1): 13-24.
Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. Journal of Biomedical Informatics 2017 Nov; S1532-0464(17)30240-X.
Fernandez NF, Gundersen GW, Rahman A, Grimes ML, Rikova K, Hornbeck P, Ma'ayan A. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 2017 Oct; 4(170151).
Gundersen GW, Jagodnik KM, Woodland H, Fernandez NF, Sani K, Dohlman AB, Ung PM, Monteiro CD, Schlessinger A, Ma'ayan A. GEN3VA: aggregation and analysis of gene expression signatures from related studies. BMC Bioinformatics 2016 Nov; 17(1): 461.
Duan Q, Reid SP, Clark NR, Wang Z, Fernandez NF, Rouillard AD, Readhead B, Tritsch SR, Hodos R, Hafner M, Niepel M, Sorger PK, Dudley JT, Bavari S, Panchal RG, Ma'ayan A. L1000CDS2: LINCS L1000 characteristic direction signatures search engine. npj Systems Biology and Applications 2016 Aug; 2(16015).
Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016 Jul; baw100.
Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott M, Gundersen GW, Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 2016 Jul; 44(W1): W90-97.
Gundersen GW, Jones MR, Rouillard AD, Kou Y, Monteiro CD, Feldmann AS, Hu KS, Ma'ayan A. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions. Bioinformatics (Oxford, England) 2015 Sep; 31(18): 3060-3062.
Xu H, Ang YS, Sevilla A, Lemischka IR, Ma'ayan A. Construction and validation of a regulatory network for pluripotency and self-renewal of mouse embryonic stem cells. PLoS Computational Biology 2014 Aug; 10(8): e1003777.
Duan Q, Flynn C, Niepel M, Hafner M, Muhlich JL, Fernandez NF, Rouillard AD, Tan CM, Chen EY, Golub TR, Sorger PK, Subramanian A, Ma'ayan A. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Research 2014 Jul; 42(W1): W449-460.
Jin Y, Ratnam K, Chuang PY, Fan Y, Zhong Y, Dai Y, Mazloom AR, Chen EY, D'Agati V, Xiong H, Ross MJ, Chen N, Ma'ayan A, He JC. A systems approach identifies HIPK2 as a key regulator of kidney fibrosis. Nature Medicine 2012 Mar; 18(4): 580-588.
Mazloom AR, Dannenfelser R, Clark NR, Grigoryan AV, Linder KM, Cardozo TJ, Bond JC, Boran AD, Iyengar R, Malovannaya A, Lanz RB, Ma'ayan A. Recovering protein-protein and domain-domain interactions from aggregation of IP-MS proteomics of coregulator complexes. PLoS Computational Biology 2011 Dec; 7(12): e1002319.
Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma'ayan A. ChEA: Transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 2010 Oct; 26(19): 2438-2444.
Ma'ayan A, Cecchi GA, Wagner J, Rao AR, Iyengar R, Stolovitzky G. Ordered cyclic motifs contribute to dynamic stability in biological and engineered networks. Proc Natl Acad Sci U S A 2008 Dec; 105(49): 19235-19240.
Ma'ayan A, Jenkins SL, Neves S, Hasseldine A, Grace E, Dubin-Thaler B, Eungdamrong NJ, Weng G, Ram PT, Rice JJ, Kershenbaum A, Stolovitzky GA, Blitzer RD, Iyengar R. Formation of regulatory patterns during signal propagation in a Mammalian cellular network. Science 2005 Aug; 309(5737): 1078-1083.