Mount Sinai Center for Bioinformatics


Members of the Mount Sinai Center for Bioinformatics develop graduate-level courses both within the Graduate School of Biomedical Sciences and through Coursera. We also offer an annual summer research training program geared towards undergraduate and graduate students interested in participating in cutting-edge research projects aimed at solving data-intensive biomedical problems. The major aim of our education and outreach activities is to engage the larger research community and train the next generation of Bioinformaticians and Biomedical Data Scientists.

Courses and Research Training Opportunities

We provide access to our Center’s resources through education, outreach, and training programs aimed at various scientific communities.

The BD2K-LINCS DCIC Summer Research Training Program in Biomedical Big Data Science is a research-intensive ten-week training program for undergraduate and graduate students interested in participating in cutting-edge research projects aimed at solving data-intensive biomedical problems. Summer fellows conduct faculty-mentored independent research projects within laboratories affiliated with the Center in the following areas: data integration, dynamic data visualization, machine learning, data harmonization, computational drug discovery, metadata and APIs, knowledge modeling, Bayesian networks, and statistical mining.

The benefits of participating in the training program are:

  • Direct research experience with projects aimed at solving data-intensive biomedical problems
  • A $6000 stipend for the ten-week training period
  • Interaction with the Center's computational experts through weekly meetings, enrichment lectures, and an e-poster session

We are looking for applicants who:

  • Are U.S. citizens or U.S. permanent residents
  • Are undergraduate or master's students in good academic standing
  • Are willing to work a minimum of 40 hours per week and take part in all program activities (e.g., weekly meetings, enrichment lectures, poster sessions, and presentations) in addition to mentored research work
  • Are majoring in computer science, informatics, mathematics, statistics, physics, engineering, chemistry/chemical sciences, or biological sciences
  • Have an interest in Biomedical Big Data Science

We strongly encourage women and members of minority groups to apply.

Avi Ma’ayan, PhD, is the course director for two massive open online courses (MOOCs) on the Coursera platform. As of March 2016, more than 33,000 students had registered for these two courses and watched 195,000 video lectures.

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center

In this course, students organize, analyze, visualize, and integrate LINCS data with other publicly available relevant resources. We discuss the various centers that collect data for LINCS, looking at their experimental procedures and data types. We then cover the design and collection of metadata and how metadata is linked to ontologies, followed by basic data processing and data normalization methods to clean and harmonize LINCS data. We examine how the data is served through RESTful APIs as JSON, which involves exploring concepts from client-server computing. Most importantly, the course focuses on various bioinformatics methods of analysis, including unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from molecular biomedicine.

Network Analysis in Systems Biology

This MOOC is an introduction to the data integration and statistical methods used in contemporary systems biology, bioinformatics, and systems pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-sequencing) including data normalization, differential expression, clustering, enrichment analysis, and network construction. We provide practical tutorials for using tools and setting up pipelines, and cover the mathematics behind the methods applied within the tools.
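As an illustration of the kind of mathematics behind one of these methods, enrichment analysis is commonly framed as a hypergeometric (one-sided Fisher) test on the overlap between a list of differentially expressed genes and an annotated gene set. The sketch below is a minimal, standard-library-only example and is not taken from the course materials; the function name, the gene identifiers, and the background size are all hypothetical.

```python
from math import comb


def enrichment_p_value(gene_set, hits, background_size):
    """Return (overlap, p) where p = P(overlap >= observed) under a
    hypergeometric null: `hits` drawn at random from the background.

    Assumes both `gene_set` and `hits` are subsets of a background
    containing `background_size` genes (an assumption of this sketch).
    """
    gene_set, hits = set(gene_set), set(hits)
    k = len(gene_set & hits)   # observed overlap
    K = len(gene_set)          # annotated genes in the background
    n = len(hits)              # size of the input gene list
    N = background_size        # total genes in the background

    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical identifiers: a 5-gene set, a 4-gene hit list, 20 genes total.
pathway = {"G1", "G2", "G3", "G4", "G5"}
de_genes = {"G1", "G2", "G3", "G17"}
overlap, p = enrichment_p_value(pathway, de_genes, background_size=20)
print(overlap, round(p, 4))
```

In practice, a p-value like this would be computed for many gene sets at once and then corrected for multiple testing; that step is omitted here for brevity.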

This course is most appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, mathematics, physics, chemistry, computer science, and biomedical and electrical engineering. It would also be useful for researchers who encounter large datasets in their own research. The course presents software, applications, and tools developed by the Ma’ayan Laboratory as well as other freely available data analysis and visualization tools.

The aim of the course is to enable participants to use these methods for analyzing their own data for their own projects. For participants who do not work in the field, the course introduces the current research challenges in the field of computational systems biology.

We offer two graduate-level Big Data courses at the Graduate School of Biomedical Sciences.

BD2K-LINCS: Data Mining and Network Analysis

This course covers machine learning applications in systems biology, including unsupervised clustering and supervised learning; analysis of the topology of biological regulatory networks; and a survey of how these approaches are applied to study biological molecular networks. Papers that combine computational predictions with experimental validation are highlighted, and we present the use of software tools to analyze proteomics and genomics data collected for the LINCS project.

Programming for Big Data Biomedicine

This class introduces computer programming methodologies applied in bioinformatics, systems biology, and complex systems theory. Topics covered include scripting, processing text files, converting data to figures with Python and R, and building agent-based models, as well as learning how to use web technologies such as Flask, cloud computing with AWS, database scripting with SQL, JavaScript, React Native, Angular, and D3. Students complete small programming assignments throughout the course.