Data Management and Analysis

The HIMC computational analysis team leverages Mount Sinai’s High Performance Computing cluster, Minerva, as well as Amazon Web Services (AWS) to perform high-throughput data processing and analysis. We have developed a range of efficient semi-automated data analysis pipelines that aim to facilitate analysis and interpretation of complex immunological datasets. For example, we have developed a CyTOF data processing and analysis pipeline that includes automated quality control and single- and multi-sample data analysis using state-of-the-art algorithms such as Phenograph for community clustering and viSNE for dimensionality reduction, as well as association analytics that use regularized regression to identify correlates with experimental or clinical features in the data. Our single-cell RNA-seq pipeline includes quality control and the 10X Genomics data processing pipeline. Our computational pipelines can be run in parallel on Minerva or on AWS as Dockerized batch jobs, and the data can be shared securely with researchers via cloud-based tools (e.g., AWS, GitHub, Mt. Sinai Box). We are also developing analysis pipelines (e.g., reproducible Jupyter notebooks) for Illumina and Olink proteomics datasets. The HIMC also supports an institutional enterprise Cytobank account, a cloud-based platform that facilitates storage, analysis, visualization and sharing of cytometric datasets.
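
As a rough illustration of how clustering output can feed a multi-sample analysis, the sketch below turns per-cell cluster labels into a per-sample cluster frequency table, a common input for association models such as regularized regression. It is not the HIMC pipeline code: the function name `cluster_frequencies`, the sample names, and the cluster labels are all hypothetical, and the sketch assumes that cluster assignments (for example from Phenograph) are already available as (sample, cluster) pairs.

```python
from collections import Counter, defaultdict


def cluster_frequencies(cells):
    """Summarize per-cell cluster labels as per-sample cluster fractions.

    cells: iterable of (sample_id, cluster_label) pairs, one pair per cell.
    Returns {sample_id: {cluster_label: fraction of that sample's cells}}.
    Clusters absent from a sample get a fraction of 0.0.
    """
    # Count cells per cluster within each sample.
    counts = defaultdict(Counter)
    for sample_id, cluster in cells:
        counts[sample_id][cluster] += 1

    # Use the same set of clusters for every sample so rows are comparable.
    clusters = sorted({c for per_sample in counts.values() for c in per_sample})

    table = {}
    for sample_id, per_sample in counts.items():
        total = sum(per_sample.values())
        table[sample_id] = {c: per_sample[c] / total for c in clusters}
    return table


# Hypothetical toy input: four tumor cells and two adjacent-tissue cells.
cells = [
    ("tumor", "CD8_T"), ("tumor", "CD8_T"), ("tumor", "CD8_T"), ("tumor", "Treg"),
    ("adjacent", "CD8_T"), ("adjacent", "NK"),
]
freqs = cluster_frequencies(cells)
for sample_id in sorted(freqs):
    print(sample_id, freqs[sample_id])
```

Each row of the resulting table could then serve as a feature vector for one sample, with experimental or clinical annotations as the outcome in a downstream association model.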

The HIMC is also actively developing novel interactive web-based data-sharing tools such as the CyTOF Data Viewer. The following example illustrates a CyTOF analysis of the immune infiltration in a hepatocellular carcinoma specimen following anti-PD-1 therapy. The dataset can be browsed interactively through Phenograph clustering results and viSNE scatterplots to view immune populations and protein expression patterns that are differentially enriched in the tumor, border and adjacent liver tissue.