The mission of the Data Analysis and Integration Core D is to support the planning, analysis, integration, and interpretation of all multiomics data generated by the Skin Biology and Diseases Resource-Based Center (SBDRC) services. Specifically, Core D focuses on (i) consultation on experimental design and statistical considerations; (ii) data analysis; (iii) data integration across platforms and with publicly available datasets to generate meaningful insights and propose follow-up mechanistic experiments; (iv) providing learning opportunities to support the next generation of skin biologists; and (v) management of computational resources.
Core D Leadership
Director: Ernesto Guccione, PhD, Professor of Oncological Sciences and Co-Director of Bioinformatics for the Next Generation Sequencing (BiNGS) Core
Associate Director: Dan Hasson, PhD, Associate Professor of Oncological Sciences and Co-Director of the BiNGS Core
Core D Services
Transcriptomics
Bulk RNA-sequencing (seq): Initial analysis includes: evaluation of read quality and alignment statistics, sample normalization using internal controls or computational methods, normalized read counts, assessment of sample similarity (principal component analysis, or PCA, plot), differential gene expression, a link to a University of California Santa Cruz (UCSC) genome browser session for all normalized datasets, gene set enrichment analysis (GSEA), gene ontology (GO) terms and pathway enrichment analysis, and motif discovery. Custom analysis includes: gene expression modules, data integration (e.g., assay for transposase-accessible chromatin-sequencing or ATAC-seq and chromatin immunoprecipitation-sequencing or ChIP-seq), data integration with publicly available resources (e.g., ENCODE, The Cancer Genome Atlas or TCGA), and publication-quality figures.
Alternative Splicing Analysis: Initial analysis includes: evaluation of read quality and alignment statistics, sample normalization using internal controls or computational methods, a link to a UCSC genome browser session for all normalized datasets, replicate multivariate analysis of transcript splicing (rMATS) output report, quantification of differential splicing events (skipped exon, retained intron, alternative 5’ splice site or A5’SS, alternative 3’ splice site or A3’SS, and mutually exclusive exon), GSEA, and GO terms and pathway enrichment analysis. Custom analysis includes: Shapiro plots of 5’SS strength and motif, GO terms and pathway analysis, and publication-quality figures.
Alternative Promoter Analysis: Initial analysis includes: evaluation of read quality and alignment statistics, sample normalization using internal controls or computational methods, a link to a UCSC genome browser session for all normalized datasets, assessment of sample similarity (PCA plot), heat map of promoter activity estimates, identification of alternative promoter usage across conditions, GSEA, and GO terms and pathway enrichment analysis. Custom analysis includes: gene expression modules, clustering, motif discovery, data integration (e.g., ATAC-seq, ChIP-seq), data integration with publicly available resources (e.g., ENCODE, TCGA), and publication-quality figures.
Transcriptional Analysis of TCGA and other public datasets: Differential gene expression analysis (possibly in the context of a specific mutation or cancer type), GO and pathway analysis, GSEA, clustering of samples through dimensionality reduction, gene signature analysis, survival correlations, clinical feature correlations, and publication-quality figures.
Single cell
scRNA-seq: Initial analysis includes: evaluation of read quality and alignment statistics, linear dimensionality reduction via PCA, nonlinear dimensionality reduction via uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (tSNE), unsupervised clustering of cells via the Louvain algorithm, cell cycle scoring, diffusion maps, identification of conserved and differential biomarkers via Wilcoxon rank-sum and receiver operating characteristic (ROC) methods, functional enrichment via GSEA, GO terms and pathway enrichment analysis, scRNA-seq dataset integration, cell type annotation via markers or label transfer, and interactive data exploration via cellxgene. Custom analysis includes: motif discovery, doublet/empty droplet detection, pseudo-time analysis, RNA velocity analysis, malignant/nonmalignant cell detection, data integration (e.g., scATAC-seq), data integration with publicly available resources, and publication-quality figures.
scATAC-seq: Initial analysis includes: evaluation of read quality, transcription start site enrichment, and genomic region distribution, linear dimensionality reduction via latent semantic indexing (LSI), nonlinear dimensionality reduction via UMAP and tSNE, unsupervised clustering of cells via the smart local moving (SLM) algorithm, UCSC genome browser session for Tn5 insertion signals grouped by cell cluster, diffusion maps, annotation of peaks to genes, identification of conserved and differential chromatin regions using a logistic regression model, motif enrichment analysis in differentially accessible regions, footprinting analysis, prediction of motif activity per cell cluster via chromVAR, quantification of gene activity using chromatin accessibility data, differential expression and functional enrichment analysis via over-representation analysis for gene activity data, and cell type annotation. Custom analysis includes: trajectory analysis via Monocle, cis co-accessibility via Cicero, data integration with scRNA-seq or scMultiomics, data integration with publicly available resources, and publication-quality figures.
scMulti-Omics: First, we analyze scRNA-seq and scATAC-seq separately (see above), and then together. Multiomics integration analysis includes: joint nonlinear dimensionality reduction via UMAP and tSNE, joint unsupervised clustering of cells via the Louvain algorithm, cell cycle scoring, diffusion maps, identification of conserved and differential genes via Wilcoxon rank-sum and receiver operating characteristic (ROC) methods on joint clusters, functional enrichment via GSEA, GO terms and pathway enrichment analysis, scRNA-seq dataset integration, differential accessibility analysis on joint clusters using a logistic regression model and motif enrichment testing in differentially accessible regions, motif activity analysis on joint clusters via chromVAR, footprinting plots for selected transcription factors, and peak-to-gene association. Custom analysis includes: integration with published datasets and trajectory analysis.
Spatial Transcriptomics: Initial analysis includes: evaluation of read quality and alignment statistics, visualization of gene expression on spatial slides, unsupervised clustering of cells via the Louvain algorithm, identification of spatially variable genes via mark variogram, functional enrichment via GSEA and via over-representation analysis (ORA), GO terms and pathway enrichment analysis, and cell type annotation via markers or label transfer. Custom analysis includes: data integration (e.g., scRNA-seq), data integration with publicly available resources, and publication-quality figures.
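As an illustration of the over-representation analysis (ORA) mentioned above, the following is a minimal sketch of the standard one-sided hypergeometric test for whether a gene set is enriched among a list of genes of interest. It is not Core D's pipeline: the function names and the toy numbers are hypothetical, and a real analysis would also correct for multiple testing across gene sets.

```python
from math import comb


def ora_p_value(universe_size, set_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of background genes (assumed to be all tested genes)
    set_size:      number of background genes annotated to the gene set
    hits_size:     number of genes of interest (e.g., differentially expressed)
    overlap:       number of genes of interest that belong to the gene set
    """
    total = comb(universe_size, hits_size)
    upper = min(set_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def ora(universe, gene_set, hits):
    """Restrict both lists to the background, then test their overlap."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    hits = set(hits) & universe
    overlap = len(gene_set & hits)
    p = ora_p_value(len(universe), len(gene_set), len(hits), overlap)
    return overlap, p


if __name__ == "__main__":
    # Toy example with made-up gene identifiers g0..g9.
    background = [f"g{i}" for i in range(10)]
    pathway = ["g0", "g1", "g2"]
    interesting = ["g0", "g1", "g2"]
    print(ora(background, pathway, interesting))
```

In the toy example all three genes of interest fall in a three-gene set drawn from ten background genes, so the p-value equals 1 divided by the number of ways to choose 3 genes from 10.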
Epigenetics
ChIP-seq, CUT&RUN and CUT&Tag: Initial analysis includes: evaluation of read quality and alignment statistics, sample normalization using internal controls or computational methods, a link to a UCSC genome browser session for all normalized datasets, peak files for all significantly enriched regions, assessment of sample similarity (PCA plot), annotation of genomic distribution (e.g., promoters, gene bodies), GSEA, GO terms and pathway enrichment analysis, and motif discovery. Custom analysis includes: differential peaks across multiple samples, data integration (e.g., RNA-seq, ATAC-seq), characterization of chromatin states, enhancer and super enhancer identification and gene association, identification of alternative promoters, alignment to repetitive sequences and enrichment quantification, data integration with publicly available resources (e.g., ENCODE, TCGA), and publication-quality figures.
Bulk ATAC-seq: Initial analysis includes: evaluation of read quality and alignment statistics, sample normalization using computational methods, a link to a UCSC genome browser session for all normalized datasets, peak files for all significantly accessible regions, assessment of sample similarity (PCA plot), annotation of genomic distribution (e.g., promoters, gene bodies), GSEA, GO terms and pathway enrichment analysis, and motif discovery. Custom analysis includes: differential peaks across multiple samples, quantification of differential accessibility across multiple samples, footprinting analysis, data integration (e.g., RNA-seq, ChIP-seq), association of intergenic accessible regions with genes, data integration with publicly available resources (e.g., ENCODE, TCGA), and publication-quality figures.
Hi-C/HiChIP: Initial analysis includes: evaluation of read quality and alignment statistics, a link to a UCSC genome browser session for all normalized datasets, loop calls for the significant interactions, and compartment and topologically associating domain (TAD) calls. Custom analysis includes: differential loops across multiple samples, data integration (e.g., RNA-seq, ATAC-seq), loop associations with enhancers, super enhancers, and gene promoters, and publication-quality figures.
To learn more about services offered by Core D, please contact Dr. Hasson and/or Dr. Guccione.
See the SBDRC Core D presentation for further information.