The Charles Bronfman Institute for Personalized Medicine

BioMe FAQs For Researchers

What Types of Datasets Do We Have?
BioMe offers a wide array of phenotypic and genetic data to researchers, drawn from a diverse cohort of individuals from many ancestral and cultural backgrounds. A questionnaire covering medical history, demographics, and lifestyle is paired with a dynamic connection to each participant's electronic health record (EHR), which provides information on many health-related outcomes and covariates. The available genetic data are mainly genotyping array data (imputed to appropriate reference panels) and sequencing data (including whole-exome sequencing). As of 2019, BioMe has enrolled more than 50,000 participants, and genetic data are available for more than 30,000 of them.

Available BioMe datasets under the standard regulatory framework for data access (e.g., ISMMS and NIH policies):
Genotypic data:
Affymetrix Genome-Wide Human SNP Array 6.0 - N = 2,710
OmniExpressExome Array - N = 9,222
Multi-Ethnic Genotyping Array - N = 12,754

Sequencing data:
Superpanel (targeted sequencing of 761 genes) - N = 22,598 
List of genes available in Superpanel

Available BioMe datasets requiring additional approval (standard regulations plus MSIP approval):
Genotypic data: 
Global Screening Array - N = 31,705 

Sequencing data: 
Illumina v4 HiSeq 2500 (whole exome sequencing) - N = 30,813 

Additional BioMe data are also included in several consortia that have more specific data-access requirements. If you don't find a suitable dataset in the lists above, please contact IPM for more information about these.

Do I need IRB approval?
Please review the NIH policies on human subjects research. You must apply to PPHS/IRB for a determination of whether your project is human subjects research.

How do I get data access for my non-human subjects research project (I already have my IRB waiver)?
You must first be cleared by MSIP for potential conflicts of interest (COI) involving contractual restrictions on data use. You must then sign a BioMe Data Use Agreement (DUA) and make sure you have a Minerva account, or request one. This process can be initiated here. For Regeneron datasets, select Regeneron Data (create an account if you do not already have one) and complete a Data Request. For data from other sources, use the simpler Data Request found under the heading Other than Regeneron Data.

Here's the Regeneron data request process and what to expect:

  • Complete and submit a Data Request. You must upload your IRB protocol and approval letter, or a letter stating the IRB's determination that the project is not human subjects research
  • Everyone named in your Data Request will receive an email from MSIP about potential COI and must respond to it. Note: if your project involves an external collaborator, that collaborator will also need approval from the Data Use Committee
  • Receive your Data Use Agreements by email through DocuSign, generally within 1-3 days after MSIP clearance. Every member of your project must sign a DUA individually
  • Request a Minerva allocation for your private secure Regeneron data workspace
  • Receive notification from IPM, generally within 1-2 weeks after all DUAs are signed, that your team's access to the requested datasets on Minerva has been enabled. This notification includes the directory and dataset names and instructions for accessing them
  • Provide your account number for chargeback of data-provisioning costs. Provisioning datasets carries a minimum charge of 4 hours per project ($380 at the 2019 rate). Additional hours are charged if supplemental data preparation or consultation is needed.
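The sketch below gives a rough estimate of the provisioning chargeback described above. It uses the 2019 figures, where the 4-hour minimum is billed at $380, or $95 per hour. It also assumes that additional hours are billed at that same $95 rate, which the FAQ does not state. The function name `provisioning_charge` is hypothetical. Please confirm actual rates and billing increments with IPM.

```python
# Hypothetical illustration based on the 2019 figures quoted in the FAQ:
# a 4-hour minimum per project, billed at $380 (which works out to $95/hour).
MIN_HOURS = 4
MIN_CHARGE = 380.00
HOURLY_RATE = MIN_CHARGE / MIN_HOURS  # 95.0; ASSUMED to apply to extra hours too


def provisioning_charge(hours_used: float) -> float:
    """Estimate the data-provisioning chargeback for one project.

    Usage up to MIN_HOURS is billed as the minimum charge. Additional hours
    (data preparation or consultation) are ASSUMED to be billed at HOURLY_RATE.
    This is a sketch, not the official billing formula.
    """
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    billable_hours = max(hours_used, MIN_HOURS)
    return billable_hours * HOURLY_RATE


if __name__ == "__main__":
    for hours in (2, 4, 6.5):
        print(f"{hours} h -> ${provisioning_charge(hours):,.2f}")
```

Under these assumptions, 2 hours and 4 hours of work both bill at the $380 minimum, and 6.5 hours would come to $617.50.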

How do I get access to Minerva?
Please request a High Performance Computing (Minerva) account if you do not already have one. To work with Regeneron data, you will also need to request an allocation of online storage space for your rg_******* private directory. We recommend an allocation of 1 TiB.

Does IPM provide biostatistical support?
Our goal is to make the tools for accessing BioMe data as self-service as possible. Other tools and bioinformatics expertise may be available within ISMMS, and you can learn more through the Digital Concierge Service provided by the Office of the Chief Research Informatics Officer. The Digital Concierge Service holds weekly informatics and data science walk-in clinics every Monday from 9:30 to 10:30 am at Icahn L2-15, with an option to participate via a “virtual room.” If you are unable to attend, you may submit a request ticket online. IPM does engage in collaborative research with external investigators. If you would like to propose a project idea, please submit an inquiry form, which is available on the Mount Sinai network. Off-campus users should inquire by email.

What phenotypes does BioMe have?
BioMe has three major types of phenotype data. The first is questionnaire data, which is standardized across BioMe participants. The second is structured data within the EHR. This covers domains such as laboratory values, diagnoses (ICD-9/10 codes), procedures and orders (CPT codes), medications, and vital signs. The third is a set of ~50 validated phenotyping algorithms for common diseases (e.g., hypertension, T2D). These algorithms were generated from a combination of data sources and have undergone extensive curation and manual validation by clinicians. Because all of them have accuracy above 90%, the cases and controls they identify are likely to be true cases and controls. Other phenotyping algorithms can be developed and validated on request, in collaboration.

How do I call back participants with a particular genotype identified in BioMe?
BioMe participants can be recalled based on their genotype through the BioMe Phenomics Center (BPC). The BPC recontacts BioMe participants to collect biospecimens (DNA, plasma, PBMCs) and to carry out in-depth phenotyping. Phenotyping tests include blood counts, lipid profiles, and cardiovascular and eye tests, among others. Further information is available from the BioMe Phenomics Center.

What types of datasets can be downloaded from Minerva? 
BioMe datasets may not be downloaded unless specifically authorized and designated for download. Please discuss any request for download privileges with the BioMe team.

What types of samples does BioMe have and what are the costs?

  • BioMe banks only DNA and plasma (not serum) from its consented participants
  • DNA aliquots are 75 µL in volume, with concentrations ranging from 30 to 300 ng/µL
  • Plasma aliquots are 100 µL in volume
  • The BioMe Biobank 2019 cost is $108.00 per aliquot (for internal Mount Sinai clients and external non-profit academic investigators)
  • The BioMe Biobank 2019 Scientific Consultancy Charge is $95.00 per hour
  • These rates are set by Mount Sinai Finance and Compliance and may change based on the annual true-up of actual costs
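The sketch below applies the 2019 biobank figures listed above. It estimates the cost of a sample request and the total DNA mass per aliquot (volume × concentration). It assumes that aliquot and consultancy costs simply add together and that fractional consultancy hours are billed pro rata. The FAQ states neither of these. The function names are hypothetical, and the rates may change with the annual true-up.

```python
from __future__ import annotations

# Hypothetical estimator using the 2019 BioMe Biobank rates quoted above.
ALIQUOT_PRICE = 108.00      # USD per aliquot (2019 rate)
CONSULTANCY_RATE = 95.00    # USD per hour (2019 Scientific Consultancy Charge)

DNA_ALIQUOT_VOLUME_UL = 75              # µL per DNA aliquot
DNA_CONC_RANGE_NG_PER_UL = (30, 300)    # ng/µL, low and high ends of the range


def sample_request_cost(n_aliquots: int, consultancy_hours: float = 0.0) -> float:
    """Estimate a request's cost: aliquots plus any consultancy time.

    ASSUMES the two charges are simply additive and hours are billed pro rata.
    """
    if n_aliquots < 0 or consultancy_hours < 0:
        raise ValueError("inputs must be non-negative")
    return n_aliquots * ALIQUOT_PRICE + consultancy_hours * CONSULTANCY_RATE


def dna_mass_range_ug() -> tuple[float, float]:
    """Total DNA per aliquot (µg) implied by the stated volume and concentrations."""
    low, high = DNA_CONC_RANGE_NG_PER_UL
    # volume (µL) * concentration (ng/µL) = mass in ng; divide by 1000 for µg
    return (DNA_ALIQUOT_VOLUME_UL * low / 1000, DNA_ALIQUOT_VOLUME_UL * high / 1000)


if __name__ == "__main__":
    print(f"10 aliquots + 2 h consultancy -> ${sample_request_cost(10, 2):,.2f}")
    low_ug, high_ug = dna_mass_range_ug()
    print(f"DNA per aliquot: {low_ug} to {high_ug} µg")
```

With these numbers, 10 aliquots plus 2 hours of consultancy would come to $1,270.00. Each 75 µL DNA aliquot would hold roughly 2.25 to 22.5 µg of DNA.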