Gene Transcript Profiling
mRNA-Seq is becoming the preferred technology for transcript profiling because of its high reproducibility, wide dynamic range and the richness of the data it produces. Microarray expression profiling suffers from three key flaws: profiling is limited to known genes and splice variants, results are affected by hybridization artifacts, and reproducible results are difficult to achieve. Traditional digital gene expression techniques such as serial analysis of gene expression (SAGE) generate information similar to mRNA-Seq but are limited by the relatively high per-base cost of Sanger sequencing and the need for bacterial cloning steps. The advantages of mRNA-Seq are summarized below:
- High dynamic range – mRNA-Seq has been estimated to span 5 orders of magnitude, significantly higher than microarray platforms (Wang, Gerstein et al. 2009)
- High reproducibility – excellent concordance has been observed between identical samples sequenced separately, which reduces the need for technical replicates (Marioni, Mason et al. 2008)
- RNA splice patterns – the relative abundance of splice variants can be characterized using paired-end reads (Trapnell, Pachter et al. 2009)
- SNP typing and allelic expression – reads from highly expressed transcripts can be used to catalog patient haplotypes directly from RNA (Verlaan, Ge et al. 2009); this approach has been extended to identify allelic imbalance in expression (Heap, Yang et al. 2010)
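As an illustration of how allelic imbalance might be flagged at a single heterozygous SNP, the sketch below counts the RNA-Seq reads carrying each allele and applies an exact two-sided binomial test against the 50:50 ratio expected when both alleles are expressed equally. The function name and the choice of test are illustrative assumptions, not the method used in the cited studies; real analyses also need to handle reference-mapping bias and overdispersion between samples.

```python
from math import comb


def allelic_imbalance_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of ref vs. alt read counts at one heterozygous
    SNP, against the 50:50 ratio expected when both alleles are expressed equally."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    k = min(ref_reads, alt_reads)
    # One-tail probability of a split at least this lopsided under Binomial(n, 0.5).
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * one_tail)


# Hypothetical read counts at a heterozygous SNP in one sample.
print(allelic_imbalance_p(30, 10))
```

A small p-value suggests that one allele is over-represented among the reads; with many SNPs tested, the p-values would also need a multiple-testing correction.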
Detection of rare transcripts is limited only by sequencing depth. Early studies with 8M reads yielded novel insights (Sultan, Schulz et al. 2008), but sequencing technology has since advanced significantly. An ultra-deep (10 gigabase) study of the mouse transcriptome found that 80M reads are sufficient to detect the vast majority of unique sequence tags, with most of the information derived from the first 10 to 40M reads (Cloonan, Forrest et al. 2008; Wang, Gerstein et al. 2009).
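The kind of saturation analysis described above can be approximated by subsampling (rarefaction): draw progressively larger random subsets of the mapped reads and count how many genes are detected at each depth. This is a sketch that assumes reads have already been assigned to genes (one gene identifier per read); the function name and the `min_reads` detection threshold are hypothetical choices, not the procedure used in the cited studies.

```python
import random
from collections import Counter


def detection_curve(read_gene_ids, depths, min_reads=1, seed=0):
    """Number of genes with at least `min_reads` reads at each subsampled depth.

    read_gene_ids: one gene identifier per mapped read.
    depths: read counts to subsample to (values above the total use all reads).
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    reads = list(read_gene_ids)
    rng.shuffle(reads)
    curve = {}
    for depth in sorted(depths):
        # A prefix of a random permutation is a uniform sample without replacement;
        # reusing one permutation nests the subsamples, so the curve never decreases.
        counts = Counter(reads[:depth])
        curve[depth] = sum(1 for c in counts.values() if c >= min_reads)
    return curve


# Toy library: three genes with very different expression levels (hypothetical data).
toy_reads = ["geneA"] * 900 + ["geneB"] * 90 + ["geneC"] * 10
print(detection_curve(toy_reads, depths=[10, 100, 1000]))
```

When the curve flattens, additional depth adds few newly detected genes; on real data the depths would be in the millions of reads.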
The genomics core's HiSeq 2000 routinely produces 120-150M sequence reads per lane and can perform paired-end sequencing with reads of up to 100 nt. The genomics facility has extensive experience with mRNA-Seq, having characterized more than 200 samples to date. As of March 2012, investigators are sequencing 3-4 samples per lane and achieving sufficient coverage for most mRNA-Seq applications.
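The per-sample depth implied by these figures follows from simple division, assuming a lane's output is split evenly among the pooled samples (in practice, barcode representation is rarely perfectly balanced):

```python
def reads_per_sample(reads_per_lane: float, samples_per_lane: int) -> float:
    """Expected reads per sample if a lane's output is split evenly among pooled samples."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return reads_per_lane / samples_per_lane


# Lane yields and pooling levels taken from the figures quoted in the text.
for lane_yield in (120e6, 150e6):
    for n_samples in (3, 4):
        per_sample = reads_per_sample(lane_yield, n_samples)
        print(f"{lane_yield / 1e6:.0f}M reads/lane, {n_samples} samples: "
              f"{per_sample / 1e6:.1f}M reads/sample")
```

Under that assumption, 3-4 samples per lane gives roughly 30-50M reads per sample, which overlaps the 10-40M range identified above as carrying most of the information.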
Library Preparation from RNA Samples
This technique selects polyadenylated mRNA transcripts from total RNA, fragments them, and then converts them to dsDNA for sequencing. The method is as follows: 1 µg of high-quality total RNA is incubated with oligo(dT) magnetic beads (SeraMag or Dynal) to enrich for mRNA with poly-A tails. The eluted RNA is incubated at 94°C in Tris buffer with potassium acetate and magnesium acetate, which fragments it into the 200-500 nt range. The RNA is ethanol precipitated with sodium acetate and resuspended in water. Reverse transcription with random hexamer primers (Invitrogen SuperScript III) is performed to generate first-strand cDNA. The RNA is then degraded by addition of RNase, and DNA polymerase is added to synthesize the second strand. The resulting dsDNA is ready for standard Illumina adapter ligation and sequencing (Mortazavi, Williams et al. 2008).
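The strand relationships in this conversion can be illustrated with a toy sequence model: the first-strand cDNA is the reverse complement of the RNA fragment, and the second strand, synthesized on the cDNA, reproduces the RNA sequence with T in place of U. This sketch ignores priming position, incomplete extension and enzyme errors; the function names are hypothetical.

```python
# Base pairing from a template base (RNA or DNA) to the newly synthesized DNA base.
DNA_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA from reverse transcription: reverse complement of the RNA, 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    """Second DNA strand synthesized on the cDNA: its reverse complement, 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna.upper()))


rna_fragment = "AUGGCCUUAG"  # hypothetical fragment
cdna = first_strand_cdna(rna_fragment)
print(cdna, second_strand(cdna))
```

In this idealized picture the two DNA strands carry equivalent information once adapters are attached, so a protocol like this one does not by itself record which genomic strand was transcribed.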
The Illumina platform employs an in-situ amplification technique followed by reversible dye-terminator sequencing (Bentley, Balasubramanian et al. 2008). Short oligonucleotides covalently bound to the sequencing flow cell are used to immobilize the DNA strands to be sequenced. Each molecule in the DNA library must therefore carry two specific sequences at its ends; these are introduced by a DNA ligase. The method is as follows: the dsDNA produced by second-strand synthesis has ends with 5´ and 3´ overhangs, which are converted to blunt ends by T4 DNA polymerase (filling in 5´ overhangs and removing 3´ overhangs), while T4 polynucleotide kinase phosphorylates the 5´ ends. A single deoxyadenosine (dA) is then added to the 3´ ends of the DNA strands using the Klenow fragment (exo-). Double-stranded DNA adapters with 3´ thymidine overhangs are ligated to the dA-tailed library using T4 DNA ligase; the adapters contain the sequences needed for binding to the flow cell as well as the sequencing primer binding sites.
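The purpose of dA-tailing can be made concrete with a deliberately idealized model of ligation in which two ends join only when their single-base 3´ overhangs are complementary. Under that assumption, A-tailed inserts cannot ligate to each other and T-overhang adapters cannot ligate to each other, so the favored product is an insert joined to an adapter. The `End` type and `can_ligate` rule are hypothetical simplifications; real ligation reactions are less clean than this.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class End:
    """A DNA end described only by its single-base 3' overhang ("" means blunt)."""
    name: str
    overhang: str


def can_ligate(a: End, b: End) -> bool:
    """Idealized rule: two ends join only if their single-base 3' overhangs pair."""
    if not a.overhang or not b.overhang:
        return False  # blunt-end ligation is deliberately ignored in this toy model
    return COMPLEMENT[a.overhang] == b.overhang


insert_end = End("dA-tailed insert", "A")  # after Klenow (exo-) A-tailing
adapter_end = End("T-overhang adapter", "T")

for left, right in [(insert_end, insert_end), (adapter_end, adapter_end), (insert_end, adapter_end)]:
    outcome = "ligates" if can_ligate(left, right) else "no ligation"
    print(f"{left.name} + {right.name}: {outcome}")
```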
The mRNA-Seq protocol takes roughly two days of work: the first for RNA preparation and the second for library preparation. The first day entails checking RNA quality on the Bioanalyzer, selecting poly-A RNA with oligo(dT) beads, first-strand synthesis with reverse transcriptase, second-strand synthesis and, finally, DNA cleanup. The second day comprises end-polishing of the dsDNA, adapter ligation, size selection by agarose gel or SPRI beads, enrichment of the library by PCR and validation of the library on the Bioanalyzer.
RNA Sample Quality is Important
mRNA-Seq uses oligo(dT) beads to select polyadenylated RNA from the total RNA population. Because selection occurs at the poly-A tail, RNA molecules that are already fragmented contribute only their 3´ portions, which produces a strong 3´ bias in the sequencing reads. This can be mitigated in part by ensuring that the RNA is of high quality before library preparation (strong 18S/28S peaks on the Bioanalyzer). Downstream bioinformatic analysis attempts to correct for this bias when reads are counted and per-gene expression is computed. Reverse transcriptase efficiency may also vary between templates; this is another potential source of bias in read counts but is difficult to eliminate. In addition, reverse transcriptase template switching can be a source of error; this is usually dealt with by software that identifies chimeric reads with low counts and excludes them.
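One simple way to quantify the 3´ bias described above is to compare read coverage near the 3´ end of a transcript with coverage near its 5´ end; a ratio well above 1 is consistent with degraded input RNA. The sketch assumes per-base coverage for a single transcript has already been computed and ordered 5´ to 3´; the 20% window and the function name are hypothetical choices rather than a standard metric defined here.

```python
from typing import Sequence


def three_to_five_prime_ratio(coverage: Sequence[float], window: float = 0.2) -> float:
    """Mean coverage over the 3'-most `window` fraction of a transcript divided by the
    mean over the 5'-most `window` fraction. `coverage` is per-base read depth ordered
    5' -> 3' along the transcript. Values well above 1 indicate 3' bias."""
    if not coverage:
        raise ValueError("coverage is empty")
    if not 0 < window <= 0.5:
        raise ValueError("window must be in (0, 0.5]")
    n = max(1, int(len(coverage) * window))  # bases in each end window
    five_prime_mean = sum(coverage[:n]) / n
    three_prime_mean = sum(coverage[-n:]) / n
    if five_prime_mean == 0:
        return float("inf")  # no 5' coverage at all: maximal bias
    return three_prime_mean / five_prime_mean


# Hypothetical coverage profiles for a 100-base transcript.
even = [10.0] * 100
ramped = [float(i) for i in range(1, 101)]  # depth rises toward the 3' end
print(three_to_five_prime_ratio(even), three_to_five_prime_ratio(ramped))
```

Comparing this ratio across samples, for example averaged over many transcripts, could help flag libraries whose input RNA was partially degraded.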
mRNA-Seq can capture polyadenylated microRNA precursors (primary microRNA transcripts); however, it cannot detect mature, cleaved microRNAs, which lack poly-A tails and are lost at the oligo(dT) selection step. If total RNA is used, adapter dimers are close in size to true microRNA library molecules with ligated adapters, which makes the two substantially difficult to separate. Illumina has developed a separate microRNA-specific sample preparation method that differs substantially from its mRNA-Seq protocol: adapter sequences are added by RNA-RNA ligation, and the products are then converted to DNA with reverse transcriptase for sequencing. To ensure a high yield for microRNA profiling, the RNA-ligation method is preferable, so microRNA-Seq should be carried out as a separate library preparation in parallel (though the two libraries could potentially be pooled and sequenced together).
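The difficulty with adapter dimers follows from simple size arithmetic: a dimer differs from a true library molecule only by the length of the insert. The sketch compares a 22 nt mature microRNA with a 200 nt fragment at the low end of the mRNA-Seq range, assuming a hypothetical combined adapter length of 120 nt (an illustrative value, not a figure from the Illumina protocol). The relative gap is used only as a rough proxy for how easily gel or bead size selection can separate the two.

```python
def size_gap(insert_nt: int, adapter_total_nt: int) -> tuple[int, int, float]:
    """Sizes (in nt) of an adapter-ligated library molecule and of an adapter dimer,
    plus the gap between them as a fraction of the library molecule's size."""
    library = insert_nt + adapter_total_nt
    dimer = adapter_total_nt  # adapters ligated to each other with no insert
    return library, dimer, (library - dimer) / library


ADAPTERS_NT = 120  # hypothetical combined adapter length, for illustration only
for label, insert in (("mature microRNA", 22), ("mRNA-Seq fragment", 200)):
    library, dimer, gap = size_gap(insert, ADAPTERS_NT)
    print(f"{label}: library {library} nt vs. dimer {dimer} nt (gap {gap:.0%} of library size)")
```

Under these assumptions the microRNA library and the dimer differ by only about 15% in size, whereas an mRNA-Seq fragment of the same construction is separated from the dimer by well over half of its length.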