Pacific Biosciences Single Molecule Sequencing
Pacific Biosciences Real Time Sequencer
Sequencing Library Preparation
The Pacific Biosciences (PacBio) platform requires all DNA libraries to contain a common sequence on both ends; this allows the polymerase to bind and initiate sequencing. High molecular weight DNA must first be sheared; the fragment size may range from 250 bp to 6 kb depending on the application. The PacBio adapters are short oligonucleotides that form a hairpin structure. These are ligated to the DNA fragments using standard molecular biology techniques. After ligation, the DNA library molecules should be circular. If the double-stranded portion of the library contains nicks or abasic sites, the polymerase will stop sequencing at that point. An exonuclease digestion is used to ensure all molecules are fully circular; the enzyme degrades nicked sample DNA, unligated sample DNA and unincorporated adapters.
The circular DNA library is then quantified on a fluorometer using a sensitive nucleic acid detection assay (SYBR-Green); this measurement is used to dilute the DNA to the correct molarity for sequencing. The completed libraries are annealed to the sequencing primer and then incubated with the sequencing polymerase to allow it to bind the template. The buffer at this stage lacks key reaction components, so the polymerase does not begin synthesizing the new strand until later in the process.
The mass of input sample DNA needed varies with library size: the sequencer performs most efficiently at a high library molarity (nanomolar to picomolar range), and a given mass of longer fragments contains fewer molecules. DNA quality is a major factor, as the exonuclease step will degrade DNA that has been nicked by exposure to harsh chemicals, heat, acid, or trace amounts of nucleases. In addition, some chemical treatments or exposure to UV may oxidize, cross-link or otherwise damage the DNA, leading to poor sequencing. Because the method involves no amplification of the DNA, we advise starting with 5 micrograms of DNA.
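The link between mass, fragment length and molarity can be sketched in a few lines of Python. This is a minimal illustration, not part of any PacBio protocol: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, ignores the single-stranded hairpin adapters, and the function names are hypothetical.

```python
# Approximate molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used rule of thumb, not a PacBio-specific value.
AVG_BP_MASS = 660.0


def dsdna_nanomolar(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library_ul, diluent_ul) for a dilution, using C1*V1 = C2*V2."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


if __name__ == "__main__":
    # The same mass concentration gives very different molarities
    # for the library sizes mentioned in the text.
    for length_bp in (250, 1000, 6000):
        print(length_bp, "bp:", round(dsdna_nanomolar(10.0, length_bp), 2), "nM")
```

At a fixed mass concentration, the molarity falls in proportion to fragment length, which is why larger-insert libraries need more input DNA.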
Imaging Single Base Incorporation with the Pacific Biosciences RS
To visualize a single molecule of DNA while it was being synthesized, a large step forward was needed in optics, fluidics and software. Several major challenges needed to be solved before a commercial-quality instrument would be feasible.
Polymerases travel along the template strand as they synthesize the complementary strand; this is no surprise, as the polymerase is a tiny fraction of the size of the genome. Thus, the first challenge was to confine the polymerase to a defined region so it could be imaged as it incorporated fluorescently labeled nucleotides. In addition, the imaging would have to occur in a soup of unincorporated fluorescent nucleotides that are a source of high background signal. The PacBio solution was to immobilize the polymerase in a tiny hole on a glass surface using a biological bond. Small glass squares are coated with a metallic film, and 150,000 tiny holes (<100 nanometers in diameter) are etched in an array. Each well forms a tiny reaction chamber wide enough to fit a single polymerase molecule. The polymerase is modified to carry a streptavidin protein, which binds strongly to a small molecule called biotin. The glass in the reaction chamber is coated with biotin; once a polymerase enters the well by diffusion, it is captured and anchored close to the glass interface.
PacBio Single Molecule Real Time (SMRT) Sequencing Cells
In addition to excluding multiple polymerases, such small holes have a unique property: they restrict the wavelengths of light that can pass through them. This property can be observed in the door of a microwave oven during cooking. The door contains a screen of small holes that let short-wavelength visible light pass through yet block the much longer-wavelength microwaves. Holes used this way are called “zero mode waveguides” and are the basis for the PacBio imaging system. Laser light at ~600 nanometers cannot fully pass through holes less than 100 nanometers across. Instead, the light decays exponentially from the glass-reaction chamber interface; the illuminated region extends only ~30 nanometers into the chamber. This small region is sufficient to illuminate the polymerase and the fluorescent nucleotides as they are incorporated into a complementary DNA strand. Fluorescent nucleotides in the bulk buffer, however, are not illuminated, which solves the signal-to-noise problem.
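The exponential decay of the excitation light can be illustrated with a short Python sketch. It assumes a simple model I(z) = I0 * exp(-z / d) and, as a further assumption, treats the ~30 nanometer figure as the depth d at which intensity falls to 1/e; the real field inside a well is more complex, and the names below are hypothetical.

```python
import math

# Assumption: the ~30 nm illuminated depth is used as the 1/e decay depth.
DECAY_DEPTH_NM = 30.0


def relative_intensity(depth_nm, decay_depth_nm=DECAY_DEPTH_NM):
    """Fraction of the interface intensity remaining at a given depth."""
    if depth_nm < 0:
        raise ValueError("depth must be non-negative")
    return math.exp(-depth_nm / decay_depth_nm)


if __name__ == "__main__":
    # Intensity near the anchored polymerase versus deeper in the bulk buffer.
    for depth_nm in (0, 10, 30, 100, 300):
        print(depth_nm, "nm:", f"{relative_intensity(depth_nm):.5f}")
```

Under this model, a nucleotide a few hundred nanometers from the glass receives a negligible fraction of the excitation light, which is the basis of the background suppression described above.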
Sequencing schematic on SMRT cell
The second major challenge was to design fluorescent nucleotides that would not impair polymerase function and fidelity. Fluorescent dyes are often attached to the base ring itself, which presents two major problems for single molecule sequencing: the fluorescence of newly incorporated bases will be masked by earlier bases as the DNA chain grows, and the polymerase will eventually stall due to the irregular molecular structure of the growing strand. Instead of labeling the nucleotide on the base, PacBio labeled nucleotides on the phosphate. When the polymerase incorporates each new base, the phospholinked dye is cleaved off as part of phosphodiester bond formation and diffuses away. Thus, the newly synthesized strand retains no trace of the label.
During the sequencing reaction, the PacBio RS instrument records a high quality movie in all four base color channels. Base calling is performed by observing incorporation events during complement strand synthesis. Labeled nucleotides present in the bulk solution may diffuse rapidly in and out of the illuminated area, but a nucleotide that is incorporated into the new strand will be present for milliseconds while the polymerase is catalyzing the new phosphodiester bond. This time difference provides sufficient signal to noise to enable accurate base-calling.
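The idea of separating incorporation events from diffusing nucleotides by how long the signal persists can be sketched as a toy pulse detector. This is an illustrative simplification, not the instrument's base-calling software: the traces, the threshold and the minimum pulse length are all made-up assumptions, and the function names are hypothetical.

```python
def find_pulses(trace, threshold, min_frames):
    """Return (start, end) frame ranges where the signal stays above
    `threshold` for at least `min_frames` consecutive frames."""
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                pulses.append((start, i))
            start = None
    # Handle a pulse that is still running when the trace ends.
    if start is not None and len(trace) - start >= min_frames:
        pulses.append((start, len(trace)))
    return pulses


def call_bases(channels, threshold, min_frames):
    """`channels` maps a base letter to its intensity trace.
    Bases are reported in order of pulse start frame."""
    events = []
    for base, trace in channels.items():
        for start, _end in find_pulses(trace, threshold, min_frames):
            events.append((start, base))
    return "".join(base for _start, base in sorted(events))


if __name__ == "__main__":
    toy_channels = {
        "A": [0, 9, 9, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0],  # pulse, then a 1-frame spike
        "C": [0, 0, 0, 0, 0, 9, 9, 9, 9, 0, 0, 0, 0, 0],
        "G": [0] * 14,
        "T": [0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 9, 9, 0],  # 1-frame spike, then a pulse
    }
    print(call_bases(toy_channels, threshold=5, min_frames=3))
```

The single-frame spikes stand in for nucleotides diffusing briefly through the illuminated region; only signals that persist for several frames are treated as incorporations.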
Base-calling Accuracy is a Function of Read Length and Library Insert Size
The use of circular libraries allows error correction during sequencing. The highly processive polymerase can synthesize complementary strands much longer than the insert length of the library. Unlike a linear DNA library, a circular library allows the polymerase to sequence the same molecule multiple times. This technique is termed “rolling circle” amplification. For example, a polymerase that reads 3 kilobases of sequence from a 1 kilobase template molecule will pass through the positive strand once, the negative strand once and the positive strand a second time. These three passes through the same sequence can be aligned by the base-calling software to produce a circular consensus sequence. The single read accuracy of PacBio sequencing is roughly 80%. A 250 bp insert sequenced to a read length of 3 kilobases will have 12 passes through the template, while a 1 kilobase insert will have 3 passes. Use of circular consensus improves accuracy substantially; the improvement depends on the number of passes, that is, the ratio of total read length to insert size.
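The pass arithmetic and the consensus idea can be sketched in Python. This is a deliberately simplified toy, not the actual consensus algorithm: it assumes passes of exactly equal length, ignores the hairpin adapter sequence between passes, models substitution errors only (no insertions or deletions, so no alignment is needed), and uses a plain majority vote. All names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def full_passes(read_length_bp, insert_length_bp):
    """Number of complete passes through the insert (adapters ignored)."""
    return read_length_bp // insert_length_bp


def split_passes(read, insert_length):
    """Cut a read into consecutive full passes and put every second pass
    back into the orientation of the first, since strands alternate."""
    passes = []
    for n in range(len(read) // insert_length):
        chunk = read[n * insert_length:(n + 1) * insert_length]
        passes.append(chunk if n % 2 == 0 else reverse_complement(chunk))
    return passes


def consensus(passes):
    """Majority vote at each position across equally long passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


if __name__ == "__main__":
    print(full_passes(3000, 250), full_passes(3000, 1000))
    insert = "GATTACAGGC"
    read = list(insert + reverse_complement(insert) + insert)
    read[2] = "G"    # substitution error in the first pass
    read[15] = "A"   # substitution error in the second pass
    read[27] = "C"   # substitution error in the third pass
    print(consensus(split_passes("".join(read), len(insert))))
```

Because each error falls at a different position, the other two passes outvote it and the original insert sequence is recovered. With more passes, more errors per position can be tolerated.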
Using Kinetic Information to Detect DNA Modification
A key application of this single molecule technology is the observation of polymerase kinetics during synthesis. DNA modifications such as cytosine methylation are well known as epigenetic influences on gene transcription; however, other modifications are known but cannot be readily assessed at a genome-wide scale. When the polymerase encounters a modified base in the template, it may have to adjust to accommodate the base; this could be reflected in pauses or other variations during sequencing. Using templates with known modifications, statistical models of incorporation kinetics can be built to identify previously uncharacterized DNA modifications.
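A very simple form of such a comparison can be sketched as follows. This is a hypothetical illustration, not PacBio's statistical model: it assumes that pause durations before each incorporation have already been collected per template position from many reads, for both the sample and an unmodified control, and it flags positions where the mean pause is at least twice as long in the sample. The data layout, the function names and the 2.0 cutoff are all assumptions.

```python
from statistics import mean


def pause_ratios(sample, control):
    """Ratio of mean pause duration (sample / control) per position.
    Inputs map a template position to a list of pause durations in seconds."""
    ratios = {}
    for pos in sample.keys() & control.keys():
        if not sample[pos] or not control[pos]:
            continue  # no observations at this position
        control_mean = mean(control[pos])
        if control_mean > 0:
            ratios[pos] = mean(sample[pos]) / control_mean
    return ratios


def flag_candidates(ratios, min_ratio=2.0):
    """Positions where the polymerase pauses markedly longer in the sample."""
    return sorted(pos for pos, ratio in ratios.items() if ratio >= min_ratio)


if __name__ == "__main__":
    sample = {1: [0.10, 0.12], 2: [0.50, 0.60], 3: [0.10]}
    control = {1: [0.10, 0.10], 2: [0.10, 0.12], 3: [0.11]}
    print(flag_candidates(pause_ratios(sample, control)))
```

A real analysis would need many reads per position and a proper statistical test rather than a fixed cutoff, since individual pause durations vary widely.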
The Pacific Biosciences RS can be used for:
- Whole genome sequencing of organisms with small genomes
- Full-length sequencing of >1kb PCR products
- Detection of epigenetic DNA modifications by observing enzyme kinetics
- Rapid sequencing of a bacterial pathogen during an outbreak
- Transcript splice mapping
- Sequencing of repetitive, high GC or otherwise difficult templates
- SNP phasing
- De novo assembly of genomes and transcriptomes using long reads up to 6 kb