
Pacific Biosciences Sequencing Terminology

  • SMRT® Cell (pronounced "smart cell"): Consumable substrates comprising arrays of zero-mode waveguide nanostructures.
  • Adapters: Exogenous nucleic acids that are ligated to a nucleic acid molecule to be sequenced. For example, SMRTbell™ adapters are hairpin loops that are ligated to both ends of a double-stranded DNA insert to produce a SMRTbell™ sequencing template. When adapter sequences are removed from a polymerase read, the read is split into multiple subreads.
  • Movie: Real-time observation of a SMRT® Cell.
  • Zero-mode waveguide (ZMW): A nanophotonic device for confining light to a small observation volume. For example, this can be a small hole in a conductive layer whose diameter is too small to permit the propagation of light in the wavelength range used for detection. ZMWs are physically part of a SMRT® Cell.
  • Sequencing ZMW: A ZMW that is expected to be able to produce a sequence if it is populated with a polymerase. ZMWs used for automated SMRT® Cell alignment are not considered sequencing ZMWs.
  • Run: Specifies:
      • The wells and SMRT® Cells to include in the sequencing run.
      • The collection and analysis protocols to use for the selected wells and cells.
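The adapter definition implies a simple picture of a polymerase read. The polymerase circles the SMRTbell™ template, so the read alternates insert, adapter, reverse-complemented insert, adapter. The Python sketch below is illustrative only. The function names and toy sequences are hypothetical, the "adapter" is a placeholder rather than the real SMRTbell™ adapter sequence, both adapters are assumed to read as the same sequence, and sequencing errors are ignored.

```python
# Sketch: model the polymerase read produced by circling a SMRTbell-style template.
# Assumptions (not from the glossary): both adapters read as the same sequence,
# the polymerase starts at position `start`, and no sequencing errors occur.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_polymerase_read(insert: str, adapter: str, n_bases: int, start: int = 0) -> str:
    """Emit n_bases from a circular template: insert, adapter, reverse complement, adapter.

    Each lap around the circle reads both strands of the insert once, so
    consecutive passes alternate between the forward and reverse strands.
    """
    circle = insert + adapter + reverse_complement(insert) + adapter
    return "".join(circle[(start + i) % len(circle)] for i in range(n_bases))


# Hypothetical toy sequences, chosen only to make the output easy to read.
read = simulate_polymerase_read("AACG", "TT", n_bases=14)
print(read)  # AACGTTCGTTTTAA
```

Removing the two "TT" stretches from this toy read leaves "AACG", "CGTT", and a partial "AA": one subread per pass, on alternating strands.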

Read Terminology

  • Polymerase read (formerly called “read”): A sequence of nucleotides incorporated by the DNA polymerase while reading a template, such as a circular SMRTbell™ template. Polymerase reads are most useful for quality control of the instrument run: their metrics primarily reflect movie length and other run parameters rather than the insert size distribution. Polymerase reads are trimmed to include only the high-quality region; they include adapter sequences and may also include sequence from multiple passes around a circular template.
  • Subread: Each polymerase read is partitioned into one or more subreads. Each subread contains sequence from a single pass of the polymerase along a single strand of the insert within a SMRTbell™ template, with no adapter sequence. Subreads contain the full set of quality values and kinetic measurements and are useful for applications such as de novo assembly, resequencing, and base modification analysis.
  • Circular consensus (CCS) read: The consensus sequence determined from the subreads of a single ZMW. It is generated without alignment to a reference sequence. In contrast to Reads of Insert, CCS reads require at least two full-pass subreads from the insert.
  • Read of Insert: The highest-quality single sequence for an insert, regardless of the number of passes. For example, if a template was read for one and a half passes (one full-pass subread plus a partial one), that information is combined into a single Read of Insert. A CCS read is the special case in which at least two full-pass subreads are collected for an insert. Reads of Insert give the most accurate estimate of the length of the insert sequence loaded onto a SMRT® Cell. For long templates, a Read of Insert may be the same as the polymerase read.
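The relationship between polymerase reads, subreads, and the CCS requirement can be sketched in Python. This is a minimal sketch, assuming adapter locations have already been annotated (for example by adapter screening) as sorted, half-open (start, end) intervals in read coordinates. The names `Subread`, `split_into_subreads`, and `ccs_eligible` are hypothetical and are not part of any PacBio API. A subread counts as full-pass here when adapters flank it on both sides.

```python
from dataclasses import dataclass


@dataclass
class Subread:
    start: int       # 0-based start in polymerase-read coordinates (inclusive)
    end: int         # end coordinate (exclusive)
    full_pass: bool  # True when adapters flank the subread on both sides


def split_into_subreads(read_length: int, adapters: list[tuple[int, int]]) -> list[Subread]:
    """Split a polymerase read at adapter hits and drop the adapter bases.

    `adapters` holds sorted, non-overlapping half-open (start, end) intervals.
    """
    subreads = []
    cursor = 0
    after_adapter = False  # has an adapter been seen to the left of `cursor`?
    for a_start, a_end in adapters:
        if a_start > cursor:
            # This segment ends at an adapter; it is full-pass only if one also precedes it.
            subreads.append(Subread(cursor, a_start, full_pass=after_adapter))
        cursor = a_end
        after_adapter = True
    if cursor < read_length:
        # Trailing segment: the polymerase stopped before reaching another adapter.
        subreads.append(Subread(cursor, read_length, full_pass=False))
    return subreads


def ccs_eligible(subreads: list[Subread], min_full_passes: int = 2) -> bool:
    """CCS needs at least two full-pass subreads; a Read of Insert does not."""
    return sum(s.full_pass for s in subreads) >= min_full_passes


# Three adapter hits in a 100-base read: two full passes plus two partial ends.
subs = split_into_subreads(100, [(20, 30), (50, 60), (80, 90)])
print([(s.start, s.end, s.full_pass) for s in subs])
# [(0, 20, False), (30, 50, True), (60, 80, True), (90, 100, False)]
print(ccs_eligible(subs))  # True

# A single adapter hit: neither subread is flanked on both sides, so no CCS read.
print(ccs_eligible(split_into_subreads(70, [(40, 50)])))  # False
```

The second read would still yield a Read of Insert, because that definition does not depend on the number of passes.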

Read Length Terminology

  • Mapped polymerase read length: The total number of bases along a read from the first adapter or aligned subread to the last adapter or aligned subread. This approximates the length of the sequence produced by a polymerase in a ZMW.
  • Mapped subread length: The length of the subread alignment to a target reference sequence. This does not include the adapter sequence.
  • Polymerase read length: The total number of bases produced from a ZMW after trimming. This may include the adapter sequence.
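The length definitions differ mainly in which bases they count. The following Python sketch makes this concrete. It assumes adapter hits and aligned subread spans are given as half-open intervals in read coordinates, and it measures mapped subread length on the read rather than on the reference (an assumption). The function names and coordinates are hypothetical.

```python
def mapped_subread_length(aligned_span: tuple[int, int]) -> int:
    """Length of one subread alignment, measured here on the read (no adapter bases)."""
    start, end = aligned_span
    return end - start


def mapped_polymerase_read_length(adapters, aligned_subreads) -> int:
    """Span from the first adapter or aligned subread to the last one."""
    intervals = list(adapters) + list(aligned_subreads)
    if not intervals:
        return 0
    return max(end for _, end in intervals) - min(start for start, _ in intervals)


# Hypothetical 1,000-base trimmed read (its polymerase read length is 1,000).
adapters = [(300, 345), (700, 745)]
aligned = [(20, 300), (345, 700)]  # the subread after the last adapter did not map
print(mapped_polymerase_read_length(adapters, aligned))  # 725
print([mapped_subread_length(a) for a in aligned])       # [280, 355]
```

In this example the mapped polymerase read length (725) is shorter than the polymerase read length (1,000), because the unaligned leading bases and the unmapped final subread fall outside the first-to-last span.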

Primary Analysis Terminology

  • Primary analysis protocol: Specifies signal processing of the movie, base calling of the traces/pulses, and quality assessment of the base calls. Primary analysis is always performed on the instrument.
  • Adapter Screening: Annotates the locations of adapter sequences in a read. These annotations are used to split a read into subreads during secondary analysis mapping and circular consensus.
  • High Quality Region Screening: Annotates the high-quality sequencing regions of a read for use during raw read trimming.
  • Insert Screening: Annotates insert DNA regions in the Polymerase Read.
  • Quality Value Assignment: A prediction of the error probability of a basecall.
  • Quality Value (QV): The total probability that the basecall is an insertion or substitution or is preceded by a deletion, expressed on a log scale as QV = -10 * log10(p), where p is that error probability. For example, QV 20 is 99% accurate, QV 30 is 99.9% accurate, and QV 50 is 99.999% accurate.
  • Insertion QV: The probability that the basecall is an insertion with respect to the true sequence.
  • Deletion QV: The probability that a deletion error occurred before the current base.
  • Substitution QV: The probability that the basecall is a substitution.
  • Raw read trimming: Extraction of the high-quality regions from an unfiltered read. Trimming an unfiltered read produces a polymerase read.
  • Read Quality Assignment: A trained prediction of a read’s mapped accuracy based on its pulse and base file characteristics (peak signal-to-noise ratio, average base QV, interpulse distance, and so on). This is used during secondary analysis filtering.
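The QV formula can be checked numerically. This is a minimal Python sketch of the conversion in both directions. The function names are illustrative and do not come from any PacBio tool.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Invert QV = -10 * log10(p) to recover the error probability p."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Apply QV = -10 * log10(p) to an error probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for qv in (20, 30, 50):
    accuracy = 1 - qv_to_error_probability(qv)
    print(f"QV {qv}: {accuracy:.3%} accurate")  # 99.000%, 99.900%, 99.999%
```

Each additional 10 QV units reduces the error probability tenfold. This is why the examples in the QV definition gain one "9" of accuracy per 10 QV.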

Secondary Analysis Terminology

  • Secondary analysis protocol: Specifies how to:
      • Align a group of reads to a reference sequence to produce a consensus sequence.
      • Assemble a set of reads into contigs to produce a de novo sequence.
      • Identify insertions, deletions, and SNPs.
      • Evaluate consensus quality and quality of the instrument run.
  • Consensus: Generation of a consensus sequence from a multiple sequence alignment.
  • De Novo Assembly: Assembly of all subreads without a reference sequence.
  • Filtering: Removes reads that do not meet the Read Quality and Read Length thresholds set by the user. The current default filtering parameters defined by Pacific Biosciences are:
      • Read Quality ≥ 0.75 (as of SMRT Analysis v2.1)
      • Read Length ≥ 50 bases
  • Mapping: Local alignment of a read or subread to a reference sequence.
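The default filtering rule keeps a read only if it meets both thresholds. This Python sketch shows it under stated assumptions: the `Read` record, its field names, and the read names are hypothetical, and read quality is taken to be the 0 to 1 predicted accuracy produced by Read Quality Assignment.

```python
from dataclasses import dataclass

# Defaults quoted in the glossary (SMRT Analysis v2.1); both are user-adjustable.
MIN_READ_QUALITY = 0.75
MIN_READ_LENGTH = 50


@dataclass
class Read:
    name: str
    read_quality: float  # predicted accuracy from Read Quality Assignment, 0 to 1
    length: int          # bases


def filter_reads(reads, min_read_quality=MIN_READ_QUALITY, min_read_length=MIN_READ_LENGTH):
    """Keep reads meeting both thresholds (inclusive, matching the >= defaults)."""
    return [
        r for r in reads
        if r.read_quality >= min_read_quality and r.length >= min_read_length
    ]


reads = [
    Read("zmw1", 0.80, 1200),  # kept
    Read("zmw2", 0.74, 5000),  # dropped: read quality below 0.75
    Read("zmw3", 0.90, 49),    # dropped: shorter than 50 bases
]
print([r.name for r in filter_reads(reads)])  # ['zmw1']
```

The thresholds are keyword arguments, so a user-specified setting can be passed directly, for example `filter_reads(reads, min_read_quality=0.7)`.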

Accuracy Terminology

  • Circular consensus accuracy: Accuracy based on multiple sequencing passes around a single circular template molecule.
  • Consensus accuracy: Accuracy based on aligning multiple sequencing reads or subreads together, optionally with a reference sequence.
  • Polymerase read quality: A trained prediction of a read’s mapped accuracy based on its pulse and base file characteristics (peak signal-to-noise ratio, average base QV, inter-pulse distance, and so on).
  • Subread accuracy: The accuracy of the basecalls in a subread, measured after mapping the subread to a reference.
      • Formula: subread accuracy = 1 - (errors / subread length), where errors = number of deletions + insertions + substitutions.
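The subread accuracy formula needs per-type error counts. This Python sketch computes them from an alignment. As an assumption not stated in the glossary, it takes the alignment as an extended SAM CIGAR string that uses `=` and `X` operations, since a plain `M` does not separate matches from substitutions. The function names and the example CIGAR are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_counts_from_cigar(cigar: str) -> tuple[int, int, int]:
    """Return (insertions, deletions, substitutions) from an extended CIGAR string.

    Requires '='/'X' operations: a plain 'M' mixes matches with substitutions.
    """
    counts = {"I": 0, "D": 0, "X": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op == "M":
            raise ValueError("'M' is ambiguous; an '='/'X' CIGAR is required")
        if op in counts:
            counts[op] += int(length)
    return counts["I"], counts["D"], counts["X"]


def subread_accuracy(insertions: int, deletions: int, substitutions: int, subread_length: int) -> float:
    """1 - (errors / subread length), errors = deletions + insertions + substitutions."""
    if subread_length <= 0:
        raise ValueError("subread_length must be positive")
    return 1 - (insertions + deletions + substitutions) / subread_length


# A toy 26-base subread aligned end to end: 1 substitution, 2 inserted bases, 1 deleted base.
ins, dels, subs = error_counts_from_cigar("10=1X5=2I4=1D4=")
print(ins, dels, subs)  # 2 1 1
print(round(subread_accuracy(ins, dels, subs, subread_length=26), 4))  # 0.8462
```

Four errors over 26 bases give 1 - 4/26, or about 84.6% subread accuracy.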

Reference: Pacific Biosciences