The Methods of Whole Genome Sequencing

Overview of Whole Genome Sequencing

The genome of an organism contains its entire genetic information. Whole genome sequencing can comprehensively and accurately analyze entire genomes, decoding the information they contain and revealing the complexity and diversity of the genome. Its emergence was a revolutionary advance across the life sciences. Whole genome sequencing can detect many types of variants, including single-nucleotide variants, insertions/deletions, copy number changes, and large-scale structural variants. It is classified as de novo sequencing or resequencing depending on whether a reference genome is available. When a reference genome exists, genome assembly becomes easier and faster.

  1. Two Classic Approaches for Sequencing Large Genomes

In the early 1980s, Sanger completed the whole genome sequence of bacteriophage lambda using the shotgun method, which was subsequently applied to larger viral genomes, organelle DNA, and bacterial genomes. Shotgun sequencing is the classic strategy for whole genome sequencing and provides the technical basis for large-scale sequencing. The target sequence is first randomly broken into small fragments, which are sequenced separately and then assembled into a consensus sequence using the overlaps between them. There are two main variants: hierarchical shotgun sequencing (the clone-by-clone method) and whole genome shotgun sequencing.
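
The idea of rebuilding a sequence from overlapping fragments can be illustrated with a toy greedy assembler. This is a minimal sketch, not the algorithm used by any real assembler: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example fragments are all assumptions made for illustration, and the reads are assumed to be error-free and to come from a single strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap."""
    # Drop exact duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining fragments stay separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping fragments of a made-up 18-base sequence.
fragments = ["TGAAGTCCA", "ATGCGTACC", "GTACCTGAAG"]
contigs = greedy_assemble(fragments)
print(contigs)
```

Greedy merging is easy to follow but can be misled by repeated sequences, which is one reason repeat-rich genomes are hard to assemble from short fragments.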

  • Clone-by-clone sequencing

This method was adopted by the Human Genome Project (HGP) consortium. It generates high-density maps, which make genome assembly easier. It generally involves four steps: preparation of a BAC clone library, clone fingerprinting, BAC clone sequencing, and sequence assembly. However, the method is time-consuming and costly, so it is seldom used at present.

Figure 1. Steps involved in the clone-by-clone sequencing.

  • Whole genome shotgun sequencing

Whole genome shotgun sequencing generally involves six steps: isolation of genomic DNA, random fragmentation of genomic DNA, size selection by electrophoresis, library construction, paired-end sequencing (PE sequencing), and genome assembly. Two size classes of DNA fragments, long inserts (2-2.5 kb) and short inserts (0.5-1.2 kb), are selected from the agarose gel. The long inserts are cloned in phage or cosmid vectors, while the short inserts are cloned in plasmid vectors. The short-insert clone library is sequenced from both ends. Because a large number of clones are sequenced, each region of the genome is covered more than 10 times on average. The long-insert clones help improve the efficiency of genome assembly.
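
The "more than 10 times" coverage figure can be related to the number of reads with a short calculation. The sketch below assumes the standard Lander-Waterman idealization (reads placed uniformly at random), which the text above does not name. The function names and the example numbers (a 5 Mb genome, 500-base reads) are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases with no read at all, e^(-c), under random placement."""
    return math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 5 Mb bacterial genome, 500-base reads, 10x target.
n = reads_needed(10, 500, 5_000_000)
print(n, mean_coverage(n, 500, 5_000_000), expected_uncovered_fraction(10))
```

Under this idealized model, 10x coverage leaves well under 0.01% of bases unsequenced. Real libraries are less uniform, so actual gaps are usually larger.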


Advantages:

  • Does not require genome maps
  • Less time-consuming
  • Lower cost

Disadvantages:

  • Genome assembly for eukaryotic genomes is difficult due to abundant repetitive sequences
  • Assemblies produced by this method can be less accurate

  2. NGS Accelerates WGS

Unlike clone-based library approaches, next-generation sequencing (NGS) platforms use a dramatically simplified method of library construction, which has simplified and accelerated whole genome shotgun sequencing. In general, genomic DNA is first randomly fragmented by sonication or nebulization, and the fragments are then ligated to a platform-specific set of double-stranded adapters to generate a shotgun library. These library fragments are subsequently amplified in situ by hybridization to, and extension from, complementary adapters that are covalently attached to the surface of a glass microfluidic cell or a small bead, depending on the sequencing platform. NGS instruments use a microfluidic device to hold the amplified fragments of the shotgun library, and an imaging step collects data from the fragments as they are being sequenced.

We will take the Illumina sequencer as an example to illustrate the workflow of WGS based on high-throughput sequencing.

  • Construction of Sequencing Library

Genomic DNA is first extracted and then randomly fragmented into pieces of a few hundred bases or shorter, and specific adapters are added to both ends. If a transcriptome is being sequenced, library construction is somewhat more involved: the RNA is either fragmented, reverse-transcribed into cDNA, and then ligated to adapters, or reverse-transcribed into cDNA first and then fragmented and ligated to adapters. The fragment size (insert size) affects the downstream data analysis and can be chosen as needed. For genome sequencing, several different insert sizes are usually used to provide more information for assembly.
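
The relationship between insert size and the two reads obtained from each fragment can be sketched as a small simulation. This is an illustrative assumption, not a model of any particular library kit: inserts have a fixed size, adapters are omitted, and the names `sample_inserts` and `paired_end_reads` are made up for the example.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_inserts(genome: str, insert_size: int, count: int,
                   rng: random.Random) -> list[str]:
    """Pick `count` fragments of a fixed insert size at random positions."""
    last_start = len(genome) - insert_size
    starts = [rng.randrange(last_start + 1) for _ in range(count)]
    return [genome[s:s + insert_size] for s in starts]


def paired_end_reads(insert: str, read_length: int) -> tuple[str, str]:
    """Read 1 starts at one end of the insert, read 2 at the other end
    on the opposite strand, so the pair faces inward."""
    read1 = insert[:read_length]
    read2 = reverse_complement(insert)[:read_length]
    return read1, read2


rng = random.Random(0)  # fixed seed so the example is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # toy genome
inserts = sample_inserts(genome, insert_size=300, count=5, rng=rng)
read1, read2 = paired_end_reads(inserts[0], read_length=100)
```

Because both reads come from the same insert, an assembler knows they lie roughly one insert size apart, which is the extra information that libraries with several insert sizes provide.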

  • Surface Attachment and Bridge Amplification

Solexa (Illumina) sequencing reactions are carried out in a glass flow cell, which is subdivided into 8 lanes. The inner surface of each lane carries a large number of immobilized single-stranded adapter oligonucleotides. The adapter-ligated DNA fragments are denatured into single strands and hybridize to these surface-bound oligonucleotides, forming bridge-like structures that serve as templates for the subsequent amplification.

  • Denaturation and Complete Amplification

Unlabeled dNTPs and a DNA polymerase are then added for solid-phase bridge PCR amplification, which converts each single-stranded bridge into a double-stranded bridge. Denaturation releases the complementary single strands, which anchor to the nearby solid surface. Repeated cycling produces millions of clusters of double-stranded DNA on the surface of the flow cell.

  • Single Base Extension and Sequencing

Four fluorescently labeled dNTPs, DNA polymerase, and sequencing primers complementary to the adapters are added to the flow cell. As each cluster extends its complementary strand, every incorporated labeled dNTP emits a characteristic fluorescence. The sequencer captures these fluorescent signals, and computer software converts the optical signal into base calls, yielding the sequence of the fragment. Read length is limited by several factors that cause signal decay, such as incomplete cleavage of the fluorescent labels. As the read length increases, the error rate also increases.
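
Why signal quality falls with read length can be illustrated with a deliberately simplified model. The assumption, which is not taken from the text, is that in every cycle a fixed fraction of the strands in a cluster falls out of step (for example through incomplete label cleavage) and never recovers. The parameter `dephasing_rate` and both function names are hypothetical.

```python
import math


def in_phase_fraction(cycle: int, dephasing_rate: float) -> float:
    """Fraction of strands in a cluster still in step after `cycle` cycles."""
    return (1.0 - dephasing_rate) ** cycle


def max_usable_cycles(dephasing_rate: float, min_fraction: float) -> int:
    """Last cycle at which the in-phase fraction is still >= min_fraction."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - dephasing_rate))


# Hypothetical numbers: 1% of strands lost per cycle, and at least half of
# the cluster must stay in phase for a reliable base call.
limit = max_usable_cycles(dephasing_rate=0.01, min_fraction=0.5)
print(limit, in_phase_fraction(limit, 0.01), in_phase_fraction(limit + 1, 0.01))
```

In this toy model the correct signal decays geometrically, so even a small per-cycle loss sets a hard ceiling on useful read length.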

  • Data Analysis

This step is not strictly part of the sequencing process, but the preceding steps only become meaningful through it. The raw data produced by sequencing are reads only a few tens of bases long. Bioinformatics tools assemble these short reads into longer contigs, or even into a framework of the entire genome. Alternatively, the reads are aligned to an existing reference genome, or to the genome of a related species, and analyzed further to obtain biologically meaningful results.
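
Aligning a short read to a reference can be sketched as a brute-force search for the placement with the fewest mismatches. Production aligners use indexed data structures and handle insertions and deletions, so the following is only a minimal illustration. The function names, the `max_mismatches` parameter, and the example sequences are assumptions.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str,
               max_mismatches: int = 1) -> Optional[tuple[int, int]]:
    """Return (position, mismatches) for the best ungapped placement of
    `read` on `reference`, or None if nothing is within the limit."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        distance = hamming(read, window)
        if distance <= max_mismatches and (best is None or distance < best[1]):
            best = (pos, distance)
    return best


reference = "ATGCGTACCTGAAGTCCA"  # made-up reference sequence
exact_hit = align_read("CCTGAAG", reference)    # perfect match
variant_hit = align_read("CCTGTAG", reference)  # one base differs
no_hit = align_read("GGGGGGG", reference)       # does not align
print(exact_hit, variant_hit, no_hit)
```

A mismatch that recurs across many reads aligned to the same position is the kind of evidence used to report a single-nucleotide variant.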

  3. Application of Third-generation Sequencing in Whole Genome Sequencing

Although next-generation sequencing has enabled population-scale analyses of small variants, it has difficulty identifying larger structural variations. Furthermore, de novo assemblies based on next-generation sequencing are often of lower quality than those produced with older and more expensive methods. Single-molecule sequencing technologies can overcome these difficulties: their assemblies can span nearly entire chromosome arms, and they are not sensitive to GC content. Third-generation sequencing technologies have been used to produce highly accurate de novo and reference assemblies for microorganisms, plants, animals, and humans, enabling new insights into evolution and sequence diversity.


