Principles and Workflow of Whole Genome Bisulfite Sequencing

Principles of whole genome bisulfite sequencing

Epigenetic studies have confirmed that DNA methylation of specific gene regions plays an important role in chromosome conformation and the regulation of gene expression. Methylation of cytosine residues at the C5 position (5meC) is a common epigenetic mark in many eukaryotes and is widely found in CpG or CpHpG (H = A, T, C) contexts. NGS-based methylation analysis relies on three main approaches: endonuclease digestion, affinity enrichment, and bisulfite conversion (Table 1). Almost all sequence-specific DNA methylation analysis approaches require a methylation-dependent treatment before amplification or hybridization so that the methylation information is not lost. Various molecular biology techniques, such as next-generation sequencing (NGS), are subsequently performed to detect 5meC residues.

Table 1. Main principles of NGS-based methylation analysis.

Approach | Principle | Method examples
Enzyme digestion | Some restriction enzymes, such as HpaII and SmaI, are inhibited by 5meC in the CpG context. | Methyl-seq, MCA-seq*
Affinity enrichment | Antibodies specific for 5meC, or methyl-binding proteins with affinity for methylated DNA, are used to profile DNA methylation. |
Sodium bisulfite | Sodium bisulfite chemically converts unmethylated cytosine into uracil, enabling methylation detection. |





*MCA: methylated CpG island amplification; *HELP: HpaII tiny fragment enrichment by ligation-mediated PCR; *MSCC: methylation-sensitive cut counting; *MeDIP-seq: methylated DNA immunoprecipitation; *MIRA: methylated CpG island recovery assay; *RRBS: reduced representation bisulfite sequencing; *WGBS: whole genome bisulfite sequencing; *BSPP: bisulfite padlock probes.

Bisulfite conversion spurred a revolution in genome methylation analysis in the 1990s. Bisulfite converts unmethylated cytosines in the genome into uracils, which are then replaced by thymines during PCR amplification, whereas methylated cytosines remain unchanged. Methylated and unmethylated cytosines can therefore be distinguished by counting the cytosines and thymines at each position after sequencing (Figure 1). Whole genome bisulfite sequencing (WGBS), a method of great significance in this field, combines bisulfite treatment with next- or third-generation sequencing technologies (mostly shotgun sequencing) to study DNA methylation at the genomic level.

Figure 1. Bisulfite conversion and PCR amplification prior to DNA sequencing.
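The conversion logic described above can be illustrated with a small in-silico sketch. It assumes a single forward strand, complete conversion, and error-free reads; the function names and the toy sequence are hypothetical and only serve the illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C becomes U, which is read as T after PCR;
    methylated C (positions given as 0-based indices) stays C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Compare a converted read with the reference, position by position.

    Returns {position: True/False} for each reference cytosine:
    True if the read still shows C (methylated), False if it shows T.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True
            elif read_base == "T":
                calls[i] = False
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```

In real data the same comparison is made across many reads per position, which is why the counts of C and T at each site matter.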

Advantages of whole genome bisulfite sequencing

  • Making genome-wide methylation profiling possible at a single-base level.
  • Assessing the methylation status of almost every CpG locus, including intergenic “gene deserts”, partial methylation domains, and remote regulatory elements.
  • Revealing absolute DNA methylation levels and methylation sequence background.

Workflow of whole genome bisulfite sequencing

In short, the basic steps of whole genome bisulfite sequencing (WGBS) include DNA extraction, bisulfite conversion, library preparation, sequencing, and bioinformatics analysis. Here we use Illumina HiSeq as our example to illustrate the workflow of WGBS.

Figure 2. The workflow of whole genome bisulfite sequencing (Khanna et al. 2013).

  • DNA Extraction

Firstly, approximately 1-5 mg of tissue collected from humans, animals, plants or microorganisms is prepared for DNA extraction. In general, samples for whole genome bisulfite sequencing need to meet the following four criteria.

  1. The sample comes from a eukaryote;
  2. The genome is hypomethylated (as shown in Figure 3, studies have shown that as the number of CpG sites in a region increases, the WGBS sequencing data covering it decreases);
  3. Its reference genome has been assembled to at least the scaffold level;
  4. Relatively complete genome annotations are available.

Then, a suitable kit is used to extract high-purity, high-molecular-weight DNA. The extracted DNA should have a mass of no less than 5 μg, a concentration of no less than 50 ng/μL, and an OD260/280 of 1.8 to 2.0.
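The DNA quality thresholds above can be written as a simple acceptance check. This is only a sketch: the function and parameter names are hypothetical, and the thresholds are the ones quoted in the text.

```python
def passes_wgbs_input_qc(mass_ug, conc_ng_per_ul, od260_280):
    """Return True if a DNA sample meets the thresholds quoted above:
    mass >= 5 ug, concentration >= 50 ng/uL, OD260/280 between 1.8 and 2.0."""
    return (
        mass_ug >= 5.0
        and conc_ng_per_ul >= 50.0
        and 1.8 <= od260_280 <= 2.0
    )


print(passes_wgbs_input_qc(6.2, 80.0, 1.85))  # True
print(passes_wgbs_input_qc(6.2, 30.0, 1.85))  # False (concentration too low)
```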

Figure 3. Conventional WGBS technology has low coverage of methylation sites (Raine et al. 2016)

  • Bisulfite Conversion

Bisulfite conversion is considered the “gold standard” for DNA methylation analysis; its principle is shown in Figure 4. However, bisulfite-induced DNA degradation may lead to depletion of genomic regions enriched for unmethylated cytosines. It is therefore important to assess the extent of DNA degradation under the chosen reaction conditions and to consider how it affects the desired amplicon. Olova et al. (2018) found that DNA degradation is strong in bisulfite conversion protocols that use high denaturation temperatures or high bisulfite molarity. Several kits are available on the market (Table 2).

Figure 4. Bisulfite-mediated deamination of cytosine (Hayatsu et al. 2004).

Table 2. Bisulfite conversion protocols and parameters.

Kit | Denaturation | Conversion temperature | Incubation time
Zymo EZ DNA Methylation Lightning Kit | Heat-based, 99 °C, or alkaline-based, 37 °C | 65 °C | 90 minutes
EpiTect Bisulfite kit (Qiagen) | Heat-based, 99 °C | 55 °C | 10 hours
EZ DNA Methylation Kit (Zymo Research) | Alkaline-based, 37 °C | 50 °C | 12-16 hours

  • Library Preparation

Taking the EpiGnome™ Methyl-Seq Kit (Epicentre) as an example (Figure 5), bisulfite-treated single-stranded DNA is random-primed using a polymerase capable of reading uracil nucleotides to synthesize DNA containing a specific sequence tag. The 3’ end of the newly synthesized DNA strand is then selectively tagged with a second specific sequence, yielding a di-tagged DNA molecule with known sequence tags at its 5’ and 3’ ends. Illumina P7 and P5 adapters are subsequently added by PCR at the 5’ and 3’ ends prior to DNA sequencing.

Figure 5. Workflow for the EpiGnome™ Methyl-Seq Kit.

  • Sequencing

HiSeq sequencing technology, which is based on sequencing-by-synthesis (SBS), is widely applied for WGBS. Library fragments are attached to a flow cell as a single-molecule array and clonally amplified by bridge amplification. Because reversible terminator chemistry allows only one fluorescently labeled base to be incorporated per cycle, a laser is used to excite the fluorophore, and the emitted light is captured to read the base. A paired-end 150 bp strategy is typically employed in WGBS to sequence bisulfite-treated DNA libraries with 250-300 bp inserts. In addition to Illumina HiSeq, PacBio SMRT, Nanopore, Roche 454, and other Illumina platforms have also been used for this purpose.

  • Data Analysis

A series of analyses can be performed for the sequencing results. Five main types of information analysis are listed in Table 3. In addition, methylation density analysis, differentially methylated region (DMR) analysis, DMR annotation and enrichment analysis (GO/KEGG) and clustering analysis can also be performed. The common bioinformatic resources of WGBS include BDPC, CpGcluster, CpGFinder, Epinexus, MethTools, mPod, QUMA, and TCGA Data Portal.

Table 3. Main types of WGBS data analysis.

Type | Details
Alignment against reference genome | Tools such as SOAP are used to align the reads to the reference genome sequence, and only the aligned reads are used for the analysis of methylation information. Reads are aligned allowing C-C matches and C-T mismatches.
mC calling | Determines mC positions throughout the genome. mC ratios are computed by considering read quality and multi-locus mapping probabilities. Low-probability alignments with low reliability are discarded.
Sequence depth and coverage analysis | A plot relating coverage to sequencing depth shows whether methylation calls can be made with a certain degree of confidence at specific base positions.
Methylation level analysis | The methylation level of each methylated C base is calculated as 100 × (reads supporting mC) / (total reads covering the site). The genome-wide average methylation level reflects the overall characteristics of the genomic methylation profile.
Global trends of methylome | The distribution of methylated C bases among the CG, CHG and CHH contexts reflects, to some extent, the characteristics of the whole genome methylation map of a specific species.
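The methylation level formula and the CG/CHG/CHH classification in Table 3 can be sketched as follows. The function names are hypothetical, only the forward strand is considered, and real pipelines also account for conversion efficiency and sequencing errors, which this sketch ignores.

```python
def methylation_level(methylated_reads, total_reads):
    """Methylation level (%) at one cytosine: 100 * mC reads / total reads.
    Returns None when the site has no coverage."""
    if total_reads == 0:
        return None
    return 100.0 * methylated_reads / total_reads


def cytosine_context(seq, i):
    """Classify the cytosine at 0-based index i as 'CG', 'CHG' or 'CHH'
    (H = A, C or T). Returns None if seq[i] is not C or if the context
    cannot be determined (sequence end or ambiguous base)."""
    seq = seq.upper()
    if i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


print(methylation_level(8, 10))      # 80.0
print(cytosine_context("ACGT", 1))   # CG
print(cytosine_context("ACAGT", 1))  # CHG
print(cytosine_context("ACATT", 1))  # CHH
```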



  1. Fraga, M. F., & Esteller, M. (2002). DNA methylation: a profile of methods and applications. Biotechniques, 33(3), 636-49.
  2. Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., et al. (2010). A draft sequence of the Neandertal genome. Science, 328(5979), 710–722.
  3. Hayatsu, H., Negishi, K., & Shiraishi, M. (2004). DNA methylation analysis: speedup of bisulfite-mediated deamination of cytosine in the genomic sequencing procedure. Proceedings of the Japan Academy, 80(4), 189-194.
  4. Herman, J. G., Graff, J. R., Myöhänen, S., Nelkin, B. D., & Baylin, S. B. (1996). Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proceedings of the National Academy of Sciences of the United States of America, 93(18), 9821-9826.
  5. Ji, L., Sasaki, T., Sun, X., Ma, P., Lewis, Z. A., & Schmitz, R. J. (2014). Methylated DNA is over-represented in whole-genome bisulfite sequencing data. Front Genet, 5(5), 341.
  6. Khanna, A., Czyz, A., & Syed, F. (2013). EpiGnome™ Methyl-Seq Kit: a novel post-bisulfite conversion library prep method for methylation analysis. Nature Methods, 10(10).
  7. Laird, P. W. (2003). The power and the promise of DNA methylation markers. Nature Reviews Cancer, 3(4), 253–266. doi:10.1038/nrc1045
  8. Gardiner, L.-J., Quinton-Tulloch, M., Olohan, L., Price, J., Hall, N., & Hall, A. (2015). A genome-wide survey of DNA methylation in hexaploid wheat. Genome Biology, 16(1), 273.
  9. Liu, L., Hu, N., Wang, B., Chen, M., Wang, J., Tian, Z., et al. (2011). A brief utilization report on the Illumina HiSeq 2000 sequencer. Mycology, 2(3), 169-191.
  10. Meissner, A., Gnirke, A., Bell, G. W., Ramsahoye, B., Lander, E. S., & Jaenisch, R. (2005). Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Research, 33(18), 5868-77.
  11. Meyer, M., Kircher, M., Gansauge, M. T., Li, H., Racimo, F., Mallick, S., et al. (2012). A high coverage genome sequence from an archaic Denisovan individual. Science, 338(6104), 222-6.
  12. Olova, N., Krueger, F., Andrews, S., Oxley, D., Berrens, R. V., Branco, M. R., et al. (2018). Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data. Genome Biology, 19(1), 33.
  13. Raine, A., Manlig, E., Wahlberg, P., Syvänen, A. C., & Nordlund, J. (2016). Splinted ligation adapter tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing. Nucleic Acids Research, 45(6), e36.
  14. Ziller, M. J., Müller, F., Liao, J., Zhang, Y., Gu, H., Bock, C., et al. (2011). Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genetics, 7(12), e1002389.

Overview of Metatranscriptomic Sequencing: Principles, Workflow, and Applications

What is metatranscriptomic sequencing?

Metatranscriptomic sequencing provides direct access to the transcriptome information of culturable and non-culturable microbes through large-scale, high-throughput sequencing of the transcripts of the entire microbial community in a specific environmental sample. It offers an opportunity to randomly sequence mRNAs from the community as a unit in order to understand the regulation of complex processes in microbial communities. Studying the metatranscriptome with next-generation sequencing techniques yields gene expression profiles of whole microbial populations, providing new insights into poorly known biological systems and overcoming the technical limitations of isolating individual bacteria.

Challenges of metatranscriptomic sequencing

Although current metatranscriptomic techniques are promising, there are still several obstacles that limit their large-scale application. First, much of the harvested RNA is ribosomal RNA (rRNA), whose dominating abundance can dramatically reduce the coverage of mRNA, which is the main focus of transcriptomic studies. To overcome this, efforts have been made to remove rRNA effectively. Second, mRNA is notoriously unstable, which compromises the integrity of the sample before sequencing. Third, differentiating between host and microbial RNA can be challenging, although commercial enrichment kits are available. This may also be done in silico if a reference genome is available for the host, as in the work of Perez-Losada et al., who considered the impact of host–pathogen interactions on the human airway microbiome. Finally, transcriptome reference databases are limited in their coverage.

Workflow of metatranscriptomic sequencing

Put simply, the first step is to extract total RNA from the sample and check its quality. Qualified RNA is subjected to fragment selection, library construction, and library quality testing. The qualified library is then sequenced (mainly on Illumina sequencing platforms), and the raw sequencing data are used for bioinformatics analysis.

The metatranscriptomic library preparation process is shown in Figure 2. The two main strategies for mRNA enrichment are illustrated: removing rRNA by hybridization with 16S and 23S rRNA probes, or depleting rRNA with a 5′-exonuclease. The first strand of cDNA is then synthesized by reverse transcriptase using random hexamers, and the second strand is synthesized by a DNA polymerase. Finally, sequencing adapters are attached to the cDNA strands, either by PCR or by ligation.

Figure 2. The metatranscriptomic sequencing library preparation process (Peimbert M, et al. 2016)

The bioinformatics workflow for metatranscriptomic sequencing comprises the following steps: filtering the reads, choosing between alignment to reference sequences and de novo assembly, annotation, statistical analysis, and depositing the raw, assembled, and annotated data sets.
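As an illustration of the read-filtering step, residual rRNA reads can be screened in silico by comparing each read with known rRNA sequences. The sketch below is a toy k-mer screen, not the method of any particular tool: the function names, the value of k, the threshold, and the sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_rrna_like(read, rrna_kmers, k, min_fraction=0.5):
    """Flag a read as rRNA-like if at least min_fraction of its distinct
    k-mers also occur in the rRNA reference k-mer set."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False
    shared = len(read_kmers & rrna_kmers)
    return shared / len(read_kmers) >= min_fraction


# Toy reference standing in for a collection of 16S/23S rRNA sequences.
rrna_reference = "ACGTACGTGGCCTTAA"
reference_kmers = kmers(rrna_reference, k=4)

reads = ["ACGTACGT", "TTTTTTTT"]
kept = [r for r in reads if not is_rrna_like(r, reference_kmers, k=4)]
print(kept)  # ['TTTTTTTT']
```

The same idea applies to host read removal when a host reference genome is available, although dedicated aligners are normally used for that purpose.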

The applications of metatranscriptomics

  • Human health

Symbiotic bacteria (normal flora) play a key role in protecting us from pathogens, but under certain conditions they can overcome protective host responses and trigger pathological effects. Microbial population analysis can be used as an indicator of an individual’s health status and as a powerful tool for the prevention, diagnosis and treatment of specific diseases.

  • Assessment of microbiome–immune interactions

The effects of the microbiota on the mucosal immune system are thought to be key to host physiology. A study of toll-like receptor 5 (TLR5) knockout (KO) mice is an interesting example of the use of metatranscriptomics to complement metagenomic and 16S rRNA characterization of this microbiome-immune interaction. Metatranscriptomic analysis showed that flagellar motor-related gene expression was up-regulated in TLR5 KO mice compared to wild-type mice. In this model, TLR5 recognition of flagellin leads to the production of anti-flagellin antibodies, which results in down-regulation of various bacterial flagellar motor genes and thereby restrains the microflora. Deletion of TLR5 reduces the production of anti-flagellin antibodies, leading to up-regulation of bacterial flagellar motor genes and thereby increasing the ability of bacteria in the gastrointestinal environment to disrupt the mucosal barrier.

  • Studying microbiome small noncoding RNAs

The bacterial transcriptome includes small non-coding RNAs (sRNAs), which are typically between 50 and 500 nucleotides in length and are involved in gene regulation. They regulate the translation or stability of a transcript by interacting with the 5′-untranslated region (UTR) of the target mRNA. They regulate important processes in bacteria, such as iron metabolism, virulence and quorum sensing, and help bacteria adapt quickly to changing environments. The emergence of next-generation sequencing methods has accelerated their identification in various bacteria, such as Salmonella and Bacillus subtilis. Next-generation sequencing methods also offer an opportunity to study bacterial sRNAs at the community level. For example, metatranscriptomic analysis of bacteria from different ocean depths suggests that sRNAs have a potential role in niche adaptation. Metatranscriptomics of the active human gut microflora has identified a number of sRNAs, although their role in the gut microflora has not yet been elucidated.

  • Drug discovery

Hundreds of drugs used today are derived from bacterial compounds. The study of metagenomes or metatranscriptomes in microbial communities offers new opportunities to explore innovative sources for drug discovery that are inaccessible today due to technical limitations in the isolation of these non-culturable microorganisms.

  • Agriculture

Microbial communities living on and around plants play a vital role in the nutrients needed for plant growth. In addition, the presence of specific micro-communities makes crops healthy and productive. Metagenomics and metatranscriptomics provide an opportunity to explore how microbial soil populations produce healthier and higher yielding crops.

  • Ecology

Microorganisms are able to remove a wide variety of natural and synthetic harmful substances and convert them into compounds that are harmless to humans and the environment. How these microbial communities degrade harmful chemicals is not yet fully understood, but studying them offers new solutions for remediating and monitoring environmental pollution and for improving drinking water purification methods.

  • Food industry

Metagenomics and metatranscriptomics methods can be used to improve food quality, function and safety, and provide information related to metabolic activities of microbial communities.

At CD Genomics, we provide you with high-quality sequencing and integrated bioinformatics analysis for your metatranscriptomics project. If you have additional requirements or questions, please feel free to contact us.


  1. Peimbert M, Alcaraz L D. A Hitchhiker’s Guide to Metatranscriptomic sequencing[M]// Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Springer International Publishing, 2016.
  2. Aguiar-Pulido V, Huang W, Suarez-Ulloa V, et al. Metagenomics, Metatranscriptomic sequencing, and Metabolomics Approaches for Microbiome Analysis[J]. Evolutionary Bioinformatics Online, 2016, 12(Suppl 1):5-16.
  3. Dick G. Metatranscriptomic sequencing[M]// Genomic Approaches in Earth and Environmental Sci
  4. Maurice C F, Haiser H J, Turnbaugh P J. Xenobiotics shape the physiology and gene expression of the active human gut microbiome[J]. Cell, 2013, 152(1-2):39-50.
  5. Warnecke F, Hess M. A perspective: Metatranscriptomics as a tool for the discovery of novel biocatalysts[J]. Journal of Biotechnology, 2009, 142(1):91-95.
  6. Jorth P, Turner K H, Gumus P, et al. Metatranscriptomics of the Human Oral Microbiome during Health and Disease[J]. mBio, 2014, 5(2):e01012.
  7. O’Malley M A. Metatranscriptomics[M]. Springer New York, 2013.
  8. Cao Y, Fanning S, Proos S, et al. A Review on the Applications of Next Generation Sequencing Technologies as Applied to Food-Related Microbiome Studies[J]. Frontiers in Microbiology, 2017, 8:1829.
  9. Bashiardes S, Zilberman-Schapira G, Elinav E. Use of Metatranscriptomics in Microbiome Research[J]. Bioinformatics & Biology Insights, 2016, 10(10):19-25.


Bioinformatics Workflow of Whole Exome Sequencing

The advent of next-generation sequencing (NGS), which produces millions to billions of sequence reads at high speed, has greatly accelerated genomics research. Currently available NGS platforms include Illumina, Ion Torrent/Life Technologies, 454/Roche, Pacific Biosciences, Nanopore, and GenapSys. They can produce reads of 100-10,000 bp in length, enabling sufficient coverage of the genome at a lower cost. But faced with the enormous amount of sequence data, how do we best deal with it? And what are the most appropriate computational methods and analysis tools for this purpose? In this review, we focus on the bioinformatics pipeline of whole exome sequencing (WES).

Whole exome sequencing is a genomic technique for sequencing the exome (the protein-coding regions of all genes). It is widely used in basic and applied research, especially in the study of Mendelian diseases. See the article on the principle and workflow of whole exome sequencing to learn more about WES. A typical WES analysis workflow includes the following steps: raw data quality control, preprocessing, sequence alignment, post-alignment processing, variant calling, variant annotation, and variant filtration and prioritization. They are discussed below.

Figure 1. A general framework of WES data analysis (Bao et al. 2014).

Raw data quality control

Sequence data generally have two common standard formats: FASTQ and FASTA. FASTQ files can store Phred-scaled base quality scores to better measure sequence quality. It is, therefore, widely accepted as the standard format for NGS raw data. There are multiple tools developed to assess the quality of NGS raw data, such as FastQC, FastQ Screen, FASTX-Toolkit, and NGS QC Toolkit.

Read QC parameters:

  1. Base quality score distribution
  2. Sequence quality score distribution
  3. Read length distribution
  4. GC content distribution
  5. Sequence duplication level
  6. PCR amplification issue
  7. Biasing of k-mers
  8. Over-represented sequences
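Several of the QC parameters listed above can be computed directly from a FASTQ file. The following is a minimal sketch assuming four-line FASTQ records and Phred+33 quality encoding; the function names and the example reads are hypothetical, and dedicated tools such as FastQC report many more metrics.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def qc_summary(records):
    """Compute read count, length distribution, GC fraction, mean base
    quality, and the fraction of reads that are exact sequence duplicates."""
    lengths, seq_counts = Counter(), Counter()
    n = gc = total_bases = qual_sum = 0
    for _, seq, qual in records:
        n += 1
        lengths[len(seq)] += 1
        seq_counts[seq] += 1
        gc += sum(seq.upper().count(base) for base in "GC")
        total_bases += len(seq)
        qual_sum += sum(phred_scores(qual))
    return {
        "reads": n,
        "length_distribution": dict(lengths),
        "gc_fraction": gc / total_bases if total_bases else 0.0,
        "mean_base_quality": qual_sum / total_bases if total_bases else 0.0,
        "duplicate_fraction": (n - len(seq_counts)) / n if n else 0.0,
    }


example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nGGCC\n+\n!!!!\n"
summary = qc_summary(read_fastq(io.StringIO(example)))
print(summary)
```

In this toy input, `I` encodes Phred 40 and `!` encodes Phred 0, so the third read would be an obvious candidate for filtering.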

Data preprocessing

With a comprehensive read QC report (which generally covers the parameters above), researchers can determine whether data preprocessing is necessary. Preprocessing steps generally involve 3’ end adapter removal, filtering of low-quality or redundant reads, and trimming of undesired sequences. Several tools can be used for data preprocessing, such as Cutadapt and Trimmomatic. PRINSEQ and QC3 can perform both quality control and preprocessing.
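The two most common preprocessing operations, 3’ adapter removal and low-quality tail trimming, can be sketched as below. This is a simplified illustration that uses exact matching only, whereas real trimmers tolerate mismatches; the function names are hypothetical and the adapter string is used purely as an example.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove a 3' adapter from a read.

    Handles a full adapter occurrence anywhere in the read, or a partial
    adapter (a prefix of at least min_overlap bases) at the very end.
    """
    pos = seq.find(adapter)
    if pos == -1:
        longest = min(len(adapter) - 1, len(seq))
        for overlap in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:overlap]):
                pos = len(seq) - overlap
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


adapter = "AGATCGGAAGAGC"  # example adapter sequence for illustration
read = "ACGTACGT" + adapter + "TT"
print(trim_adapter(read, "I" * len(read), adapter))   # ('ACGTACGT', 'IIIIIIII')
print(trim_low_quality_tail("ACGTAC", "IIII##"))       # ('ACGT', 'IIII')
```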

Sequence alignment

There are two main families of algorithms for short-read mapping: those based on the Burrows-Wheeler Transform (BWT) and those based on the Smith-Waterman (SW) algorithm. Bowtie2 and BWA are two popular short-read alignment tools that implement the BWT. MOSAIK, SHRiMP2, and Novoalign are important short-read alignment tools that implement the SW algorithm with increased accuracy. Additionally, multithreading and MPI implementations allow a significant reduction in runtime. Of the tools mentioned above, Bowtie2 stands out for its fast running time, high sensitivity, and high accuracy.
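To show how a BWT-based index supports read mapping, the sketch below builds a toy FM-index and finds exact matches by backward search. It is a teaching example under simplifying assumptions (naive suffix sorting, full occurrence tables, exact matches only) and is not how Bowtie2 or BWA are implemented in detail; all names are hypothetical.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: suffix array, first-column offsets, and
    occurrence counts over the Burrows-Wheeler transform of text + '$'."""
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    counts = Counter(text)
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, first, occ


def find_exact(pattern, index):
    """Return sorted 0-based positions where pattern occurs in the text,
    found by backward search (processing the pattern right to left)."""
    suffix_array, first, occ = index
    lo, hi = 0, len(suffix_array)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(find_exact("ACG", index))  # [0, 4, 9]
print(find_exact("TTA", index))  # [7]
```

Each pattern character narrows a half-open interval of the suffix array, so the search time depends on the read length rather than on the genome length, which is what makes BWT-based aligners fast.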

Post-alignment processing

After read mapping, the aligned reads are post-processed to remove undesired reads or alignments, such as reads exceeding a defined size and PCR duplicates. Tools such as Picard MarkDuplicates and SAMtools can distinguish PCR duplicates from true DNA material. The second step is to improve the quality of gapped alignments via indel realignment. Some aligners (such as Novoalign) and variant callers (such as GATK HaplotypeCaller) already include indel alignment improvement. After indel realignment, base quality score recalibration (BQSR; BaseRecalibrator from the GATK suite) is recommended to improve the accuracy of base quality scores prior to variant calling.
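The idea behind duplicate marking can be sketched as grouping reads by mapping position and strand, then keeping one representative per group. This is a deliberate simplification with hypothetical field names; production tools apply more careful criteria, particularly for paired-end and clipped reads.

```python
from collections import defaultdict


def mark_duplicates(alignments):
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, start position and strand are treated as
    copies of one fragment; the read with the highest mean quality is kept.
    """
    groups = defaultdict(list)
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"])
        groups[key].append(aln)
    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda a: a["mean_qual"])
        duplicates.update(a["name"] for a in group if a is not best)
    return duplicates


alignments = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "mean_qual": 30},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "mean_qual": 35},
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "-", "mean_qual": 20},
    {"name": "r4", "chrom": "chr2", "pos": 5, "strand": "+", "mean_qual": 10},
]
print(mark_duplicates(alignments))  # {'r1'}
```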

Variant calling

Variant analysis is important for detecting different types of genomic variants, such as SNPs, SNVs, indels, CNVs, and larger SVs, especially in cancer studies. It is vital to distinguish somatic from germline variants. Somatic variants are present only in somatic cells and are tissue-specific, while germline variants are inherited mutations present in the germ cells and are linked to the patient’s family history. Variant calling is used to identify SNPs and short indels in exome samples. The common variant calling tools are listed in Table 1. Some studies have evaluated these variant callers. Liu et al. recommended GATK, and Bao et al. recommended a combination of Novoalign and FreeBayes.

Table 1. The common variant calling tools.

Variant calling | Tools
Germline variant calling | GATK, SAMtools, FreeBayes, Atlas2
Somatic variant detection | GATK, SAMtools mpileup, Isaac variant caller, deepSNV, Strelka, MutationSeq, MuTect, QuadGT, Seurat, Shimmer, SolSNP, JointSNVMix, SomaticSniper, VarScan2, Virmid

Variant annotation

After variants are identified, they need to be annotated to better understand disease pathogenesis. Variant annotation generally involves information about genomic coordinates, gene position, and mutation type. Many studies focus on the non-synonymous SNVs and indels in the exome, which account for 85% of known disease-causing mutations in Mendelian disorders and a large share of mutations in complex diseases.

Besides the basic annotation, many databases can provide additional information about the variants. ANNOVAR is a powerful tool that combines over 4,000 public databases for variant annotation, such as dbSNP, 1000 Genomes, and NCI-60 human tumor cell line panel exome sequencing data. This tool can be used for minor allele frequency (MAF) estimation, deleteriousness prediction, indication of conservation of the mutated site, experimental evidence for disease variants, and prediction scores from GERP, PolyPhen, and other programs. Other common databases include OncoMD, OMIM, SNPedia, 1000 Genomes, dbSNP, and personal genome variants.

Variant filtration and prioritization

WES can generate thousands of variant candidates. This number can be reduced by variant prioritization to produce a short, prioritized list of candidate mutations for further experimental validation. Variant prioritization involves three steps: 1) removal of less reliable variant calls; 2) depletion of common variants (based on the assumption that rare variants are more likely to cause disease); 3) prioritization of variants relative to the disease using discovery-based and hypothesis-based approaches. The available tools for variant filtration and prioritization include VAAST2, VarSifter, KGGseq, PLINK/SEQ, SPRING, GUI tool, Gnome, and Ingenuity Variant Analysis.
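The three prioritization steps can be sketched as a small filter-and-rank function. The field names, thresholds, and candidate gene list below are hypothetical choices for illustration; step 3 is shown here in its hypothesis-based form, ranking variants in pre-selected candidate genes first.

```python
def prioritize_variants(variants, candidate_genes, min_qual=30.0, max_maf=0.01):
    """Filter and rank variant calls.

    1) drop low-confidence calls (qual < min_qual);
    2) drop common variants (population MAF > max_maf);
    3) rank variants in candidate genes first, then by increasing MAF.
    """
    reliable = [v for v in variants if v["qual"] >= min_qual]
    rare = [v for v in reliable if v["maf"] <= max_maf]
    return sorted(rare, key=lambda v: (v["gene"] not in candidate_genes, v["maf"]))


variants = [
    {"id": "v1", "gene": "BRCA1", "qual": 99.0, "maf": 0.001},
    {"id": "v2", "gene": "TTN", "qual": 10.0, "maf": 0.0005},
    {"id": "v3", "gene": "TP53", "qual": 60.0, "maf": 0.2},
    {"id": "v4", "gene": "MYH7", "qual": 50.0, "maf": 0.0001},
]
ranked = prioritize_variants(variants, candidate_genes={"BRCA1"})
print([v["id"] for v in ranked])  # ['v1', 'v4']
```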


In the next few years, whole exome sequencing may be adopted as a routine clinical procedure for disease treatment. Many healthcare facilities already provide genetic testing using NGS technologies such as WES. The next challenge will be managing data comprising millions of genomic variants and integrating genomic variants, clinical records, and patient information.

If you are interested in the whole exome sequencing provided by CD Genomics, please feel free to contact us. We provide full whole exome sequencing service package, including sample standardization, exome capture, library construction, high-throughput sequencing, raw data quality control, and bioinformatics analysis. We can tailor this pipeline to your research interest.


  1. Bao R, Huang L, Andrade J, et al. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer informatics, 2014, 13: CIN. S13779.
  2. Meena N, Mathur P, Medicherla K M, et al. A Bioinformatics Pipeline for Whole Exome Sequencing: Overview of the Processing and Steps from Raw Data to Downstream Analysis. bioRxiv, 2017: 201145.
  3. Xu H, DiCarlo J, Satya RV, Peng Q, Wang Y. Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics. 2014;15:244.


Applications of RNA-Seq

What is RNA-Seq?

Regulation of gene expression is fundamental to linking genotypes with phenotypes. RNAs shape complex gene expression networks that drive biological processes. An in-depth understanding of the mechanisms that govern these networks is vital for the treatment of complex diseases such as cancer. Hybridization-based microarrays allow the simultaneous monitoring of the expression levels of annotated genes in cell populations. However, sequencing-based genome-wide approaches have proved to provide more valuable insights into transcriptomes. Next- and third-generation sequencing platforms allow the rapid and cost-effective generation of massive amounts of sequence data. RNA profiling using these high-throughput sequencing technologies is known as RNA-seq.

What are the applications of RNA-Seq?

Since RNA-seq is quantitative, it is useful for determining RNA expression levels. In addition to this basic function, RNA-seq can be used for differential gene expression analysis, variant detection and allele-specific expression, small RNA profiling, characterization of alternative splicing patterns, systems biology, and single-cell RNA-seq.

Figure 1. Overview of the typical RNA-seq analysis pipeline (Han et al. 2015).

  • Differential gene expression

An important application of RNA-seq is the comparison of transcriptomes across different developmental stages, treatments, or disease conditions. This analysis, also known as differential gene expression analysis, requires the identification of genes along with their isoforms and a precise assessment of their expression levels. It is important for characterizing functional elements of the genome and uncovering the biological mechanisms of development and disease.

The common tools for differential gene expression include Cuffdiff, DESeq, DESeq2, edgeR, PoissonSeq, limma-voom, and MISO.
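The basic quantity behind differential expression is a normalized fold change between conditions. The sketch below normalizes raw counts to counts per million (CPM) and computes a log2 fold change per gene. It assumes one sample per condition and hypothetical gene names, and it covers effect size only; the tools listed above additionally model variability across replicates to assign statistical significance.

```python
import math


def cpm(counts):
    """Normalize raw read counts per gene to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated CPM / control CPM) per gene, with a pseudocount to
    avoid division by zero for unexpressed genes."""
    control_cpm, treated_cpm = cpm(control), cpm(treated)
    return {
        gene: math.log2(
            (treated_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount)
        )
        for gene in control
        if gene in treated
    }


control = {"geneA": 100, "geneB": 900}
treated = {"geneA": 400, "geneB": 600}
for gene, lfc in log2_fold_change(control, treated).items():
    print(gene, round(lfc, 2))
```

With these toy counts, geneA shows roughly a four-fold increase (log2 fold change close to 2) and geneB a decrease.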

  • Variants detection and allele-specific expression

RNA-seq allows the identification of variants and allele-specific expression. Single-nucleotide polymorphisms (SNPs) are variations in a single nucleotide at a specific position in the genome, and they may lead to allele-specific expression (ASE). ASE means that one of the two alleles is highly transcribed into mRNA while the other is transcribed at a low level or not at all. Recent studies have also associated ASE with susceptibility to a number of human diseases. RNA-seq and whole-genome DNA sequencing (WGS) allow the identification of common disease variants, including SNPs and ASE.

The common tools used for variants detection are GATK, ANNOVAR, SNPiR, SNiPlay3.
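Allele-specific expression at a heterozygous SNP is often assessed by asking whether the reads carrying each allele deviate from the 1:1 ratio expected under balanced expression. The sketch below uses an exact two-sided binomial test as one simple way to do this; the function name is hypothetical, and it ignores reference mapping bias and overdispersion, which dedicated methods account for.

```python
from math import comb


def ase_binomial_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-allele count under a balanced expectation p.
    """
    n = ref_count + alt_count
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # Small tolerance so that equally likely outcomes are included.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)


print(ase_binomial_p(5, 5))   # balanced expression, p-value of 1
print(ase_binomial_p(10, 0))  # 0.001953125, strong imbalance
```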

  • Small RNA profiling

Small RNA species generally involve microRNA (miRNA), small interfering RNA (siRNA), and piwi-interacting RNA (piRNA), as well as other types of small RNA, such as small nucleolar RNA (snoRNA) and small nuclear RNA (snRNA). Small RNAs play a role in gene silencing and post-transcriptional regulation of gene expression. Small RNAs have been demonstrated to be involved in biological processes, including development, cell proliferation and differentiation, and apoptosis. Most initial small RNA discovery studies used pyrosequencing, and subsequently, other NGS platforms with higher throughput, which resulted in genome-wide surveys and the discovery of an increasing number of small RNA species. Common bioinformatic tools for small RNA sequencing data are shown in Table 1.

Table 1. sRNA-seq web application comparison (Rahman et al. 2018).

Tools compared: Oasis 2, omiRas, mirTools 2.0, MAGI, Chimira, sRNAtoolbox.
Features compared: FASTQ compression; miRNA modifications and edits; novel miRNA database; infection and cross-species analysis; non-model organism; differential expression; multivariate differential expression; novel miRNA target prediction; pathway/GO analysis; batch job submission (API); genome browser.

  • Characterization of alternative splicing patterns

Characterizing alternative splicing patterns is important for understanding development and human disease, since altered splicing contributes to development, cell differentiation, and disease. RNA-seq is a powerful tool for this purpose. Paired-end sequencing provides sequence information from both ends of a fragment, so splicing patterns can be detected without prior knowledge of transcript annotations. PacBio SMRT sequencing allows examination of splicing patterns and transcript connectivity in an unbiased and genome-scale manner by generating full-length transcript sequences.

The common tools for characterization of alternative splicing patterns include TopHat, MapSplice, SpliceMap, SplitSeek, GEM mapper, SpliceR, SplicingCompass, GIMMPS, MATS, and rMATS.

Figure 2. RNA-seq for detection of alternative splicing events (Ozsolak and Milos 2011).

  • Systems biology

Creating lists of differentially expressed (DE) genes is not the final step of RNA-seq analysis. Further biological insight into an experimental system can be gained by looking at the expression changes of sets of genes. This approach, known as systems biology, is based on the understanding that the whole is greater than the sum of its parts. Pathway analysis and co-expression network analysis are two of its important components.

Table 2. The tools for pathway analysis and co-expression network analysis using RNA-seq data.

Pathway analysis GSEA A knowledge-based approach for genome-wide expression profiling.
GSVA A non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set.
SeqGSEA Provides methods for gene set enrichment analysis by integrating differential expression and splicing.
GAGE Generally applicable gene set enrichment for pathway analysis.
SPIA Identifies the pathways most relevant to the condition.
TAPPA A java-based tool for identification of phenotype-associated genetic pathways.
DEAP Identifies important regulatory patterns from differential expression data.
GSAASeqSP Can identify pathways or gene sets significantly associated with a disease or phenotype.
Co-expression network GSCA Helps researchers make discoveries by using massive amounts of publicly available gene expression data.
DICER Detects differentially co-expressed gene sets by using a novel probabilistic score for differential correlation.
WGCNA A powerful method to isolate co-expressed groups of genes from microarray or RNA-seq data.
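As an illustration of what the pathway analysis step does with a DE gene list, here is a minimal sketch of an over-representation test based on the hypergeometric distribution. It is not the algorithm of any specific tool in Table 2 (GSEA, for example, works on ranked gene lists), and the function name and the gene counts are hypothetical values chosen only for demonstration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_de, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among n_de
    DE genes drawn without replacement from n_universe genes, of which
    n_pathway belong to the pathway (one-sided hypergeometric test)."""
    max_overlap = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes measured, 100 in the pathway,
# 500 called DE, 10 of which fall in the pathway.
p_value = hypergeom_enrichment_p(
    n_universe=20000, n_pathway=100, n_de=500, n_overlap=10
)
print(f"enrichment p-value: {p_value:.2e}")
```

With these assumed numbers only 2.5 pathway genes would be expected among the DE genes by chance, so an overlap of 10 gives a small p-value. In practice the p-values for all tested pathways would also need multiple-testing correction.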
  • Single-cell RNA-seq

Single-cell RNA-seq offers opportunities to dissect the interplay between intrinsic cellular processes and extrinsic stimuli in cell fate determination. It also contributes to a better understanding of how an ‘outlier cell’ may determine the outcome of an infection. In addition, because the majority of living cells cannot be cultivated in vitro, single-cell RNA-seq may uncover novel species or regulatory processes of biotechnological or medical relevance. The workflow of single-cell RNA-seq generally involves the following steps: single-cell isolation, cDNA library construction, RNA-seq, and bioinformatics (Figure 3).

Figure 3. The general workflow of single-cell RNA-seq.

Applications of single-cell RNA-seq

  • Stem cell differentiation
  • Embryogenesis
  • Whole-tissue analysis
  • Single-cell RNA-seq for whole-organism studies
  • Disease biology and treatment

If you want more information about RNA-seq, please refer to the following articles:

Bioinformatics workflow of RNA-seq
The technologies and workflow of RNA-seq


  1. Ozsolak F, Milos P M. RNA sequencing: advances, challenges and opportunities. Nature reviews genetics, 2011, 12(2): 87.
  2. Rahman R U, Gautam A, Bethune J, et al. Oasis 2: improved online analysis of small RNA-seq data. BMC bioinformatics, 2018, 19(1): 54.
  3. Han Y, Gao S, Muegge K, et al. Advanced applications of RNA sequencing and challenges. Bioinformatics and biology insights, 2015, 9: BBI. S28991.
  4. Saliba A E, Westermann A J, Gorski S A, et al. Single-cell RNA-seq: advances and future challenges. Nucleic acids research, 2014, 42(14): 8845-8860.

Why Whole Genome Sequencing (WGS) Still Not Broadly Used for Individual


In recent years, with the further development of high-throughput sequencing technology, the cost of sequencing has continued to decrease, and whole-exome sequencing (WES) has been increasingly applied to genetic disease detection, which has improved the diagnosis rate of diseases.

The Question

However, this raises a question: is whole-genome sequencing (WGS) currently suitable for clinical application? It is likely that whole-genome sequencing will subsume genetic testing for individual genes or even panels of genes, replacing individual genotyping assays with a comprehensive assessment of genetic variation.

  1. Doctors are too overloaded to analyze and explain so many variants of uncertain significance (VUS), and laboratory data analysis is disconnected from clinical practice. Whether undiagnosed cases will be reanalyzed and their clinical phenotypes re-collected has not yet been determined.
  2. The cost of whole genome sequencing is high, and little of the information can be interpreted. It is still at the scientific research stage, and clinical application is at an early stage.
  3. For clinical applications, the main considerations of clinical diagnosis should be accuracy, turnaround time, and cost. Scientific research must use research funding!
  4. At present, the cost of WGS is still high, and the sequencing, analysis, and interpretation are too time consuming. The information useful to patients is similar to that obtained from exome sequencing.
  5. For single-gene diseases, the combination of WES with aCGH/SNP-array/CMA already meets most requirements. Compared with WES, WGS does have wider coverage, but it detects too many variants, such as deep variants in non-coding regions and a large number of small deletions/duplications ranging from hundreds of bp to the kb level. These variants are difficult to interpret. WGS is not fundamentally different from WES. The most important thing at present is not to expand the genomic range of detection, but to expand the types of variants that can be accurately detected, such as polynucleotide repeat expansions. Rather than using NGS-based WGS clinically, it is better to wait for third-generation sequencing to mature.
  6. Currently, and at least for the near future, I personally think that WGS is not suitable for clinical applications. Reason 1 is cost. The cost of a single WGS is basically equal to the current cost of sequencing a family trio, but the positive rate has not increased significantly (data show 40% for WES and 42% for WGS), while the cost of analysis has increased significantly. Reason 2 is the lack of an available reference database. Even if more deep intronic sites are detected, there is no way to judge their pathogenicity. Although WGS is superior to WES in the detection rate of CNV and SV, low-cost detection methods are available as alternatives.

A Comparison Study of Whole Genome Sequencing (WGS) in Clinical Setting


In recent years, with the further development of high-throughput sequencing technology, the cost of sequencing has continued to decrease, and whole-exome sequencing (WES) has been increasingly applied to genetic disease detection, which has improved the diagnosis rate of diseases. However, this raises a question: is whole-genome sequencing (WGS) currently suitable for clinical application?

The study

On March 22nd, Genet Med. published an article online (PMID: 29565419) entitled Whole-genome sequencing offers additional but limited clinical utility compared with reanalysis of whole-exome sequencing.

Few previous studies have compared the detection rates of WGS and WES for genetic diseases. After screening, a total of 108 patients were enrolled in the WGS analysis. Their gene chip and WES tests had both been negative, and their clinical data and previous raw sequencing data had been preserved intact. WGS gave positive results in 10 cases (9%), uncertain results in 5 cases, and negative results in 93 cases.

The authors analyzed the reasons for the positive results of 10 cases of WGS, including three aspects:

(1) The state of knowledge at the time of testing: although WES had also detected the mutation sites in the 1st, 2nd, and 3rd cases, they were not reported as pathogenic, mainly because the correlation between the pathogenic gene and the clinical phenotype had not yet been established at the time of detection;

(2) The influence of structural variation and non-coding region variation, as in the 4th, 5th, and 6th cases;

(3) The impact of the sequencing platform: the 7th, 8th, 9th, and 10th cases belong to this category. Their mutation sites could be detected by WES on the Illumina platform.

In summary, of the 10 positive cases that had previously been negative by WES, 7 could be detected by both WES reanalysis and WGS, and 3 were detected by WGS because they involved structural variation or non-coding region variation.
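The case counts reported above can be turned into diagnostic yields with simple arithmetic. The short sketch below uses only the numbers quoted in this article; the variable and function names are illustrative.

```python
# Case counts quoted in the text above.
total_patients = 108
positive, uncertain, negative = 10, 5, 93
also_found_by_wes_reanalysis = 7   # detectable by WES reanalysis and WGS
found_by_wgs_only = 3              # structural / non-coding variants

# The three outcome groups should account for every enrolled patient.
assert positive + uncertain + negative == total_patients
assert also_found_by_wes_reanalysis + found_by_wgs_only == positive


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


overall_yield = percent(positive, total_patients)
reanalysis_yield = percent(also_found_by_wes_reanalysis, total_patients)
wgs_only_yield = percent(found_by_wgs_only, total_patients)

print(f"overall WGS yield:          {overall_yield}%")
print(f"recoverable by WES re-run:  {reanalysis_yield}%")
print(f"attributable to WGS alone:  {wgs_only_yield}%")
```

Seen this way, most of the additional diagnoses could also be reached by reanalyzing WES, which is the point made by the title of the study.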

Why Whole Genome Sequencing (WGS) Is Important for Clinical Applications?

  1. Whole genome sequencing (WGS) has a broad spectrum of applications in the clinical field, especially for unexplained clinical conditions such as developmental delay and intellectual disability in children. If Chromosomal Microarray Analysis (CMA), Next Generation Sequencing (NGS), and Whole Exome Sequencing (WES) are unable to provide a diagnosis, WGS could be another option.
  2. Because WGS gives uniform coverage, 30X coverage is generally considered sufficient. Because it does not depend on capture reagents, WGS is also easier to standardize in the wet lab, which saves some cost.

As for the price of WGS, market competition is fierce, which helps to reduce cost. So, I think it is very likely that WGS will become mainstream in the near future.

Another benefit of WGS is its uniform coverage of mtDNA. In theory, it could solve the difficulty of finding large CNVs and partial heteroplasmy in mtDNA.

  3. Although WGS is not suitable for clinical application at present, trials can tentatively be started in some “pilot” units.

Compared with WES, WGS can find non-coding/intronic variants, CNV/SV, skip the need for capture, etc. The difficulty lies in the cost of interpretation and sequencing. As the cost of sequencing decreases, the superiority of WGS will become more apparent. Therefore, the application of WGS in the clinic is only a matter of time.

However, what constitutes best practice for WGS is still a question for colleagues and experts to study and explore together.
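The 30X coverage figure mentioned above can be related to sequencing output with the usual definition of mean coverage: number of reads times read length, divided by genome size. The sketch below is only a back-of-the-envelope calculation; the genome size of 3.1 Gb and the 150 bp read length are assumptions made for illustration and are not values given in this article.

```python
from math import ceil


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Mean depth of coverage = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target mean coverage."""
    return ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed values for illustration only.
GENOME_SIZE_BP = 3_100_000_000   # approximate human genome size
READ_LENGTH_BP = 150             # a common short-read length

n_reads = reads_needed(30, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"reads needed for 30X: {n_reads:,}")
print(f"check: {mean_coverage(n_reads, READ_LENGTH_BP, GENOME_SIZE_BP):.1f}X")
```

This is a mean value only. Real coverage varies along the genome, although less so for WGS than for capture-based WES.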

Outsourcing Plastic Molding And Mold Making In China, Trust But Verify

Nearly every single plastic molding company in the US and Europe has or is considering sending work to China, no surprise here. The incentives are very real, as are the pressures. Not only are the financial matters pressing, but some customers actually demand a China presence.

Considering the fact that China has become the world’s second largest economy, passing Germany and Japan, the potential for growth is huge, to put it mildly.

Most people recall the very poor quality of Chinese products just a few years ago. Some products are still of very low quality and it seems that you actually get what you pay for in many cases.

On the other hand, the concept of actual built-in quality seems to be slowly sinking into the national mentality, albeit very slowly. Some areas, such as Hong Kong, have a much better tradition of adapting European quality.

When Ronald Reagan was president, he was deeply involved with the arms race with the Soviet Union. One of his favorite phrases was a translation of a Russian proverb: “Trust but verify.” This became his mantra when dealing with Mikhail Gorbachev concerning the INF treaty.

This would be a good mantra for anyone doing plastic molding in China: “Trust but verify.” It seems that the mold makers and molders, and maybe others as well, have a tendency to do what you pay for when you are present, and then cut corners when you are not present.

Without attempting to sound condescending or judgmental, this just is the case. Of course there are countless exceptions, nevertheless, it is still advisable to trust but verify.

A real-life case in point is the fact that American companies usually insist on brand name mold components in their injection molds. Nobody wants a low-grade, soft ejector pin in their mold, for example. So, most people insist on PCS, DME or Progressive ejector pins.

Oddly, after a few thousand shots, the pins bend, break, pit and flake. Yet the pin has PCS etched right into the steel, so how could this be? Simple enough, it was made in a little shop that makes one pin for every company known and just etches whatever name is required. They don’t care if the steel is not H13, just so it works for a while and they make their money.

Anyone who has traveled in developing countries knows about this sort of thing. It happens all the time with just about anything that can be copied or pirated. I once bought a Disney movie before it was in the theaters! You can buy passports, driver’s licenses, birth certificates and anything else you want.

Once you build a working relationship with a Chinese supplier you would think that you are set and don’t need to trust and verify. Wrong. If that were the case, every mold that came in would be right, made using proper techniques and have documented sizes and materials.

That just is not the case, unfortunately, but it doesn’t seem to make much difference to the accounting department in some companies. The mold is so inexpensive that you can just re-work it and still make money. Don’t ask the mold maker about this though.

Find more manufacturers & suppliers: China plastic manufacturer

Mechanical Design of Biomedical Products Using Plastics

Biomedical products typically have physical requirements that differ in some respects from other products. Those requirements usually center on the need for materials and configurations that are compatible with the human body. Not only are such products regulated by FDA requirements, but they must also be able to withstand multiple sterilization cycles involving high temperatures or the use of solvents, or both.

To design parts in the biomedical industry it is necessary to understand the properties of biomedical safe materials, and to understand the constraints on processing those materials to produce sound and economical parts. Not all injection molding factories have both the capability and experience to mold these materials. As an example, parts have been designed and molded both domestically and abroad using Lexan HP2NR and Lexan HPX4. Both of these are FDA approved biocompatibility tested (FDA USP Class VI/ISO10993) plastics.

Lexan HP2NR is a clear polycarbonate plastic that can be autoclaved at 121 °C for a handful of cycles. As an example, this material is being used in a lens for a skin care treatment product. The molding resource has been able to mold this material at almost defect-free levels for the past 2 years. Lexan HPX4 is a siloxane copolymer. It performs better in the autoclave at 121 °C (a few dozen cycles, again depending on in-mold stress, morpholine level in the autoclave, etc.). It has a slight haze in its natural state. An example of a biomedical application of this material is a part for an oral device used by sleep apnea patients, which is colored with FDA-approved dye to a gray Pantone 430C color when molded. After molding, the parts go through a thermal press process that creates 300+ features necessary for the retention of the epoxy applied by the user. Parts are thoroughly cleaned in isopropyl alcohol solution, heat dried, then bagged and boxed for shipment.

In addition to understanding the issues relating to the materials employed in designing and producing biomedical products it is also necessary to have a good grasp on ergonomic principles and the ability to apply those principles in design. Ergonomics is defined as the study of designing equipment and devices that fit the human body, its movements, and its cognitive abilities. It is always good to consider ergonomics in product design, but in the biomedical arena it is usually critical to the success of the product.

In summary, a successful biomedical product development should be characterized by carefully considered selection of materials and the capability to properly process those materials. Additionally, biomedical product development should also consider a strong dedication to ergonomic principles.

China-plasticmolding cooperates with dozens of injection molding factories. We are a professional injection molding company in China and have offered custom injection molding services since 2003.

A Review of Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life


Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life was first published in Cell Host & Microbe in 2015. Authors include Fredrik Bäckhed and Jovanna Dahlgren.

Experiment Design

Sample: intestinal microbes of 98 mothers and newborn babies (mostly Swedish)

Sequencing strategy: using metagenomic sequencing, a total of 1.52Tb of data, an average of 3.99Gb/sample

Analysis Procedures

  1. Based on the metagenomic data, the gene catalog was established at each time point by de novo assembly, and the KEGG database was used to generate the gene functional annotation.
  2. Contigs were binned according to their abundance across samples, and 4356 genomes (>0.9 Mb) were obtained by co-assembly. These assembled genomes were supplemented with 1147 genomes from NCBI.
  3. All genomes were subsequently clustered to obtain 690 unique metagenomic OTUs (MetaOTUs), which are roughly equivalent to species-level classification.

Analysis Content

The phyla Firmicutes and Bacteroidetes were the most abundant among all detected microorganisms, followed by Actinobacteria and Proteobacteria. According to the species annotation of the metagenomic data, a total of 373 MetaOTUs were annotated to known species, and the remaining 317 represented novel species related to known species. Most of the MetaOTUs obtained from newborns were also found in mothers, and their abundance gradually increased. In Figure 1, the red area marks novel MetaOTUs, the outer circle shows annotation at the phylum level, the inner circle shows annotation at the genus level, and the middle circle represents the abundance of each MetaOTU in the different samples.

Figure 1. MetaOTUs phylogenetic tree

In a PCoA analysis of all samples based on unweighted UniFrac distance, the samples clustered according to age. The 12-month-old infants were most similar to the mothers, because the structure of the infant intestinal microflora had stabilized by then.

With increasing age, the alpha diversity of the infant intestinal flora gradually increased, while the beta diversity gradually decreased, indicating that the microbial species in each community became more complex and the differences between communities became smaller.
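To make the two diversity terms concrete, the sketch below computes one common alpha diversity measure (the Shannon index) and one simple beta diversity measure (Bray-Curtis dissimilarity) on toy count vectors. Note the assumptions: the study itself used UniFrac distance, which needs a phylogenetic tree, so Bray-Curtis is swapped in here purely for simplicity, and all counts are invented for illustration.

```python
from math import log


def shannon_index(counts):
    """Alpha diversity: H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity: 1 - 2 * (shared abundance) / (total abundance)."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Invented taxon counts for two newborns and two 12-month-old infants.
newborn_a = [90, 10, 0, 0]
newborn_b = [0, 10, 90, 0]
infant12_a = [40, 30, 20, 10]
infant12_b = [30, 30, 25, 15]

print("alpha, newborn:   ", round(shannon_index(newborn_a), 3))
print("alpha, 12 months: ", round(shannon_index(infant12_a), 3))
print("beta, newborns:   ", round(bray_curtis(newborn_a, newborn_b), 3))
print("beta, 12 months:  ", round(bray_curtis(infant12_a, infant12_b), 3))
```

In this toy data the within-sample diversity rises with age while the between-sample dissimilarity falls, which is the pattern described in the paragraph above.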

Next, the authors compared the gut microbiota structure of infants born by C-section with that of infants born vaginally. The result was consistent with the PCoA results: as age increases, the bacterial composition tends to approach that of the mothers. However, because C-section infants do not pass through the maternal birth canal, they acquire few maternal microorganisms at birth. Compared with vaginally born infants, their intestinal microorganisms become established more slowly and some of the flora is missing.

Figure 2. A comparison of the gut microbiota structure of infants born by C-section and infants born vaginally

The metagenomic analysis also reveals how energy utilization by the infant intestinal flora changes over time. The functional capacity of the fecal flora increases during the first year of life, and phosphotransferase system (PTS) genes related to carbohydrate absorption are abundant in the newborn intestinal flora.

The gut flora of newborns and 4-month-old infants is enriched in genes for digesting the sugars in breast milk, which are the main source of energy at that stage. The β-glucose-specific transporter is most abundant in infants at 4 and 12 months of age. The intestinal flora of 12-month-old infants is enriched in genes that break down polysaccharides and starch, which is associated with an increase in Bacteroides thetaiotaomicron, a species that carries a broad set of enzymes for polysaccharide digestion.

Figure 3. KO pathway

Bacteria in the gut of vaginally born newborns include Enterococcus, Escherichia/Shigella, Streptococcus, and Rothia, indicating a relatively oxygen-rich intestinal environment. The gut flora at 4 months is characterized by Bifidobacterium, Lactobacillus, Collinsella, Granulicatella, and Veillonella, indicating a gradual decrease in intestinal oxygen concentration and an increase in the ability to produce and utilize lactic acid. The diet at this time is mainly breast milk.

The gut flora at 12 months includes bacteria found in newborns and in 4-month-old infants (as previously listed), as well as bacteria present only at 12 months, such as the genus Eichhornia.

Figure 4. Characteristics of intestinal flora in different periods of caesarean section


As an important research tool, metagenomics can get a lot of high-value information in the process of microbial population research. It is of great significance for further research on microbial-related metabolism and immunity.

Features of CD Genomics Metagenomic Sequencing

  1. Rich experience in sample processing

CD Genomics has rich experience in extracting a wide variety of samples, such as soil, sediment, intestinal contents, manure, water, air, and dairy products;

  2. High quality data

CD Genomics has a wide range of technical platforms to obtain high quality data;

  3. Satisfactory analysis report

More database annotations for more analysis results

  4. Deep data mining capacity and comprehensive follow-up customer services

CD Genomics has a professional bioinformatics analysis team and powerful experimental and sequencing platforms to provide one-stop microbial sequencing and analysis services, including microbial genome de novo sequencing and resequencing, 16S/18S/ITS, metagenomics, and transcriptome sequencing.

Handbook of 16S rDNA Sequencing: The Past and the Present

The basic concept of 16S rDNA

16S rDNA is one of the most useful and most commonly used molecular clocks in the systematic classification of bacteria. rRNA comes in few types but is abundant (about 80% of bacterial RNA content). The 16S molecule is moderate in size and is present in all prokaryotes. It has evolved slowly and is highly conserved in structure and function, so it is known as a “bacterial fossil”. In most prokaryotes, rDNA is present in multiple copies, and the copy numbers of 5S, 16S, and 23S rDNA are the same. At about 1.5 kb, 16S rDNA is large enough to reflect the differences between strains yet small enough to be obtained easily by sequencing, so it is widely accepted by bacteriologists and taxonomists. In short, 16S rDNA is universal, conserved, moderately sized, and contains variable regions.

To be more specific, this article summarizes its features as follows:

1. 16S rRNA is ubiquitous in prokaryotes. rRNA is involved in protein synthesis, a function that is essential to any organism, and it has changed little during the long course of biological evolution. It can therefore be seen as a molecular clock for biological evolution.

2. The 16S rRNA molecule contains highly conserved, moderately conserved, and highly variable sequence regions, so it is suitable for studying phylogenetic relationships across a range of evolutionary distances.

3. The 16S rRNA is of moderate length, about 1540 nucleotides, which is convenient for sequence analysis.

4. The variable region sequences differ from one bacterium to another, while the constant region sequences are largely conserved. Therefore, primers can be designed on the constant regions to amplify the 16S rDNA fragment, and the differences between the variable region sequences can be used to distinguish genera and strains. On this basis, bacteria can be classified and identified.
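The idea in point 4 (primers sit on conserved regions, while the sequence between them varies) can be sketched as a toy in silico PCR. Everything in this example is hypothetical: the primers and templates are short invented sequences, not real 16S primers, and the function names are chosen for illustration. Degenerate IUPAC bases in a primer are expanded into a regular expression.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYMKSWHBVDN", "TGCAYRKMSWDVBHN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, including IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer)


def in_silico_amplicon(template, forward, reverse):
    """Return the region from the forward primer site to the reverse
    primer site on the given strand, or None if either site is missing."""
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


# Hypothetical primers on two "conserved" regions; Y matches C or T.
FORWARD = "GATTAYAG"
REVERSE = "TGACTCAG"

# Two invented templates that share the conserved regions but differ
# in the "variable" region between them.
strain_1 = "TTGATTACAGCCCTTTAAAGGGCTGAGTCAAA"
strain_2 = "TTGATTATAGCGCGATATCGCGCTGAGTCAAA"

for name, template in [("strain_1", strain_1), ("strain_2", strain_2)]:
    amplicon = in_silico_amplicon(template, FORWARD, REVERSE)
    variable = amplicon[len(FORWARD):-len(REVERSE)]
    print(name, "variable region:", variable)
```

The same primer pair amplifies both templates, and the recovered variable regions differ, which is what makes the amplicon useful for classification. A real workflow would also tolerate mismatches and search both strands.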

16S structure

The 16S rRNA gene sequence includes 9 variable regions and 10 conserved regions. The conserved region sequence reflects the genetic relationship between species, while the variable region sequence reflects the differences between species.

Figure 1. 16S rRNA gene sequence

Strain identification based on 16S full-length (first generation sequencing)

Object: pure colonies that have been cultivated

Technology: first generation sequencer 3730

Process: Nucleic Acid Extraction –> Gene Amplification –> Product Purification –> Sequencing Reaction –> Sequence Alignment



Commonly used primer sequences for full-length 16S (see Table 1):

Table 1. Commonly used primer sequence by 16S full length

Reagent cost: about $15

Advantages: it can assist routine strain identification methods, such as microscopic morphology and culture characteristics as well as physical and chemical properties, including nutrient type, carbon and nitrogen source utilization capacity, various metabolic reactions, enzyme reactions and serological reactions, etc., to improve the accuracy of strain identification.

Disadvantages: it can only be used for pure cultures!

Bacterial structure analysis based on 16S (Next-generation sequencing)

Objects: clinical samples (such as feces, cerebrospinal fluid, blood, urine, etc.), environmental samples (soil, sewage, etc.)

Technology: second-generation sequencers, such as HiSeq and MiSeq from Illumina, Ion Torrent from Thermo, and 454 from Roche (discontinued)

Process: Genomic DNA –> Sample Quality Control –> PCR Amplification and Library Construction –> Library Quality Control –> Illumina HiSeq 2500/MiSeq Sequencing –> Raw Data –> Data Quality Control –> High Quality Data –> Bioinformatics Analysis
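As an illustration of the "Data Quality Control" step in the process above, here is a minimal sketch that filters FASTQ records by mean base quality. It assumes Phred+33 quality encoding and uses an arbitrary threshold of Q20. Both are assumptions for demonstration, the reads are invented, and production pipelines normally rely on dedicated QC tools rather than a hand-written filter.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_fastq(lines, min_mean_quality=20):
    """Keep 4-line FASTQ records whose mean quality meets the threshold.

    `lines` is a list of text lines; each record is
    (header, sequence, '+', quality).
    """
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    return [
        rec for rec in records
        if len(rec) == 4 and rec[3]
        and mean_phred(rec[3]) >= min_mean_quality
    ]


# Two invented reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
raw = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
########
""".splitlines()

kept = filter_fastq(raw)
print(f"kept {len(kept)} of {len(raw) // 4} reads:",
      [rec[0] for rec in kept])
```

Only reads that pass such a filter would go on to bioinformatics analysis as "High Quality Data".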

Some commonly used primer sequences are listed in Table 2.

Table 2. Primer selection table for specific 16S rRNA gene region to be amplified

Reagent cost: about $15 ~ $60/sample, determined by the use of consumable grade and labor costs.

Advantages: By detecting the sequence variation and abundance of 16S rDNA, the classification and abundance of the bacteria in a sample are revealed. This provides information on species classification, species abundance, population structure, phylogenetic evolution, community comparison, etc., and can be used to examine unknown clinical samples and find pathogens.

Disadvantages:

(1) Because of the limited read length of second-generation sequencing, currently only two of the nine variable regions of 16S can be sequenced, generally the V3-V4 region. As a result, some strains can only be resolved to the genus level.

(2) There is no standard operating procedure (SOP) for the experiments, and different experimental factors have a considerable impact on the results.

(3) 16S metagenomics can also be used for functional studies, but it is less accurate than WGS metagenomic sequencing.

The Future of 16S: Third Generation Sequencing

The use of PacBio sequencing technology for 16S metagenomics has been published. A reference article: High-resolution phylogenetic microbial community profiling.

All 9 variable regions are sequenced, giving high resolution and high accuracy, which makes the approach more suitable for detecting unknown pathogens in clinical samples and for other scientific research applications.

Unfortunately, due to unresolved sample pooling and other reasons, its price remains high.

About author:

As a leading provider of NGS services and a partner of Illumina, CD Genomics offers a portfolio of solutions for metagenomics sequencing. 16S/18S/ITS amplicon sequencing is characterized by cost-efficiency, high-speed and practicability to help you identify and investigate the microbial community. With over 10 years of experience, we can totally meet your project requirements and budgets in the exploration of microbial biodiversity.