Next Generation Sequencing

Next generation sequencing quality control

High-throughput sequencing (HTS) is a powerful discovery tool. It is used to screen for non-specific germline variants, somatic mutations and structural variants, profile copy number variations in cancers [1], assemble genomes of microbial organisms [2][3], quantify gene expression [4], identify cell populations from single-cell transcriptomes in a variety of tissues [5], and track epigenetic changes in developing organisms and diseases [6], among numerous other applications. Some of the most popular paradigms in DNA sequencing are whole-genome sequencing, exome sequencing and targeted panel sequencing [7]. Sequencing protocols are constantly being upgraded and costs keep falling, which allows longer reads [8] and makes sequencing data increasingly abundant.

Despite these benefits, HTS is subject to random errors and systematic biases, including PCR amplification problems, GC-content bias and read contamination [9]. Quality control for DNA sequencing data has three stages: raw data, alignment and variant calling [7]. At the raw data stage, quality control quickly screens out data with major quality issues and flags data of doubtful quality [7]. At the alignment stage, it focuses on alignment quality, which is critical for successful variant detection. Variant calling quality control is the final opportunity to catch samples with quality issues that were not recognized earlier in the process and to reduce false-positive variants [7].

Many bioinformatics tools have been created to perform quality control (QC) on HTS data by examining raw reads and their derivatives, such as sequence alignments and other quantitative data. Several NGS QC software tools are available, including FastQC, Qualimap 2, RNA-SeQC, RSeQC, NGS QC Toolkit and MultiQC.
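To make the raw data stage more concrete, here is a minimal Python sketch of the kind of summary a raw-read QC step computes: mean Phred quality at each read position and overall GC content. It is an illustration only, not how any of the tools named here is implemented. The function names, the tiny example records and the quality threshold are all hypothetical, and the sketch assumes a four-line-per-record FASTQ file with Phred+33 quality encoding.

```python
from io import StringIO
from typing import Dict, Iterator, List, TextIO, Tuple

# Two made-up reads, used only to demonstrate the functions below.
EXAMPLE_FASTQ = "@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nII55\n"


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (or a blank line) ends the stream
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, not needed here
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def summarise_reads(handle: TextIO, phred_offset: int = 33) -> Dict[str, object]:
    """Return read count, mean quality per position and overall GC fraction."""
    totals: List[int] = []  # summed quality scores per read position
    counts: List[int] = []  # number of reads covering each position
    gc_bases = 0
    all_bases = 0
    n_reads = 0
    for _name, sequence, quality in read_fastq(handle):
        n_reads += 1
        for i, symbol in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(symbol) - phred_offset  # assumes Phred+33 encoding
            counts[i] += 1
        gc_bases += sum(1 for base in sequence.upper() if base in "GC")
        all_bases += len(sequence)
    return {
        "reads": n_reads,
        "mean_quality_by_position": [t / c for t, c in zip(totals, counts)],
        "gc_fraction": gc_bases / all_bases if all_bases else 0.0,
    }


def low_quality_positions(mean_quality: List[float], threshold: float) -> List[int]:
    """Return 0-based read positions whose mean quality is below the threshold."""
    return [i for i, q in enumerate(mean_quality) if q < threshold]


if __name__ == "__main__":
    summary = summarise_reads(StringIO(EXAMPLE_FASTQ))
    print(summary)
    # The threshold of 35 is arbitrary and chosen only for this toy example.
    print(low_quality_positions(summary["mean_quality_by_position"], 35))
```

For real data, the same functions could be pointed at an open file handle instead of the in-memory example. A dedicated QC tool remains the better choice in practice, since it covers many more checks.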

  • FastQC is an HTS quality control tool written in Java. There are numerous approaches to summarising FASTQ files, which contain nucleotide sequence data and the associated quality scores. FastQC is a popular FASTQ QC tool because it summarises read quality by position, reports adapter content in sequences, reports tetramer frequencies, and offers many other checks that one would expect for raw sequence data [10]. The results are presented as HTML reports. It accepts input in FASTQ, BAM and SAM formats.
  • Qualimap is a user-friendly tool with graphical and command-line interfaces. It offers four analysis modes: BAM QC, Counts QC, RNA-seq QC, and Multi-sample BAM QC [11].
  • RSeQC examines sequence quality, sequencing depth, strand specificity, GC bias, read distribution along the genomic structure, and coverage uniformity in RNA-Seq experiments. It accepts SAM, BAM, FASTA and BED files, as well as chromosome size files (two-column, plain text files), as input. Results can be visualised in genome browsers such as UCSC, IGB and IGV, or alternatively with R scripts [12].
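As a rough sketch of how the tools listed above might be driven from a script, the Python snippet below builds command lines for FastQC, Qualimap's BAM QC mode and RSeQC's `bam_stat.py`. The helper function names and all file paths are hypothetical. The options shown (`--outdir` and `--threads` for FastQC, `-bam` and `-outdir` for Qualimap, `-i` for `bam_stat.py`) are commonly used, but they should be checked against the help output of the installed versions.

```python
import shutil
import subprocess
from typing import List, Sequence


def fastqc_command(fastq_files: Sequence[str], outdir: str, threads: int = 1) -> List[str]:
    """Build a FastQC call; the output directory is expected to exist already."""
    return ["fastqc", "--outdir", outdir, "--threads", str(threads), *fastq_files]


def qualimap_bamqc_command(bam_file: str, outdir: str) -> List[str]:
    """Build a Qualimap call for its BAM QC mode."""
    return ["qualimap", "bamqc", "-bam", bam_file, "-outdir", outdir]


def rseqc_bam_stat_command(bam_file: str) -> List[str]:
    """Build a call to RSeQC's bam_stat.py, which summarises mapping statistics."""
    return ["bam_stat.py", "-i", bam_file]


def run_if_available(command: List[str]) -> bool:
    """Run the command only if its executable is on PATH; report whether it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Placeholder paths; replace them with real files before running anything.
    planned = [
        fastqc_command(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "qc/fastqc", threads=2),
        qualimap_bamqc_command("sample.bam", "qc/qualimap"),
        rseqc_bam_stat_command("sample.bam"),
    ]
    for command in planned:
        print(" ".join(command))  # only prints the commands, does not execute them
```

Building the commands as lists rather than shell strings avoids quoting problems with unusual file names, and `run_if_available` lets a pipeline skip a tool that is not installed instead of failing outright.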

 

References:

  1. Alkan C, Kidd JM, Marques-Bonet T, et al.: Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics. 2009;41(10):1061–1068. 10.1038/ng.437
  2. Loman NJ, Quick J, Simpson JT: A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods. 2015;12(8):733–738. 10.1038/nmeth.3444
  3. Masella AP, Bartram AK, Truszkowski JM, et al.: PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics. 2012;13(1):31. 10.1186/1471-2105-13-31
  4. Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics. 2011;12(2):87–98. 10.1038/nrg2934
  5. Han X, Wang R, Zhou Y, et al.: Mapping the mouse cell atlas by Microwell-Seq. Cell. 2018;172(5):1091–1107.e17. 10.1016/j.cell.2018.02.001
  6. Buenrostro JD, Wu B, Chang HY, et al.: ATAC-seq: A method for assaying chromatin accessibility genome-wide. Current Protocols in Molecular Biology. 2015;109(1):21.29.1–9. 10.1002/0471142727.mb2129s109
  7. Guo Y, Ye F, Sheng Q, et al.: Three-stage quality control strategies for DNA re-sequencing data. Briefings in Bioinformatics. 2014;15(6):879–889. 10.1093/bib/bbt069
  8. Sims D, et al.: Sequencing depth and coverage: key considerations in genomic analyses. Nature Reviews Genetics. 2014;15:121–132.
  9. Ross MG, et al.: Characterizing and measuring bias in sequence data. Genome Biology. 2013;14:R51.
  10. Brown J, et al.: FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool. Bioinformatics. 2017;33(19):3137–3139. 10.1093/bioinformatics/btx373
  11. Okonechnikov K, et al.: Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32(2):292–294. 10.1093/bioinformatics/btv566
  12. Wang L, Wang S, Li W: RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28(16):2184–2185. 10.1093/bioinformatics/bts356
Published by
rasa
