Blog
Challenges with webservers for using NGS
- December 15, 2022
- Posted by: rasa
- Category: Uncategorized
Next-Generation Sequencing (NGS) has emerged as a standard technology for multiple high-throughput molecular profiling assays. Because these technologies generate tremendous amounts of information at base-level resolution, within a relatively short time, and at low cost, they have revolutionized the paradigms followed in the life sciences in general. Selecting the right approach to analyzing the data is therefore a key discipline of this new era.
A very basic decision is whether to build the data processing pipelines from scratch or to leverage one of the existing frameworks for large-scale NGS analysis pipelining. Open-source platforms for complex NGS data analyses typically operate on cloud-based compute clusters linked to a front-end web server that facilitates user access. The unique challenge, but also the big opportunity, in the NGS analysis field lies in the tremendous size of the data for every single sample analyzed. The raw data typically range in the dozens of gigabytes per sample, depending on the application; for whole-genome sequencing, the raw data can reach up to 250 GB. Given sufficient computational resources, the overall workflows can be streamlined and highly accelerated by establishing centralized standard pipelines through which all samples analyzed at an academic institution are processed. State-of-the-art version control on the underlying pipeline scripts greatly improves the reproducibility of NGS-based research results in such an environment. A commonly used system for both software development and version control is git: the software version used for a particular analysis can, for instance, be recorded by tracking the ID of the latest git commit before the analysis is started.
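As a minimal sketch of this idea (the function names and the provenance-file format here are illustrative assumptions, not taken from the cited paper), a pipeline wrapper might look up the current git commit of the pipeline scripts and write it into a small provenance record before any sample is processed:

```python
import json
import subprocess
from datetime import datetime, timezone

def current_commit(repo_dir="."):
    """Return the ID of the latest git commit in repo_dir."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def record_provenance(commit_id, sample_id, path):
    """Write a JSON record tying a sample analysis to a pipeline version."""
    record = {
        "sample": sample_id,
        "pipeline_commit": commit_id,
        "started": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2)
    return record

# Example with a hypothetical commit hash; in a real pipeline, commit_id
# would come from current_commit() run inside the pipeline repository.
rec = record_provenance("3f2a1bc", "sample_001", "provenance.json")
print(rec["pipeline_commit"])  # -> 3f2a1bc
```

Storing such a record next to each sample's results means that, months later, the exact pipeline version used for any analysis can be checked out again with `git checkout <commit>`.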
Other challenges remain. A huge number of bioinformatics tools exist for a wide range of uses, which makes designing an analysis pipeline difficult. NGS analysis is also computationally intensive and requires expensive infrastructure; many medical and research centres lack adequate high-performance computing facilities, and cloud computing is not always an option due to privacy and ownership issues. Finally, the interpretation of the results is not trivial, and most available pipelines lack the utilities to support this crucial step.
Reference
Kulkarni P, Frommolt P. Challenges in the Setup of Large-scale Next-Generation Sequencing
Analysis Workflows. Comput Struct Biotechnol J. 2017 Oct 25;15:471-477. doi:
10.1016/j.csbj.2017.10.001. PMID: 29158876; PMCID: PMC5683667.