Question: How to handle large amounts of data?
- February 4, 2019
- Posted by: rasa
- Category: Bioinformatics
A phenomenal amount of biological data of various types is generated every day. When organised properly, this data helps researchers make better use of their scientific potential and knowledge. Bioinformatics is an interdisciplinary field of science that uses computational tools and techniques to retrieve, analyze, store, share and manipulate biological data.
Bioinformatics has its roots in conceptualising life as information technology: genes, the basic physical and functional units of heredity, can be explored as digital information. In this emerging digital era, bioinformatics serves as a handy tool that helps scientific communities design and carry out biological research. Its primary goal is to increase the understanding of biological processes by applying computational techniques.
Major areas of bioinformatics are:
1) High-throughput analysis of -omic datasets (Genomics, Proteomics, Transcriptomics, Metabolomics, and Glycomics).
2) Development and implementation of new algorithms, statistical methods, tools and software to manage various types of information efficiently.
Some applications of bioinformatics, grouped by the type of data available, are:
1) Raw DNA sequences: Separating coding and non-coding regions, identifying introns and exons, gene-product prediction, DNA sequencing and forensic analysis.
2) Protein sequences: Multiple sequence alignment, identification of conserved sequence motifs, protein-protein interaction, networks and systems biology.
3) Macromolecular structure: Three-dimensional structure prediction and alignment, drug design, drug discovery, molecular simulations (force-field calculations, molecular movements, docking predictions).
4) Genomes: Characterisation of repeats, structural assignments to genes, phylogenetic analysis, gene-expression analysis, genomic-scale censuses (characterisation of protein content, metabolic pathways), linkage analysis relating specific genes to diseases.
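To make the first item in the list above a little more concrete, here is a minimal sketch of how candidate coding regions might be picked out of a raw DNA sequence by scanning for open reading frames (ORFs). The function name `find_orfs` and the `min_codons` threshold are hypothetical names chosen for this illustration. The scan is deliberately naive: it only looks at the forward strand and knows nothing about introns, so it should be read as a teaching example rather than a real gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) pairs for ATG...stop stretches.

    Only the forward strand is scanned, in all three reading frames.
    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon counted).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

sequence = "CCATGAAATTTTAGGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

Everything outside the reported intervals is treated as non-coding by this sketch. Real tools add the reverse strand, splice-site models and statistical scoring on top of this basic idea.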
Some widely used bioinformatics tools are ClustalX, Discovery Studio, BioEdit, IGB and MEGA for viewing and editing alignments; BLAST, PSI-BLAST, FASTA, HMMER and DIAMOND for searching protein and nucleotide databases; and BLAT, GMAP, Splign, SIBsim4 and DECIPHER for genomic analysis. These tools require data to be stored in a specific format such as FASTA, FASTQ or GenBank, and are mostly written in programming languages such as C, Java, Perl or Python. A sound knowledge of these languages therefore allows users to customize these tools.
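As a small illustration of why the file format and the programming language matter, the following sketch reads FASTA-formatted text in plain Python and computes the GC content of each record. The helper names `read_fasta` and `gc_content` and the two example records are made up for this post. In practice an established library such as Biopython would usually be the better choice; the point here is only to show how simple the format is to work with.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Hypothetical in-memory example; a real file would be opened with open(path).
example = io.StringIO(">seq1 demo\nATGC\nGGCC\n>seq2\nATAT\n")
records = list(read_fasta(example))
for header, seq in records:
    print(header, len(seq), gc_content(seq))
```

Because the reader is a generator, records are produced one at a time, which keeps memory use low even when the input file is very large.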