
BIOPYTHON: LANGUAGE FOR BIO-PROGRAMMERS

Biopython is an open source application programming interface (API) used by computational biologists and bioinformaticians. It is supported by the Open Bioinformatics Foundation (OBF). Biopython is a collection of Python tools, and the project provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. Python is an object-oriented programming language that is widely used in computational molecular biology and bioinformatics. It is easy to learn, has a very clear syntax, and can easily be extended with modules written in C, C++ or FORTRAN.

The main functions of Biopython include the ability to parse bioinformatics files into data structures usable from Python, with support for file formats and online resources such as BLAST, ClustalW, FASTA, GenBank, PubMed and Medline, ExPASy files (like Enzyme, Prodoc and Prosite), SCOP, InterPro, KEGG, UniGene and Swiss-Prot.

The Seq object is an important feature of Biopython; it is the core sequence representation. It behaves like a Python string, with the addition of an alphabet and some key biologically relevant methods. Sequences can be annotated using SeqRecord objects, which extend a Seq object with properties such as a record name, identifier and description, and space for additional key/value terms. A SeqRecord can also hold a list of SeqFeature objects, which describe sub-features of the sequence with their location and their own annotation.
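As a rough illustration of how Seq, SeqRecord and SeqFeature fit together, here is a minimal sketch. It assumes a recent Biopython release, where a Seq is built without an explicit alphabet argument. The sequence, identifiers and feature coordinates are made up for the example.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# A Seq supports string-like operations plus biological methods.
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")  # toy sequence
first_codon = dna[:3]                   # slicing returns another Seq
revcomp = dna.reverse_complement()      # biological method
protein = dna.translate(to_stop=True)   # translate up to the first stop codon

# A SeqRecord wraps the Seq with an identifier, name, description
# and a dictionary for additional key/value annotations.
record = SeqRecord(
    dna,
    id="demo1",                          # hypothetical identifier
    name="demo",
    description="hypothetical example record",
)
record.annotations["source"] = "synthetic"

# A SeqFeature describes a sub-region with its own location and annotation.
feature = SeqFeature(
    FeatureLocation(0, 24),              # first 24 bases, made up for the example
    type="CDS",
    qualifiers={"note": ["toy coding region"]},
)
record.features.append(feature)

# The feature's location can be used to pull out the matching sub-sequence.
cds = feature.extract(record.seq)

print(first_codon, protein, cds)
```

Slicing and methods such as reverse_complement() return new Seq objects, so results can be passed straight into a SeqRecord or further sequence operations.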

Biopython provides a number of modules. The Bio.SeqIO module provides a simple interface for reading and writing biological sequence files in various formats. It interprets multiple sequence alignment formats as collections of equal-length sequences. Similarly, Bio.AlignIO deals with files containing one or more sequence alignments, represented as alignment objects, and uses the same set of functions for input and output as Bio.SeqIO. Bio.Nexus supports phylogenetic tools using the NEXUS interface or the Newick standard tree format. The Bio.Entrez module provides code to access NCBI over the web. Bio.Blast can call NCBI’s online BLAST server or a local standalone installation, and includes a parser for their XML output. Biopython also has wrapper code for other command-line tools, such as ClustalW and EMBOSS. The Bio.PDB module can parse PDB files and has functions related to macromolecular structure; it has been tested on nearly 5500 structures from the PDB, and all structures parsed correctly. The Bio.Motif module provides support for sequence motif analysis (searching, comparing and de novo learning).
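The shared input/output style of Bio.SeqIO and Bio.AlignIO can be sketched as follows. The FASTA text is a made-up, in-memory example (so no files or network access are needed), and the "tab" output format is just one convenient choice among the formats SeqIO can write.

```python
from io import StringIO

from Bio import AlignIO, SeqIO

# Two toy sequences of equal length, held in memory instead of a file.
fasta_text = """>seq1 first toy sequence
ATGGCCATTGTAATG
>seq2 second toy sequence
GGGCCGCTGAAAGGG
"""

# Bio.SeqIO: parse() yields one SeqRecord per sequence in the input.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for rec in records:
    print(rec.id, len(rec.seq))

# Bio.SeqIO: write() saves the records in another format and
# returns the number of records written.
out = StringIO()
count = SeqIO.write(records, out, "tab")
tab_text = out.getvalue()

# Bio.AlignIO: the same text read as a single alignment object,
# which works here because the sequences have equal length.
alignment = AlignIO.read(StringIO(fasta_text), "fasta")
print(len(alignment), alignment.get_alignment_length())
```

With real data, the StringIO handles would typically be replaced by file names or open file handles; the calls stay the same.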

The major approach of Biopython is to focus on developing Python libraries, as opposed to ready-to-run programs. This approach has several advantages: it yields a flexible set of “as small as possible” modules that can be used in numerous ways; different parts can interoperate directly, without writing and re-reading intermediate files; you can draw on the full power of a programming language; a single program written by you can automate several different tasks; and you can take advantage of existing programs by executing them from your own program.
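To make the library approach concrete, the sketch below chains parsing, filtering, translation and output in one short program without any intermediate files. The function name translate_records, the min_length threshold and the input sequences are hypothetical choices made for this illustration.

```python
from io import StringIO

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def translate_records(handle, min_length=9):
    """Yield protein SeqRecords for FASTA DNA records of at least min_length bases."""
    for rec in SeqIO.parse(handle, "fasta"):
        if len(rec.seq) < min_length:
            continue  # skip records that are too short
        protein = rec.seq.translate(to_stop=True)
        yield SeqRecord(protein, id=rec.id + "_prot",
                        description="translated in memory")


# Toy input: record "a" is long enough, record "b" is filtered out.
dna_fasta = StringIO(">a\nATGGCCATTGTAATGGGCCGCTGA\n>b\nATGAAA\n")
proteins = list(translate_records(dna_fasta))

# The resulting records go straight to the writer, still in memory.
out = StringIO()
SeqIO.write(proteins, out, "fasta")
print(out.getvalue())
```

Because each step passes Python objects to the next, the filter or the output format can be changed by editing a single line.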

References:
1. Cock, Peter JA, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg et al. “Biopython: freely available Python tools for computational molecular biology and bioinformatics.” Bioinformatics 25, no. 11 (2009): 1422-1423.

rasa
