Computer Aided Drug Discovery

The Limitations of AlphaFold2

DeepMind’s artificial intelligence system AlphaFold2 predicts the 3D structure of a protein from its amino acid sequence. It appears to offer an answer to the 50-year-old protein folding problem.

DeepMind’s AlphaFold1 was the first system in the series. It scored highly in the 13th Critical Assessment of Protein Structure Prediction (CASP13) in 2018, DeepMind’s first CASP entry. Even though it achieved a previously unattained GDT score of around 58, it was not considered to have solved the protein folding problem. GDT, or global distance test, measures how similar a computationally predicted structure is to the structure determined in a lab experiment, on a scale where 100 is a perfect match. CASP uses a GDT score of about 90 as its benchmark, and according to Professor Moult (co-founder of CASP), any system that reaches this score is informally deemed competitive with experimental results.
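To make the GDT scale concrete, here is a minimal sketch of a simplified GDT_TS calculation. It assumes the predicted and experimental C-alpha coordinates are already superimposed and listed in the same residue order. The full GDT procedure searches over many superpositions, which this sketch does not attempt. The function name `gdt_ts` and the toy coordinates are illustrative only, not taken from CASP or DeepMind code.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed lists of (x, y, z) coordinates.

    For each distance cutoff (in Angstroms), compute the fraction of residues
    whose predicted position lies within that cutoff of the experimental one,
    then average the fractions and scale to 0-100.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy data (hypothetical): four residues displaced by 0.5, 1.5, 3.0 and 9.0 Angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(f"GDT_TS = {score:.2f}")  # GDT_TS = 56.25
```

In this toy case one residue is within 1 Angstrom, two are within 2, and three are within both 4 and 8, so the score is the average of 25, 50, 75 and 75, roughly the level the article reports for AlphaFold1.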

 

At CASP14, whose results were announced in November 2020, AlphaFold2 was again the top-ranked protein structure prediction method by a wide margin, producing highly accurate predictions. Across all targets, it received a median score of 92.4 GDT. According to DeepMind, this corresponds to an average error (RMSD) of about 1.6 Angstroms in its predictions.
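RMSD (root-mean-square deviation) is the square root of the mean squared distance between matching atoms. The sketch below shows that arithmetic, again assuming the two structures are already superimposed and in the same residue order. A real comparison would first find the optimal superposition. The function name `rmsd` and the coordinates are hypothetical.

```python
import math

def rmsd(predicted, experimental):
    """Root-mean-square deviation between two pre-superimposed coordinate lists."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [math.dist(p, e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data (hypothetical): three atoms displaced by 1, 2 and 2 Angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 2.0, 0.0), (7.6, 0.0, 2.0)]

print(f"RMSD = {rmsd(model, reference):.3f} Angstroms")
```

Because the distances are squared before averaging, a few badly placed atoms raise the RMSD more than they would raise a plain average distance.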

 

AlphaFold2 is built with a machine learning approach that incorporates physical and biological knowledge of protein structure and is trained on a large dataset. The training dataset was assembled from publicly available data: about 170,000 protein structures from the Protein Data Bank, together with large databases of protein sequences whose structures are unknown. To make a prediction, the method searches several protein sequence databases for sequences related to the supplied amino acid sequence and builds a multiple sequence alignment (MSA). Simply put, an MSA gathers sequences found in living organisms that are similar, but not identical, to the input sequence.
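As a toy illustration of "similar but not identical", the sketch below computes the percent identity between a query and each row of a small, already aligned MSA. The sequences, their names and the `percent_identity` helper are invented for illustration. This is not AlphaFold2's search pipeline, only a way to see what the rows of an MSA look like.

```python
def percent_identity(query, hit):
    """Percent of aligned, non-gap positions where the hit matches the query."""
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where either sequence has a gap ('-').
    pairs = [(q, h) for q, h in zip(query, hit) if q != "-" and h != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == h for q, h in pairs) / len(pairs)

# Hypothetical aligned sequences (one-letter amino acid codes, '-' marks a gap).
query = "MKTAYIAK"
msa = {
    "homolog_a": "MKTAYLAK",
    "homolog_b": "MRTA-IAK",
    "homolog_c": "MKSAYIGR",
}

for name, sequence in msa.items():
    print(f"{name}: {percent_identity(query, sequence):.1f}% identical to the query")
```

Positions that vary between related sequences, and positions that stay fixed, are the kind of signal an MSA makes available to a structure prediction model.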

 

Despite its popularity, AlphaFold2 has limitations. One of the most important things to remember when using a machine learning model is that, although the model may have been built with a specific goal in mind, such as predicting the structure of an individual protein chain from its sequence, the nature of the training data always determines what the model actually does. In this case, AlphaFold2 predicts how a protein chain would look if it appeared in the PDB (Protein Data Bank), and it is crucial to keep in mind that many PDB structures are not truly the folded state of a single, isolated protein. Also, while AlphaFold2 can predict individual protein structures, it cannot tell you anything about multiprotein complexes, protein-DNA interactions, protein-small molecule interactions, or other dynamics that are critical to understand in many biomedical applications.
