Alphabet’s DeepMind has used its AI system to crack a five-decade-old mystery in biology, the company announced today, using AlphaFold to help understand protein behavior. The company became known for its neural network developments, which demonstrated human-beating ability at chess, Go and shogi.
Google acquired DeepMind in 2014 – not without some controversy – and it became a subsidiary of Alphabet in 2015. AlphaGo, its Go-playing AI, beat the human world champion player the following year, while AlphaZero went on to show how reinforcement learning can be used to train an AI effectively by having it play against itself.
AlphaFold, however, tackles a completely different challenge. The “protein folding problem” is an umbrella term for attempts to understand how the amino acid sequence of a protein determines its 3D atomic structure. It covers three related questions: the underlying folding code, which reflects thermodynamics and interatomic forces; protein structure prediction, which attempts to work out a protein’s native structure from its amino acid sequence; and the kinetics of how the fold itself occurs.
While it sounds esoteric, understanding how proteins fold is believed to be key to a number of challenges in biology. That includes everything from tackling human diseases to wider applications such as enzymes that break down plastic or other waste.
The goal was to come up with a computational method for predicting folds that could be faster and more efficient than experimental methods. “A major challenge, however, is that the number of ways a protein could theoretically fold before settling into its final 3D structure is astronomical,” DeepMind points out.
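To give a rough sense of what “astronomical” means here, the following back-of-the-envelope sketch follows the spirit of the classic Levinthal's paradox argument. The chain length, the number of conformations per residue and the sampling rate are all illustrative assumptions, not figures from DeepMind.

```python
# Back-of-the-envelope estimate in the spirit of Levinthal's paradox.
# All three inputs below are illustrative assumptions, not measured values.
RESIDUES = 100               # assumed length of a small protein
STATES_PER_RESIDUE = 3       # assumed number of conformations per residue
SAMPLES_PER_SECOND = 1e13    # assumed conformations tried per second

SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def conformation_count(residues: int, states: int) -> int:
    """Number of conformations if every residue chooses its state independently."""
    return states ** residues


def years_to_enumerate(residues: int, states: int, rate: float) -> float:
    """Years needed to try every conformation once at the given sampling rate."""
    return conformation_count(residues, states) / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    total = conformation_count(RESIDUES, STATES_PER_RESIDUE)
    years = years_to_enumerate(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
    print(f"{total:.2e} conformations, roughly {years:.1e} years to try them all")
```

Even under these modest assumptions, an exhaustive search would take many orders of magnitude longer than the age of the universe, which is why brute-force enumeration is not an option.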
In 1994 a challenge, CASP (Critical Assessment of protein Structure Prediction), was set up to pit predictive methods against each other in the hunt for a computational solution. The measure of success is the so-called Global Distance Test, or GDT, which is based on the percentage of amino acid residues predicted within a threshold distance of their correct position. It is scored from 0 to 100, with the unofficial benchmark being just over 90 GDT, comparable to results from experimental methods.
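As a rough illustration of how such a score works, the sketch below computes a simplified version of the commonly reported GDT_TS variant, which averages the percentage of residues within 1, 2, 4 and 8 Angstroms of their true position. It assumes the predicted and reference coordinates are already superimposed and given one point per residue; the real assessment also searches for the best superposition at each cutoff, which is omitted here. The function name `gdt_ts` is hypothetical.

```python
import math

# Distance cutoffs (in Angstroms) commonly used for the GDT_TS variant.
THRESHOLDS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, thresholds=THRESHOLDS_ANGSTROM):
    """Simplified GDT_TS: mean percentage of residues within each cutoff.

    Assumes both structures are already superimposed and that each list
    holds one (x, y, z) coordinate per residue, in the same order.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    percentages = [
        100.0 * sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in thresholds
    ]
    return sum(percentages) / len(percentages)


if __name__ == "__main__":
    # Toy data: four residues whose predictions are off by 0.5, 1.5, 3 and 9 Angstroms.
    reference = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    predicted = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (11.0, 0.0, 0.0), (21.0, 0.0, 0.0)]
    print(gdt_ts(predicted, reference))  # 56.25
```

A perfect prediction scores 100, and a single badly misplaced residue lowers the score only in proportion to its share of the chain, which makes the measure less sensitive to outliers than a plain average error.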
Today, says DeepMind, its entry in the fourteenth challenge – CASP14 – scored 92.4 GDT. “This means that our predictions have an average error (RMSD) of about 1.6 Angstroms,” the company says, “which is comparable to the width of an atom (or 0.1 of a nanometer).”
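For readers unfamiliar with RMSD (root-mean-square deviation), here is a minimal sketch of how it is computed and how Angstroms relate to nanometers. It assumes the two structures are already superimposed and supplied as matching lists of coordinates; the helper names are hypothetical and the toy numbers are not DeepMind's data.

```python
import math

ANGSTROMS_PER_NANOMETER = 10.0  # 1 nm = 10 Angstroms


def rmsd(predicted, reference):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def angstrom_to_nanometer(value):
    """Convert a length from Angstroms to nanometers."""
    return value / ANGSTROMS_PER_NANOMETER


if __name__ == "__main__":
    # Toy data: two atoms displaced by 1 and 2 Angstroms respectively.
    reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    predicted = [(1.0, 0.0, 0.0), (5.0, 2.0, 0.0)]
    print(f"RMSD: {rmsd(predicted, reference):.3f} Angstroms")
    print(f"1.6 Angstroms = {angstrom_to_nanometer(1.6):.2f} nm")
```

Because the deviations are squared before averaging, RMSD penalizes large individual errors more heavily than GDT does.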
It’s a significant leap from DeepMind’s 2018 submission – the previous CASP to run – in which the first AlphaFold generation failed to reach 60 GDT.
DeepMind treats a folded protein as a “spatial graph,” in which residues are the nodes and edges connect residues in close proximity. “For the latest version of AlphaFold, used at CASP14, we created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building,” DeepMind explains. “It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph.”
DeepMind used about 128 of Google’s latest-generation TPU neural processing cores, training the system on about 170,000 protein structures from public databases along with other protein sequence databases. It took “a few weeks” to crunch, the company says. Next, the hope is to provide scalable access to the system for outside researchers, while applying the technology to better understand how protein structures figure in specific diseases and how they could inform drug development.