Special report: AI can study chemical molecules in ways scientists can't comprehend, automatically predicting complex protein structures and designing new drugs, despite having no real understanding of science.
The power to design new drugs at scale is no longer limited to Big Pharma. Startups armed with the right algorithms, data, and compute can invent tens of thousands of molecules in just a few hours. New machine learning architectures, including transformers, are automating parts of the design process, helping scientists develop new drugs for difficult diseases like Alzheimer's, cancer, or rare genetic conditions.
In 2017, researchers at Google came up with the transformer, a method for building increasingly bigger and more powerful neural networks. Today, transformer-based models power some of the largest AI systems, and typically learn patterns from vast amounts of text. They're versatile, and can process different forms of language, from code to ancient scripts scribbled thousands of years ago.
These systems are also useful in biology since proteins can be encoded as text too, Nadav Brandes, a postdoc at the University of California, San Francisco, studying bioinformatics, told The Register. These complex molecules are built from about 20 different amino acids, and each building block can be represented with a letter. Using this analogy, Brandes said, proteins can be thought of as words, and multiple proteins as sentences. But the vocabulary and grammar of these structures aren't comprehensible to humans the way natural language is.
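The protein-as-text analogy can be sketched in a few lines: each of the 20 standard amino acids gets a one-letter code, so a protein sequence tokenizes exactly like a string of characters would for a language model. The sequence below is illustrative, not a real protein.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def tokenize(protein: str) -> list[int]:
    """Map each residue letter to an integer token ID, as a language model would."""
    vocab = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [vocab[residue] for residue in protein]

# Hypothetical ten-residue fragment, treated like a "word".
sequence = "MKTAYIAKQR"
print(tokenize(sequence))
```

A real model would feed these token IDs into a transformer, but the input representation is this simple.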
Transformers, however, are able to glean information from what appears to be gobbledygook. "Transformers can process all the molecules at different positions at the same time and capture relationships between them over longer distances," he said.
"They're also easier and more efficient to train, and we can scale them up to larger datasets."
When models are fed hundreds of thousands of protein sequences, they can do all sorts of things that would normally take scientists a long time, like mapping their shapes or predicting the effects of genetic mutations.
AlphaFold, developed by researchers at DeepMind, for example, learned how to plot the positions of amino acids in proteins it has never seen, in just minutes or hours. Structural biologists can't match that pace; it often takes them years of extensive lab experiments to accurately map these jumbled, curly, ribbon-like shapes.
Knowing a protein's structure is critical for drug design. Scientists use this information to understand its function in the human body. Proteins interact with other molecules to perform vital tasks such as repairing cells or moving muscles. They have a unique binding site on their surface, where they can connect to other molecules and carry out their particular task.
Drugs are designed to latch onto these binding sites, preventing the sites from working with other molecules to carry out pathological functions that cause disease, like helping cancerous tumor cells grow.
AlphaFold has generated a wealth of information; its database is teeming with nearly a million squiggly protein structures found in all sorts of living organisms, from animals and insects to plants, bacteria, and viruses.
If its predictions are indeed accurate, scientists will have leapfrogged years of physical experiments needed to discover their structures. It also means they'll be able to invent new drugs to target diseases that weren't within reach before.
Researchers from Insilico Medicine, a startup founded in 2014 and based in Hong Kong and New York, are using AlphaFold to do just that. A team claimed it was the first "to identify a confirmed hit for a novel target in early drug discovery" for a protein whose structure had not been experimentally verified.
Starting with an AlphaFold-predicted structure of CDK20, a protein involved in cell growth, the team used machine learning to generate 8,918 candidate molecules for treating liver cancer.
Seven small molecules were synthesized and tested in lab experiments to see how strongly they interacted with the protein's binding site. One ended up being a "weak hit," Alex Zhavoronkov, Insilico's founder and CEO, told The Reg. "That means it looks somewhat promising, but it's not going to be a drug right away."
The whole process, from start to finish, took just 30 days.
It's too early to tell if AlphaFold was useful in this instance; Insilico will have to refine its search further, looking for more molecules that can lock onto the target more effectively. "The team is still working on this project; it's still in the hit identification process, and we will further progress it towards lead optimization, pre-clinical candidate, and hopefully clinical studies if everything goes smoothly. At the same time, more mechanistic studies for this target are ongoing," Zhavoronkov said.
Even if AlphaFold's predictions are accurate, they aren't always helpful to drug designers. They don't model how a protein's binding site changes shape when it interacts with a small-molecule candidate; that's something developers have to figure out on their own using complicated physics-based simulations.
Even if Insilico is far away from a viable new drug in this experiment, it proved AlphaFold's predictions can be used by drug companies. "It shows it's not just a prototype. It's production ready," Zhavoronkov added.
Scientists can start plugging protein structures predicted by DeepMind's software into their own machine learning models and start creating new molecules to target protein structures yet to be experimentally verified.
DeepMind's co-founder and CEO, Demis Hassabis, knew AlphaFold would be commercially valuable. Last year, he spun out a separate startup, Isomorphic Labs, to develop new drugs using AlphaFold's knowledge of protein folding.
"Isomorphic's mission could not be a more important one: to use AI to accelerate drug discovery, and ultimately, find cures for some of humanity's most devastating diseases," he said at the time.
Transformers also have another trick up their sleeves: they can recognize and predict properties in data they haven't explicitly seen before. At Absci, a public drug and target discovery company founded in 2011, researchers are building a model to automatically predict whether an AI-generated antibody will be rejected by a patient's immune system without training on any clinical data.
Antibodies are a type of protein produced by our immune systems. They form naturally in our bodies to fight infections from foreign viruses or bacteria. Antibodies prevent us from getting sick by binding with enemy proteins, blocking them from infecting our cells. If AI can generate novel antibodies, scientists can develop new therapeutics and vaccines.
These made-up proteins, however, must be chemically stable and must be accepted by the body's immune system. Otherwise, they risk being identified as foreign molecules themselves and attacked by the body's natural defense mechanisms, which could lead to an adverse reaction, making them unsuitable as medical treatments.
"Biological drug discovery is hard; there are more antibody variants that are possible to create than there are atoms in the universe," Joshua Meier, Absci's lead AI scientist, told The Register.
"But only a small fraction are actually biologically viable."
Absci is building a transformer model to narrow that search down, selecting only the most promising variants that seem less likely to be rejected by the immune system.
"It sees hundreds of millions of antibody sequences. And then from there, we can present the model with a new antibody we're designing, and we can ask it, what is the natural order of this?" Meier explained.
The system calculates a "naturalness" score, comparing the structure of the artificially designed antibody to natural ones it has seen during training. Absci's training data includes antibodies from various organisms, from humans to llamas. If it appears natural, it'll probably be less likely to trigger the immune system's defenses, he reckons.
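The idea behind such a score can be illustrated with a toy sequence model. Absci's transformer is not public, so a simple bigram frequency model stands in for it below; the principle is the same: score how probable a candidate antibody sequence looks under a model fitted to natural sequences. All sequences here are made up.

```python
from collections import Counter
import math

def train_bigram(sequences):
    """Fit bigram (adjacent-residue-pair) frequencies from 'natural' sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def naturalness(seq, model, floor=1e-6):
    """Average log-probability per position; higher means more 'natural'-looking."""
    logp = [math.log(model.get(pair, floor)) for pair in zip(seq, seq[1:])]
    return sum(logp) / len(logp)

# Hypothetical natural-looking antibody fragments used as training data.
natural = ["QVQLVQSG", "QVQLVESG", "EVQLVESG"]
model = train_bigram(natural)
print(naturalness("QVQLVQSG", model) > naturalness("XXZZXXZZ", model))  # True
```

A transformer replaces the bigram table with a learned, long-range model of sequence context, but the output is the same kind of per-sequence likelihood score.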
Transformers are also at play when Absci scientists want to see which antibodies will bind to their target proteins most strongly. Unlike with small-molecule drugs, Absci's chemists don't synthesize the antibodies themselves. Instead, the molecules are grown inside genetically engineered E. coli bacteria cells.
DNA contains blueprint instructions on how cells can produce proteins. The company works backwards, figuring out the corresponding DNA sequences for its antibody designs. These DNA sequences are then inserted into the bacteria so the cells will generate the antibody designed by its software.
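That backwards step, from a designed protein to a DNA sequence the cell can read, can be sketched with a codon lookup. The table below covers only a handful of residues for illustration; in practice codon choice is also tuned for efficient expression in the host organism.

```python
# One common DNA codon per amino acid (illustrative subset, not a full table).
CODONS = {
    "M": "ATG",  # methionine
    "K": "AAA",  # lysine
    "T": "ACC",  # threonine
    "A": "GCG",  # alanine
    "Y": "TAT",  # tyrosine
}

def reverse_translate(protein: str) -> str:
    """Return one DNA sequence that translates back into `protein`."""
    return "".join(CODONS[aa] for aa in protein)

print(reverse_translate("MKT"))  # ATGAAAACC
```

Inserting such a sequence into a bacterial cell, as the article describes, makes the cell's own machinery manufacture the designed antibody.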
Biologists can then test these new strains and see if they're effective against a particular protein. "We do this on basically a cell-by-cell basis. Every cell is getting a different sequence of DNA corresponding to a different variant of the antibody, so every bacteria cell ends up producing a different antibody," Meier said.
The company declined to talk about potential antibody treatments under development, but told us it managed to generate candidates that may be more effective at treating breast cancer than existing treatments like Herceptin.
Training a neural network to generate drugs is easy. Trying to figure out how to make those molecules is hard. Pharmaceutical companies can't test each candidate; the lab experiments would be too time consuming and expensive.
Besides, even if they know what ingredients go into cooking a particular drug, they don't always know its recipe. Structural chemists are called in to probe and adjust the structure of the computer-designed molecules, changing it to something they reckon can be synthesized.
Exscientia, a pharmaceutical company founded in 2012 and headquartered in the UK, is developing transformer models to automate this step. The goal is to have a working system capable of ingesting made-up molecules as input, and spitting out the chemical reactions needed to make the molecules as output.
"You need to know not just that a reaction is possible in theory. You also need to know, based on the whole molecule, what's the likelihood this is actually going to work? What's the possible yield?" Adrian Schreyer, the company's VP of AI technology, told The Register.
The process of working backwards, deconstructing molecules into their constituent building blocks, is known as retrosynthesis. Transformers are well suited to this task, said Ben Suutari, a senior AI research scientist. Given a sequence of chemical reactions, engineers can cover up various steps and ask the model to fill in the blanks.
A similar method is used to teach language models to autocomplete text. For example, in the sentence, "the cat sat on the __", the system is taught to assign a higher score to the word "mat" rather than, say, "hat". Instead of learning the order of words, however, Exscientia's retrosynthesis model learns the order of chemical reactions needed to make a finished molecule.
Another way to teach the algorithm is to scramble the answer and get it to reconstruct the data, we're told. It can then operate in reverse, dissecting a molecule to lay out a viable pathway to synthesizing drugs never created before.
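The fill-in-the-blank training signal described above can be sketched with word tokens standing in for real chemistry; Exscientia's actual model is not public. One step in a sequence is hidden, and the model is trained to recover it from the surrounding context.

```python
import random

def mask_one_step(steps, mask_token="<MASK>", seed=0):
    """Hide one step of the sequence; return (masked sequence, true answer, position)."""
    rng = random.Random(seed)
    i = rng.randrange(len(steps))
    masked = steps[:i] + [mask_token] + steps[i + 1:]
    return masked, steps[i], i

# Hypothetical reaction steps, playing the role of words in a sentence.
steps = ["reagent_A", "reagent_B", "heat", "purify"]
masked, answer, pos = mask_one_step(steps)
print(masked, "answer:", answer)
```

During training, the model's score for `answer` at position `pos` is pushed higher than its score for any other token, exactly as "mat" beats "hat" in the language example; the denoising variant mentioned above scrambles the sequence instead of masking it, then trains the model to reconstruct the original order.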
The quest for the perfect AI-designed drug capable of treating or maybe even curing diseases is long, hard, and filled with all sorts of technical and regulatory roadblocks. Transformers are only a small part of a drug designer's toolkit; several other machine learning models are involved in the process too. Sometimes these models even compete with each other, generating batches of molecules with different structures.
Transformers don't always come out on top, Insilico's Zhavoronkov told us. The startup's generation system, Chemistry42, currently consists of 32 different models, and includes generative adversarial networks and evolutionary algorithms.
"You want to have many different approaches competing with each other. If you just use transformers, it's very rare that it will perform best in every application. So for every target, you want to have that diversity. Very often people think transformer networks are the answer to everything and they're never the answer to everything," Zhavoronkov said.
AI may help scientists bring new drugs to market at a faster rate, but the drug discovery process can't be completely automated. It requires careful cooperation between humans and machines. New molecules still need to be examined in lab experiments and tested on patients before they're considered safe. Companies like Insilico and Exscientia already have drug candidates designed by their proprietary AI software in clinical trials.
The day an AI-designed drug can be given to a real patient might not be so far away. But it's difficult to predict when that day will arrive. Even if these new drugs are granted approval from regulators, many of these companies still need to find buyers for their IP if they haven't already partnered with a bigger pharmaceutical company to manufacture and sell their products at scale.
Here the incentives become murkier; they'll have to convince Big Pharma to make their medication not just because it'll save lives but because it'll make money. ®