How the Authors Behind the Transformers Research Paper are Transforming the AI Startup Landscape


The oft-quoted Hemingway adage “gradually, then suddenly” fits most progress in machine learning. Most significant breakthroughs in AI research look important only in hindsight. Reinforcement learning and convolutional neural networks, for instance, were conceived in the 1960s and 1980s respectively, yet entered the mainstream much later, once modern hardware supplied the compute and data that made them practical. There is one important exception to this rule, an idea that stood out the moment it was introduced: attention in neural networks. In 2017, a group of Google researchers published a paper with the cryptic title ‘Attention Is All You Need’.

The paper demonstrated that a transformer neural network, built on the “self-attention” technique, could translate between English and French more accurately than other neural networks while requiring only about a quarter of their training time. Rather than processing a sequence step by step, a transformer looks at every element of a sequence, typically words, at once and learns to weigh the most relevant ones more heavily. Soon enough, despite their newness, transformer architectures were being applied to most language tasks in AI/ML: from question answering to grammar correction, they set the pace on most benchmark tasks in natural language processing. The rise of the transformer mirrored the surge in convolutional neural networks that followed the 2012 ImageNet competition.
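
To make the “look at every element at once” idea concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation the paper introduced. It is illustrative only: the dimensions and random projection matrices are toy values, and it omits the multi-head attention, masking and positional encodings used in the full architecture.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) query/key/value projections (toy values here)
    Returns: (seq_len, d_k) context vectors, each a weighted mix of all positions.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Every position scores every other position, so the model attends to the
    # whole sequence at once instead of reading it step by step.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over positions turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: 5 tokens with 8-dimensional embeddings, projected to 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 4)
```

Because the attention weights are computed for all positions in parallel, each output vector can draw on any word in the sentence, however far away, which is what gives the architecture its accuracy and training-speed advantage over recurrent models.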

The impact of transformer architectures is hard to summarise briefly, given how ubiquitous their applications have become. The most influential large language models today, such as BERT, GPT, GPT-2 and GPT-3, are all transformer models. Nor are transformers limited to working with words; they can be used to predict and analyse sequential data of almost any kind. The DeepMind researchers behind AlphaFold, for example, used a new transformer technique to predict how amino acid sequences fold into the 3D shapes of proteins. Owing to their accuracy, transformers are also well suited to anomaly detection in industries such as healthcare and finance.

There is another peculiarity that sets the paper and its authors apart. Of its eight authors, six have gone on to found AI- and crypto-related startups, which have collectively raised more than USD 1 billion in venture capital.
