While transformer architectures have achieved remarkable success in recent years thanks to their impressive representational power, their quadratic complexity entails prohibitively high energy consumption that hinders their deployment in many real-life applications, especially on resource-constrained edge devices.
A Monash University research team addresses this issue in the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, proposing an attention mechanism with linear complexity — EcoFormer — that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
The team summarizes their main contributions as follows:
The basic idea informing this work is to reduce attention’s high cost by applying binary quantization to the kernel embeddings, replacing energy-expensive multiplications with energy-efficient bit-wise operations. The researchers note, however, that conventional binary quantization methods focus only on minimizing the quantization error between the full-precision and binary values, which fails to preserve the pairwise semantic similarity among attention’s tokens and thus negatively impacts performance.
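To make the similarity-preservation issue concrete, here is a small NumPy sketch (our own illustration, not the paper’s code) showing that plain sign-based binarization, which only minimizes per-element quantization error, can distort the pairwise similarities between token embeddings:

```python
# Illustrative only: naive sign() quantization vs. pairwise similarity.
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))      # 4 token embeddings, dim 16

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

codes = np.sign(tokens)                # naive binary quantization

for i, j in [(0, 1), (0, 2), (0, 3)]:
    full = cosine(tokens[i], tokens[j])    # full-precision similarity
    binary = cosine(codes[i], codes[j])    # similarity between binary codes
    print(f"pair ({i},{j}): full={full:+.3f}  binary={binary:+.3f}")
```

Running this shows that the similarities measured between the binary codes do not track the full-precision similarities pair by pair, which is exactly the mismatch the authors set out to avoid.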
To mitigate this issue, the team introduces a novel binarization method that uses kernelized hashing with a Gaussian radial basis function (RBF) kernel to map the original high-dimensional query/key pairs to low-dimensional, similarity-preserving binary codes. EcoFormer leverages this binarization to maintain semantic similarity in attention while approximating self-attention in linear time at a lower energy cost.
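The following minimal sketch illustrates the general shape of such a scheme: binary hash codes for queries and keys, followed by softmax-free attention computed in linear time. For simplicity it uses random-hyperplane hashing as a stand-in for the paper’s learned kernelized hashing, and the names (`hash_codes`, `eco_attention`) are ours, so it should be read as an approximation of the idea rather than the authors’ implementation.

```python
# Sketch of binary-code linear attention (assumptions: random-projection
# hashing instead of the paper's learned kernelized hashing).
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(x, proj):
    """Map embeddings to b-bit codes in {0, 1} via random hyperplanes."""
    return (x @ proj > 0).astype(np.float32)

def eco_attention(q, k, v, proj, eps=1e-6):
    """Softmax-free attention phi(Q) (phi(K)^T V), costing O(N * b * d)."""
    phi_q = hash_codes(q, proj)            # (N, b) binary codes
    phi_k = hash_codes(k, proj)            # (N, b) binary codes
    kv = phi_k.T @ v                       # (b, d): sums of value rows (accumulation only)
    z = phi_q @ phi_k.sum(axis=0)          # (N,): normalizer phi(Q) (phi(K)^T 1)
    return (phi_q @ kv) / (z[:, None] + eps)

n, d, b = 128, 64, 16                      # tokens, embedding dim, code length
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))
proj = rng.normal(size=(d, b))             # random hyperplanes (stand-in for learned hashing)

out = eco_attention(q, k, v, proj)
print(out.shape)                           # (128, 64)
```

Because the codes are binary, the two matrix products above reduce to selective accumulations rather than multiply-accumulates, and the quadratic N×N attention matrix is never formed, which is the source of both the linear complexity and the energy savings.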
In their empirical study, the team compared the proposed EcoFormer with standard multi-head self-attention (MSA) on ImageNet-1K. The results show that EcoFormer reduces energy consumption by 73 percent while incurring only a 0.33 percent performance drop.
Overall, the proposed EcoFormer energy-saving attention mechanism with linear complexity represents a promising approach for alleviating the cost bottleneck that has limited the deployment of transformer models. In future work, the team plans to explore binarizing transformers’ value vectors in attention, multi-layer perceptrons and non-linearities to further reduce energy costs, and to extend EcoFormer to NLP tasks such as machine translation and speech analysis. The code will be available on the project’s GitHub. The paper EcoFormer: Energy-Saving Attention with Linear Complexity is on arXiv.
Author: Hecate He | Editor: Michael Sarazen