Deep Learning

Microsoft AI Releases LLMLingua: A Unique Prompt Compression Technique that Compresses Prompts for Accelerated Inference of Large Language Models (LLMs)

December 13, 2023 · 4 min read


Large Language Models (LLMs), owing to their strong generalization and reasoning abilities, have significantly advanced the Artificial Intelligence (AI) community. These models have proven remarkably capable, showcasing the capabilities of Natural Language Processing (NLP), Natural Language Generation (NLG), Computer Vision, and more. However, newer developments, including in-context learning (ICL) and chain-of-thought (CoT) prompting, have led to the deployment of longer prompts, sometimes well beyond tens of thousands of tokens. This poses problems for model inference in terms of cost-effectiveness and computational efficiency.

To overcome these challenges, a team of researchers from Microsoft Corporation has introduced LLMLingua, a novel coarse-to-fine prompt compression technique. LLMLingua was developed with the primary goal of minimizing the cost of processing long prompts and accelerating model inference. To do this, LLMLingua uses several key strategies, which are as follows.

  1. Budget Controller: A dynamic budget controller governs how compression ratios are distributed among the various components of the original prompt. This ensures that the prompt's semantic integrity is preserved even at large compression ratios.
  2. Token-level Iterative Compression Algorithm: An algorithm for token-level iterative compression has been integrated into LLMLingua. This technique enables more refined compression by capturing the interdependence between compressed tokens while retaining key information from the prompt.
  3. Instruction Tuning-Based Approach: The team proposes an instruction tuning-based approach to address the problem of distribution misalignment between language models. Aligning the language model distributions improves compatibility between the small language model used for prompt compression and the target LLM.
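The first two strategies above can be sketched in miniature. The real method scores tokens with a small causal LM (the paper uses LLaMA-7B); the corpus-frequency surprisal below is a toy stand-in, and every function name here is illustrative, not the library's actual API:

```python
# Toy sketch of LLMLingua's coarse-to-fine idea: a budget controller splits
# a token budget across prompt sections, then each section is compressed by
# dropping its most predictable (lowest-surprisal) tokens while preserving
# token order. Names and scoring are illustrative only.
import math
from collections import Counter

def budget_controller(sections, total_budget):
    """Coarse stage: allocate the budget proportionally to section length
    (the paper allocates dynamically, e.g. favoring instruction/question)."""
    total = sum(len(s.split()) for s in sections)
    return [max(1, round(total_budget * len(s.split()) / total)) for s in sections]

def compress_section(text, budget, freq):
    """Fine stage: keep the `budget` highest-surprisal tokens, in order."""
    tokens = text.split()
    if len(tokens) <= budget:
        return text
    total = sum(freq.values())
    surprisal = lambda w: -math.log(freq[w.lower()] / total)
    ranked = sorted(range(len(tokens)),
                    key=lambda i: surprisal(tokens[i]), reverse=True)
    keep = sorted(ranked[:budget])  # restore original order
    return " ".join(tokens[i] for i in keep)

def compress_prompt(sections, total_budget, corpus):
    freq = Counter(corpus.lower().split())
    budgets = budget_controller(sections, total_budget)
    return " ".join(compress_section(s, b, freq)
                    for s, b in zip(sections, budgets))

sections = ["the quick brown fox jumps over the lazy dog",
            "the the the unusual zyzzyva appears here"]
print(compress_prompt(sections, 6, " ".join(sections)))
# -> quick brown fox unusual zyzzyva appears
```

Note how the highly predictable "the" tokens are dropped first while rare, information-carrying tokens survive; LLMLingua does the same with LM-estimated perplexity instead of corpus frequency.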

The team carried out analysis and experiments on four datasets covering different scenarios to validate the usefulness of LLMLingua: GSM8K and BBH for reasoning, ShareGPT for conversation, and Arxiv-March23 for summarization. The results show that the proposed approach achieves state-of-the-art performance in each of these scenarios, and that LLMLingua enables substantial compression of up to 20x while sacrificing very little in terms of performance.
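To see why a 20x ratio matters operationally, here is a back-of-envelope cost calculation; the per-token price is a hypothetical placeholder, not a figure from the paper:

```python
# Back-of-envelope savings from 20x prompt compression.
PRICE_PER_1K_INPUT = 0.0015     # USD per 1K input tokens, illustrative only
prompt_tokens = 10_000          # e.g. a long ICL + CoT prompt
ratio = 20

compressed_tokens = prompt_tokens // ratio
cost_before = prompt_tokens / 1000 * PRICE_PER_1K_INPUT
cost_after = compressed_tokens / 1000 * PRICE_PER_1K_INPUT

print(compressed_tokens)                    # 500
print(f"{cost_before / cost_after:.0f}x")   # 20x
```

The input-token bill shrinks by the same factor as the prompt, and shorter prompts also reduce prefill latency and free up context-window headroom.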

The small language model used in the experiments was LLaMA-7B, and the closed LLM was GPT-3.5-Turbo-0301. LLMLingua outperformed earlier compression methods by retaining reasoning, summarization, and dialogue capabilities even at a maximum compression ratio of 20x, demonstrating resilience, economy, efficacy, and recoverability.

The efficacy of LLMLingua has been observed across a range of closed LLMs and small language models. LLMLingua delivered good performance when using GPT-2-small, roughly matching results obtained with larger models. It also proved successful with strong LLMs, surpassing the expected prompt results.

The recoverability of LLMLingua is one noteworthy aspect: when GPT-4 was used to restore compressed prompts, it effectively recovered important reasoning details from the full nine-step CoT prompt, preserving the meaning of, and resemblance to, the original prompt. This ensures that key information is retained even after compression, adding to LLMLingua's overall appeal.

In conclusion, LLMLingua offers a comprehensive solution to the difficulties posed by long prompts in LLM applications. The method demonstrates excellent performance and presents a valuable means of improving the effectiveness and affordability of LLM-based applications.


Check out the Paper, Github, and Blog. All credit for this research goes to the researchers of this project. Also, don't forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.



Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

