Zhejiang University Researchers Propose Fuyou: A Low-Cost Deep Learning Training Framework that Enables Efficient 100B Huge Model Fine-Tuning on a Low-End Server with a Low-End GPU and Limited CPU Memory Capacity

March 16, 2024 · 4 min read


The advent of large language models (LLMs) has sparked a revolution in natural language processing, captivating the world with advanced capabilities that stem from the massive number of parameters these models utilize. These LLMs, epitomized by the transformative power of dense transformer models, have not only broken accuracy records but have also become indispensable assets in knowledge management tasks. Recently, the model size of dense transformer models has grown from 1.5B (GPT-2) to 540B (PaLM), an unprecedented evolution in linguistic capability.

While the potential of LLMs is undeniable, a critical challenge arises from their immense parameter sizes, which overwhelm even the most powerful GPUs, currently peaking at 80GB of memory. When conducting stochastic gradient descent-based optimization, GPU memory must be sufficient to accommodate these vast parameters and their associated optimizer states. To host such a huge model, one can aggregate device memory from multiple GPUs; it takes 32 NVIDIA A100 GPUs to fit a model with 100 billion parameters for fine-tuning. However, this approach introduces prohibitive costs for most academic researchers, who typically have a limited budget for high-end GPU servers.
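A rough back-of-envelope calculation shows why the model states alone overwhelm a single 80GB GPU. The sketch below uses standard ZeRO-style mixed-precision accounting (16 bytes per parameter for Adam fine-tuning); it is an illustrative estimate, not a figure from the paper:

```python
# Back-of-envelope memory estimate for mixed-precision Adam fine-tuning.
# Assumption: fp16 weights (2 B) + fp16 gradients (2 B) + fp32 master
# weights, momentum, and variance (3 * 4 B) = 16 bytes per parameter.

def adam_finetune_bytes(n_params: int) -> int:
    """Bytes needed for weights, gradients, and Adam optimizer states."""
    return n_params * (2 + 2 + 3 * 4)

n = 100_000_000_000                      # 100B parameters
total_gb = adam_finetune_bytes(n) / 1e9  # ~1600 GB of model states
gpus_needed = total_gb / 80              # vs. one 80GB A100: ~20 GPUs
```

Model states alone already demand on the order of twenty A100-80GB cards; once activations, buffers, and memory fragmentation are included, the practical figure rises toward the 32 GPUs quoted above.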

Researchers from Zhejiang University proposed Fuyou, a low-cost training framework that enables efficient 100B huge model fine-tuning on a low-end server with a low-end GPU and limited CPU memory capacity. It is implemented on PyTorch, a popular deep-learning framework. Compared with other frameworks such as ZeRO-Infinity, Fuyou can fine-tune GPT-3 175B on a consumer RTX 4090 GPU with high GPU utilization, whereas ZeRO-Infinity fails to fine-tune it.

The key idea is to add SSD-CPU communication as a pivotal optimization dimension, strategically harmonizing computation and data swapping to unlock the full potential of GPU utilization. This unfolds through three innovations:

  • A synchronous out-of-core CPU optimizer that overlaps with backward propagation to maximize GPU utilization.
  • A GPU-CPU-SSD fully-pipelined activation swapping mechanism that allows significantly larger models to be fine-tuned.
  • An automatic activation swapping management mechanism that determines the optimal amount of activations to swap so as to minimize epoch time.
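The second and third points can be illustrated with a toy cost model (the numbers, function names, and simplifications here are hypothetical, not taken from the paper): when activation transfers are fully pipelined with GPU computation, iteration time is roughly the maximum of compute time and transfer time, so the scheduler should swap just enough activations that transfers stay hidden under compute:

```python
# Toy cost model of fully-pipelined activation swapping (all numbers
# hypothetical). With overlap, iteration time ~ max(compute, transfer);
# the best swap amount is the largest one whose transfer time still
# hides under the compute time.

def iteration_time(compute_s: float, swapped_gb: float, bw_gbps: float) -> float:
    """Pipelined iteration: transfers hide behind compute until they dominate."""
    transfer_s = swapped_gb / bw_gbps
    return max(compute_s, transfer_s)

def best_swap_amount(compute_s: float, bw_gbps: float, needed_gb: float) -> float:
    """Swap as much as possible without exceeding compute time (capped at need)."""
    hidden_capacity_gb = compute_s * bw_gbps  # GB of transfer that compute can hide
    return min(needed_gb, hidden_capacity_gb)

compute_s = 2.0  # hypothetical GPU compute time per iteration (s)
bw_gbps = 3.0    # hypothetical effective SSD<->CPU bandwidth (GB/s)
swap = best_swap_amount(compute_s, bw_gbps, needed_gb=10.0)  # 6.0 GB
t = iteration_time(compute_s, swap, bw_gbps)  # fully hidden: 2.0 s
```

Swapping less than this amount wastes GPU memory headroom; swapping more makes the SSD transfers the bottleneck and stalls the GPU, which is exactly what the automatic management mechanism is designed to avoid.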

In model fine-tuning, Fuyou delivers strong performance whether on a cutting-edge A100-80GB or an RTX 4090 in a commodity server. When fine-tuning a GPT-3 175B model, Fuyou achieves 87 TFLOPS on the 4090 and 172 TFLOPS on the A100-80GB. It also reaches up to 3.47× the TFLOPS of ZeRO-Infinity when fine-tuning a GPT-3 13B model. To evaluate whether cheap SSDs improve training throughput cost-effectively, Fuyou is compared with Megatron-LM on DGX-2 nodes using tensor parallelism. Measuring throughput relative to the total price of GPUs and SSDs in a server, Fuyou achieves at most 1.70× the cost-effectiveness of Megatron-LM.
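The cost-effectiveness metric above reduces to throughput per dollar of GPU-plus-SSD hardware. A minimal sketch, with entirely hypothetical prices and throughputs (none of these figures come from the paper):

```python
# Cost-effectiveness as throughput per dollar of GPUs + SSDs.
# All prices and throughputs below are hypothetical placeholders.

def cost_effectiveness(tflops: float, gpu_cost: float, ssd_cost: float) -> float:
    """Training throughput (TFLOPS) per dollar of GPU + SSD hardware."""
    return tflops / (gpu_cost + ssd_cost)

# Hypothetical setups: one consumer GPU plus cheap SSDs vs. a multi-GPU node.
commodity = cost_effectiveness(tflops=87.0, gpu_cost=2_000.0, ssd_cost=500.0)
multi_gpu = cost_effectiveness(tflops=500.0, gpu_cost=160_000.0, ssd_cost=0.0)
ratio = commodity / multi_gpu  # commodity server wins per dollar in this toy setup
```

The point of the metric is that adding a few hundred dollars of SSD capacity can raise throughput far more cheaply than adding tens of thousands of dollars of extra GPUs.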

In conclusion, this paper proposed Fuyou, a low-cost training framework that enables efficient 100B huge model fine-tuning on a low-end server with a low-end GPU and limited CPU memory capacity, implemented on PyTorch. It achieves 87 and 172 TFLOPS when fine-tuning GPT-3 175B. Besides, it reaches up to 3.42× and 6.73× the TFLOPS of ZeRO-Infinity and Colossal-AI, respectively, when fine-tuning GPT-3 13B. Fuyou also achieves at most 1.70× the cost-effectiveness of Megatron-LM.


Check out the Paper. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.




