Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

By Editorial Team, June 2, 2026 (updated June 2, 2026)


# Setup from the earlier sections of this tutorial, reconstructed here as an
# assumption so that Section D runs on its own (requires a CUDA GPU).
import time
import torch

DEV = "cuda"
try:
    from apex.optimizers import FusedAdam
    from apex.normalization import FusedLayerNorm
    APEX_OK = True
except ImportError:
    APEX_OK = False
try:
    import amp_C  # Apex multi-tensor CUDA extension that FusedAdam needs
    HAS_AMP_C = True
except ImportError:
    HAS_AMP_C = False
try:
    import fused_layer_norm_cuda  # Apex CUDA extension that FusedLayerNorm needs
    HAS_FLN = True
except ImportError:
    HAS_FLN = False

print("\n### SECTION D: end-to-end Transformer (vanilla fp32 vs Apex fused + AMP) ###")
VOCAB, D, NHEAD, LAYERS, SEQ, BATCH, STEPS = 2000, 256, 4, 4, 128, 32, 60


class Block(torch.nn.Module):
    """Pre-norm Transformer block: x + Attn(LN(x)), then x + FFN(LN(x))."""

    def __init__(self, d, nhead, norm_cls):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(d, nhead, batch_first=True)
        self.ff = torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                                      torch.nn.Linear(4 * d, d))
        self.n1, self.n2 = norm_cls(d), norm_cls(d)

    def forward(self, x):
        h = self.n1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(self.n2(x))


class TinyTransformer(torch.nn.Module):
    def __init__(self, norm_cls):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, D)
        self.blocks = torch.nn.ModuleList([Block(D, NHEAD, norm_cls) for _ in range(LAYERS)])
        self.norm = norm_cls(D)
        self.head = torch.nn.Linear(D, VOCAB)

    def forward(self, idx):
        x = self.emb(idx)
        for b in self.blocks:
            x = b(x)
        return self.head(self.norm(x))


# Fixed random token batch; targets are the inputs shifted by one position.
g = torch.Generator(device="cpu").manual_seed(0)
data = torch.randint(0, VOCAB, (BATCH, SEQ + 1), generator=g).to(DEV)
inp, tgt = data[:, :-1], data[:, 1:]
lossfn = torch.nn.CrossEntropyLoss()


def run_training(use_apex):
    torch.manual_seed(0)
    norm_cls = FusedLayerNorm if (use_apex and HAS_FLN and APEX_OK) else torch.nn.LayerNorm
    model = TinyTransformer(norm_cls).to(DEV)
    if use_apex and HAS_AMP_C and APEX_OK:
        # FusedAdam defaults to weight_decay=0; match torch AdamW's default of 0.01.
        optimizer = FusedAdam(model.parameters(), lr=3e-4, weight_decay=0.01)
    else:
        optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    # Loss scaling and fp16 autocast are enabled only on the Apex/AMP path.
    scaler = torch.amp.GradScaler("cuda", enabled=use_apex)

    def one_step():
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast("cuda", dtype=torch.float16, enabled=use_apex):
            logits = model(inp)
            loss = lossfn(logits.reshape(-1, VOCAB), tgt.reshape(-1))
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        return loss

    for _ in range(5):  # warm-up steps, excluded from timing
        one_step()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(STEPS):
        loss = one_step()
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    dt = time.perf_counter() - t0
    return loss.item(), (STEPS * BATCH * SEQ) / dt, dt


loss_v, tps_v, dt_v = run_training(use_apex=False)
print(f"  vanilla (fp32, nn.LayerNorm, AdamW)        : "
      f"{dt_v:5.2f}s  | {tps_v:9.0f} tok/s | final loss {loss_v:.3f}")
if APEX_OK and (HAS_AMP_C or HAS_FLN):
    loss_a, tps_a, dt_a = run_training(use_apex=True)
    print(f"  apex   (fp16, FusedLayerNorm, FusedAdam)   : "
          f"{dt_a:5.2f}s  | {tps_a:9.0f} tok/s | final loss {loss_a:.3f}")
    print(f"  ----> speedup: {tps_a / tps_v:0.2f}x throughput")
else:
    print("  apex path SKIPPED (no fused kernels built)")
print("\n" + "=" * 78)
print("DONE. Key takeaways:")
print("  - FusedAdam/FusedLayerNorm/FusedRMSNorm are the still-relevant Apex pieces;")
print("    speedups grow with model size & parameter count (this tiny demo understates them).")
print("  - apex.amp is deprecated -> prefer torch.amp.autocast + torch.amp.GradScaler.")
print("  - FusedAdam composes cleanly with native torch.amp (Section D).")
print("  - On real workloads, also try a larger model and bf16 autocast (no scaler needed).")
print("=" * 78)
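The `torch.amp.GradScaler` in `run_training` performs dynamic loss scaling: the loss is multiplied by a large factor before `backward()` so small fp16 gradients do not underflow, the gradients are divided back down before the optimizer uses them, and the factor adapts over time. The sketch below is a simplified pure-Python model of that policy, not PyTorch's implementation. The class `ToyLossScaler` is hypothetical and folds `step()` and `update()` into one call; its defaults (initial scale 2**16, growth factor 2.0, backoff factor 0.5, growth interval 2000) mirror `GradScaler`'s defaults.

```python
import math


class ToyLossScaler:
    """Hypothetical, simplified model of dynamic loss scaling (not torch's code)."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without inf/NaN gradients

    def step(self, scaled_grads, apply_update):
        """Unscale grads; call apply_update only if every grad is finite."""
        grads = [g / self.scale for g in scaled_grads]
        found_inf = any(not math.isfinite(g) for g in grads)
        if not found_inf:
            apply_update(grads)
        self._update(found_inf)
        return not found_inf  # True if the optimizer step was taken

    def _update(self, found_inf):
        if found_inf:
            # Overflow: skip this step, shrink the scale, restart the counter.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # A long enough run of clean steps: try a larger scale.
                self.scale *= self.growth_factor
                self._good_steps = 0


# Usage: one overflowing step, then three clean ones with a short interval.
scaler = ToyLossScaler(growth_interval=3)
applied = []
print(scaler.step([float("inf"), 1.0], applied.append))  # False (step skipped)
print(scaler.scale)  # 32768.0
for _ in range(3):
    scaler.step([65536.0, 131072.0], applied.append)
print(scaler.scale)  # 65536.0
print(applied[0])    # [2.0, 4.0]
```

A skipped step only costs one update, while a scale that is too small would silently flush gradients to zero, which is why the policy backs off quickly and grows cautiously.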

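`run_training` reports throughput as the tokens processed in the timed loop divided by wall-clock time: each step consumes `BATCH * SEQ` input tokens, and only the `STEPS` timed iterations count, since the five warm-up steps run before the clock starts. The arithmetic with the section's constants is shown below; the two elapsed times are made-up placeholders, not measured results.

```python
# Constants copied from Section D.
BATCH, SEQ, STEPS = 32, 128, 60

tokens_per_step = BATCH * SEQ            # input tokens per optimizer step
timed_tokens = STEPS * tokens_per_step   # tokens in the timed loop


def tokens_per_second(dt_seconds):
    """Throughput as computed in run_training: (STEPS * BATCH * SEQ) / dt."""
    return timed_tokens / dt_seconds


# Hypothetical timings for illustration only (not benchmark results).
dt_vanilla, dt_apex = 2.0, 1.0
speedup = tokens_per_second(dt_apex) / tokens_per_second(dt_vanilla)
print(tokens_per_step, timed_tokens)  # 4096 245760
print(f"{speedup:0.2f}x")             # 2.00x
```

Because both runs process the same number of tokens, the printed throughput speedup reduces to `dt_v / dt_a`.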

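FusedAdam does not change the optimizer's math: with its default `adam_w_mode=True` it applies the decoupled-weight-decay AdamW update, fusing the element-wise work for many parameter tensors into a small number of CUDA kernel launches. The per-element update is sketched below in plain Python for a single scalar parameter; `adamw_step` is a hypothetical helper written for illustration and follows the update rule documented for `torch.optim.AdamW`.

```python
import math


def adamw_step(p, g, m, v, t, lr=3e-4, betas=(0.9, 0.999), eps=1e-8,
               weight_decay=0.0):
    """One AdamW update for scalar parameter p with gradient g at step t (1-based).

    Hypothetical helper: mirrors the element-wise AdamW math; it is not the
    fused kernel itself.
    """
    beta1, beta2 = betas
    p = p - lr * weight_decay * p        # decoupled weight decay
    m = beta1 * m + (1 - beta1) * g      # first-moment running average
    v = beta2 * v + (1 - beta2) * g * g  # second-moment running average
    m_hat = m / (1 - beta1 ** t)         # bias corrections
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# On the first step m_hat / sqrt(v_hat) is ~ g / |g|, so p moves by about lr.
p, m, v = adamw_step(p=1.0, g=0.5, m=0.0, v=0.0, t=1, lr=0.1)
print(round(p, 6))  # 0.9
```

When comparing the two optimizers, note that `torch.optim.AdamW` defaults to `weight_decay=0.01` while Apex's `FusedAdam` defaults to `weight_decay=0`, so setting it explicitly on both keeps the runs comparable.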

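`Block` and `TinyTransformer` receive the normalization class as `norm_cls`, which is what lets `FusedLayerNorm` stand in for `nn.LayerNorm` with no other changes: both are built from the feature size and compute y = (x - mean) / sqrt(var + eps) * gamma + beta over the last dimension, using the biased (population) variance. Apex computes this with fused CUDA kernels. The CPU-only check below does not use Apex; it is an illustration that writes the formula out by hand (the helper `manual_layer_norm` is hypothetical) and compares it with `nn.LayerNorm`.

```python
import torch

torch.manual_seed(0)
d = 256  # same width as D in Section D

ln = torch.nn.LayerNorm(d)  # eps=1e-5 by default; weight = gamma, bias = beta
with torch.no_grad():
    # Move the affine parameters away from their (1, 0) init so they matter.
    ln.weight.uniform_(0.5, 1.5)
    ln.bias.uniform_(-0.1, 0.1)


def manual_layer_norm(x, weight, bias, eps=1e-5):
    """LayerNorm over the last dimension, written out explicitly."""
    mean = x.mean(dim=-1, keepdim=True)
    var = ((x - mean) ** 2).mean(dim=-1, keepdim=True)  # biased variance
    return (x - mean) / torch.sqrt(var + eps) * weight + bias


x = torch.randn(4, 8, d)  # (batch, seq, features)
with torch.no_grad():
    ref = ln(x)
    mine = manual_layer_norm(x, ln.weight, ln.bias)
print(torch.allclose(ref, mine, atol=1e-5))  # True
```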

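The final takeaway suggests bf16 autocast. bfloat16 keeps fp32's 8-bit exponent range, so gradients are much less prone to underflow and the step is typically run without a `GradScaler`. The sketch below shows such a step; it runs on CPU so it is easy to try, and the small `nn.Sequential` model, the random data, and `lr=1e-2` are placeholders standing in for `TinyTransformer` and the token batch. On a bf16-capable GPU you would pass `"cuda"` as the device type instead.

```python
import torch

torch.manual_seed(0)
device = "cpu"  # assumption: use "cuda" on a bf16-capable GPU

# Placeholder model and data (not the Section D Transformer).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 10)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
lossfn = torch.nn.CrossEntropyLoss()
x = torch.randn(32, 64, device=device)
y = torch.randint(0, 10, (32,), device=device)


def bf16_step():
    optimizer.zero_grad(set_to_none=True)
    # Eligible ops inside autocast run in bfloat16; parameters stay fp32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(x)
        loss = lossfn(logits, y)
    loss.backward()   # plain backward: no GradScaler
    optimizer.step()
    return loss.item()


losses = [bf16_step() for _ in range(50)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Compared with `one_step` in Section D, the only changes are the autocast dtype and the removal of the `scaler.scale`, `scaler.step`, and `scaler.update` calls.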