Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

By Editorial Team, June 2, 2026 (updated June 2, 2026)


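# Section D assumes the names set up earlier in the tutorial are in scope:
# torch, time, DEV, APEX_OK, HAS_AMP_C, HAS_FLN, FusedAdam, FusedLayerNorm.
print("\n### SECTION D: end-to-end Transformer (vanilla fp32 vs Apex fused + AMP) ###")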
VOCAB, D, NHEAD, LAYERS, SEQ, BATCH, STEPS = 2000, 256, 4, 4, 128, 32, 60
class Block(torch.nn.Module):
    """Pre-norm Transformer block: self-attention and MLP, each with a residual."""
    def __init__(self, d, nhead, norm_cls):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(d, nhead, batch_first=True)
        self.ff = torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                                      torch.nn.Linear(4 * d, d))
        # norm_cls is either torch.nn.LayerNorm or Apex FusedLayerNorm
        self.n1, self.n2 = norm_cls(d), norm_cls(d)
    def forward(self, x):
        h = self.n1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(self.n2(x))
class TinyTransformer(torch.nn.Module):
    """Embedding, LAYERS pre-norm blocks, a final norm, and a vocabulary head."""
    def __init__(self, norm_cls):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, D)
        self.blocks = torch.nn.ModuleList([Block(D, NHEAD, norm_cls) for _ in range(LAYERS)])
        self.norm = norm_cls(D)
        self.head = torch.nn.Linear(D, VOCAB)
    def forward(self, idx):
        x = self.emb(idx)
        for b in self.blocks:
            x = b(x)
        return self.head(self.norm(x))
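# Fixed random token batch so both runs train on identical data
g = torch.Generator(device="cpu").manual_seed(0)
data = torch.randint(0, VOCAB, (BATCH, SEQ + 1), generator=g).to(DEV)
inp, tgt = data[:, :-1], data[:, 1:]  # next-token prediction: targets are inputs shifted by one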
lossfn = torch.nn.CrossEntropyLoss()
def run_training(use_apex):
    torch.manual_seed(0)  # same initial weights for both runs
    norm_cls = (FusedLayerNorm if (use_apex and HAS_FLN and APEX_OK) else torch.nn.LayerNorm)
    model = TinyTransformer(norm_cls).to(DEV)
    if use_apex and HAS_AMP_C and APEX_OK:
        optimizer = FusedAdam(model.parameters(), lr=3e-4)
    else:
        optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    # With enabled=False the scaler and autocast are no-ops, so the vanilla run stays in fp32
    scaler = torch.amp.GradScaler("cuda", enabled=use_apex)
    def one_step():
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast("cuda", dtype=torch.float16, enabled=use_apex):
            logits = model(inp)
            loss = lossfn(logits.reshape(-1, VOCAB), tgt.reshape(-1))
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        return loss
    # Warm-up steps are excluded from the timing
    for _ in range(5):
        one_step()
    torch.cuda.synchronize()  # drain queued GPU work before starting the clock
    t0 = time.perf_counter()
    for _ in range(STEPS):
        loss = one_step()
    torch.cuda.synchronize()  # wait for the last step to finish before stopping the clock
    dt = time.perf_counter() - t0
    # Returns: final loss, tokens per second, elapsed seconds
    return loss.item(), (STEPS * BATCH * SEQ) / dt, dt
loss_v, tps_v, dt_v = run_training(use_apex=False)
print(f"  vanilla (fp32, nn.LayerNorm, AdamW)        : "
      f"{dt_v:5.2f}s  | {tps_v:9.0f} tok/s | final loss {loss_v:.3f}")
if APEX_OK and (HAS_AMP_C or HAS_FLN):
    loss_a, tps_a, dt_a = run_training(use_apex=True)
    print(f"  apex   (fp16, FusedLayerNorm, FusedAdam)   : "
          f"{dt_a:5.2f}s  | {tps_a:9.0f} tok/s | final loss {loss_a:.3f}")
    print(f"  ----> speedup: {tps_a / tps_v:0.2f}x throughput")
else:
    print("  apex path SKIPPED (no fused kernels built)")
print("\n" + "=" * 78)
print("DONE. Key takeaways:")
print("  - FusedAdam/FusedLayerNorm/FusedRMSNorm are the still-relevant Apex pieces;")
print("    speedups grow with model size & parameter count (tiny demo understates it).")
print("  - apex.amp is deprecated -> prefer torch.amp.autocast + torch.amp.GradScaler.")
print("  - FusedAdam composes cleanly with native torch.amp (Section D).")
print("  - On real workloads, also try a larger model and bf16 autocast (no scaler needed).")
print("=" * 78)
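
The listing above branches on three flags, APEX_OK, HAS_AMP_C and HAS_FLN, whose definitions are not part of this excerpt. The sketch below is one plausible way to derive them, not the tutorial's own code. It assumes that Apex's compiled extensions are importable as the top-level modules amp_C (used by FusedAdam) and fused_layer_norm_cuda (used by FusedLayerNorm); the helpers has_module and pick_components are hypothetical names introduced for illustration only.

```python
import importlib.util

def has_module(name):
    """Return True if a top-level module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False

# Assumption: these are the module names of Apex and its compiled extensions.
APEX_OK = has_module("apex")
HAS_AMP_C = APEX_OK and has_module("amp_C")                # multi-tensor kernels (FusedAdam)
HAS_FLN = APEX_OK and has_module("fused_layer_norm_cuda")  # FusedLayerNorm kernels

def pick_components(use_apex):
    """Mirror the fallback logic of run_training, returning names instead of classes."""
    norm = "FusedLayerNorm" if (use_apex and APEX_OK and HAS_FLN) else "torch.nn.LayerNorm"
    opt = "FusedAdam" if (use_apex and APEX_OK and HAS_AMP_C) else "torch.optim.AdamW"
    return norm, opt

print("apex installed:", APEX_OK, "| amp_C:", HAS_AMP_C, "| fused layer norm:", HAS_FLN)
print("components for the apex run:", pick_components(use_apex=True))
```

Because each fused component is gated on its own flag, a partial Apex build (for example, one compiled without the CUDA extensions) degrades to the stock PyTorch class for that component rather than failing at import time.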


