How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention

# Assumes torch, xops, ab (the xFormers attention-bias module), device, and
# vanilla_attention are already defined by the earlier sections of this tutorial.
print("\n" + "="*70 + "\n4. Variable-length packed batch - no padding waste\n" + "="*70)
seqlens = [37, 120, 8, 200]
total = sum(seqlens)
H, K = 8, 64
# All sequences are concatenated along the token axis into one batch of size 1.
q = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
k = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
v = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
try:
    # Block-diagonal bias: each token attends only within its own sequence.
    bias = ab.BlockDiagonalMask.from_seqlens(seqlens)
    out_packed = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
    # Check the first segment against plain attention run on that segment alone.
    s0 = seqlens[0]
    ref0 = vanilla_attention(q[:, :s0], k[:, :s0], v[:, :s0]).half()
    print("packed shape         :", tuple(out_packed.shape), "(all", total, "tokens, no pad)")
    print("segment-0 max diff   : {:.2e}".format((out_packed[:, :s0] - ref0).abs().max().item()))
    # Same packing, with a causal mask applied inside each segment.
    cbias = ab.BlockDiagonalCausalMask.from_seqlens(seqlens)
    _ = xops.memory_efficient_attention(q, k, v, attn_bias=cbias)
    print("-> also did a packed CAUSAL pass. This is how vLLM-style engines")
    print("   batch requests of different lengths with zero padding overhead.")
    # split() cuts the packed output back into one tensor per original sequence.
    splits = bias.split(out_packed)
    print("recovered segments   :", [tuple(t.shape) for t in splits])
except Exception as e:
    print("BlockDiagonalMask path skipped on this version/backend:", repr(e))
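To make the block-diagonal idea concrete without a GPU, here is a minimal pure-Python sketch of the bookkeeping behind a packed batch. It is not how xFormers implements its masks; the helper names (`segment_offsets`, `segment_of`, `may_attend`, `padding_waste`) are hypothetical and exist only for illustration. It reuses the same `seqlens` as above to show which packed positions may attend to each other and how many token slots a padded batch would spend on padding.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_offsets(seqlens):
    """Start offset of every segment, followed by the total packed length."""
    return [0, *accumulate(seqlens)]


def segment_of(token, offsets):
    """Index of the segment that contains a packed token position."""
    return bisect_right(offsets, token) - 1


def may_attend(i, j, offsets, causal=False):
    """True if query position i may attend to key position j.

    Block-diagonal rule: both positions must be in the same segment.
    With causal=True, the key must also not come after the query.
    """
    same_segment = segment_of(i, offsets) == segment_of(j, offsets)
    return same_segment and (j <= i if causal else True)


def padding_waste(seqlens):
    """(wasted slots, total slots) for a padded [batch, max_len] layout."""
    padded_slots = len(seqlens) * max(seqlens)
    return padded_slots - sum(seqlens), padded_slots


seqlens = [37, 120, 8, 200]
offsets = segment_offsets(seqlens)
wasted, padded_slots = padding_waste(seqlens)

print("segment offsets:", offsets)
print(f"padded slots: {padded_slots}, packed tokens: {sum(seqlens)}, wasted: {wasted}")
print("position 37 may attend to 36:", may_attend(37, 36, offsets))
print("position 36 may attend to 0 :", may_attend(36, 0, offsets))
```

Position 36 is the last token of the first sequence and position 37 is the first token of the second, so attention between them is blocked even though they are adjacent in the packed tensor.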
print("\n" + "="*70 + "\n5. Grouped-query attention (5-D BMGHK format)\n" + "="*70)
B, M, K = 2, 256, 64
n_q_heads, n_kv_heads = 8, 2
# G = number of KV groups, Hq = query heads that share each group.
G, Hq = n_kv_heads, n_q_heads // n_kv_heads
try:
    qg = torch.randn(B, M, G, Hq, K, device=device, dtype=torch.float16)
    # One K/V head per group, broadcast explicitly across the Hq query heads.
    # expand() returns a stride-0 view, so no data is copied.
    kg = torch.randn(B, M, G, 1, K, device=device, dtype=torch.float16).expand(B, M, G, Hq, K)
    vg = torch.randn(B, M, G, 1, K, device=device, dtype=torch.float16).expand(B, M, G, Hq, K)
    out_gqa = xops.memory_efficient_attention(qg, kg, vg)
    print("GQA output shape     :", tuple(out_gqa.shape), "= [B, M, G, Hq, K]")
    print(f"-> {n_q_heads} query heads, only {n_kv_heads} KV heads: smaller KV-cache,")
    print("   which is exactly what Llama-/Mistral-class models use at inference.")
except Exception as e:
    print("GQA 5-D path skipped on this version/backend:", repr(e))
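
The "smaller KV-cache" claim can be checked with simple arithmetic. The sketch below is illustrative only: `kv_cache_bytes` and `kv_group_of` are hypothetical helpers, the cache is assumed to hold one fp16 K tensor and one fp16 V tensor per layer, and the head-to-group mapping assumes query heads are laid out as `[G, Hq]` as in the 5-D tensors above.

```python
def kv_cache_bytes(batch, seq_len, n_kv_heads, head_dim, bytes_per_elem=2, n_layers=1):
    """Bytes needed to cache K and V (hence the factor 2) for n_layers layers."""
    return 2 * batch * seq_len * n_kv_heads * head_dim * bytes_per_elem * n_layers


def kv_group_of(q_head, n_q_heads, n_kv_heads):
    """KV group read by a flat query-head index under a [G, Hq] head layout."""
    heads_per_group = n_q_heads // n_kv_heads
    return q_head // heads_per_group


B, M, K = 2, 256, 64
n_q_heads, n_kv_heads = 8, 2

# Full multi-head attention caches one K/V head per query head.
mha_bytes = kv_cache_bytes(B, M, n_q_heads, K)
# Grouped-query attention caches only one K/V head per group.
gqa_bytes = kv_cache_bytes(B, M, n_kv_heads, K)

print(f"MHA KV cache per layer: {mha_bytes} bytes")
print(f"GQA KV cache per layer: {gqa_bytes} bytes")
print(f"reduction factor      : {mha_bytes // gqa_bytes}x")
print("query head -> KV group:", [kv_group_of(h, n_q_heads, n_kv_heads) for h in range(n_q_heads)])
```

The reduction factor equals `n_q_heads // n_kv_heads`, which is why the saving grows as more query heads share each KV head.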
