A New AI Research from China Introduces GLM-130B: A Bilingual (English and Chinese) Pre-Trained Language Model with 130B Parameters

November 3, 2023
In recent times, the zero-shot and few-shot capabilities of Large Language Models (LLMs) have increased considerably, with models of over 100B parameters delivering state-of-the-art performance on various benchmarks. This advancement also presents a critical challenge for LLMs: transparency. Very little information about these large-scale models and their training process is available to the public, and releasing this information would facilitate the training of other high-quality LLMs at this scale.

A group of researchers from Tsinghua University and Zhipu.AI have introduced GLM-130B, an open-source bilingual (English and Chinese) pre-trained language model with 130B parameters. In this paper, the researchers document the model's training process, including ways the process can be optimized, in an effort to open-source a model on par with GPT-3 at the 100B-parameter scale. The researchers also share both the successful and failed aspects of the training process.

GLM-130B uses a bidirectional General Language Model (GLM) as its base. The architecture uses autoregressive blank infilling as its training objective, which allows for a better understanding of context compared to GPT-style models. GLM-130B outperforms both GPT-3 and PaLM 540B on zero-shot LAMBADA, achieving a zero-shot accuracy of 80.2%.

The authors experimented with different Layer Normalization (LN) methods to stabilize the training process of GLM-130B. Existing practices such as Pre-LN, Post-LN, and Sandwich-LN proved ineffective, but Post-LN initialized with DeepNorm showed promising results. The model's pre-training data consists of more than 2TB of English and Chinese text corpora extracted from online forums, encyclopedias, etc., forming a well-balanced dataset.
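
A minimal sketch of the DeepNorm-style post-LN residual connection, assuming the formulation LayerNorm(α·x + Network(x)) with α scaled by the layer count; the toy sublayer and the exact α exponent here are illustrative assumptions, not a reproduction of GLM-130B's implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Plain layer normalization over the last axis (no learned affine)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def deepnorm_residual(x, sublayer, num_layers):
    """Post-LN residual stabilized DeepNorm-style:
    LayerNorm(alpha * x + sublayer(x)), with alpha = (2N)**0.5 for an
    N-layer stack. Up-weighting the residual branch at initialization
    keeps early-training updates bounded, which is what made Post-LN
    viable at the 100B scale.
    """
    alpha = (2 * num_layers) ** 0.5
    return layer_norm(alpha * x + sublayer(x))

x = np.random.default_rng(0).normal(size=(4, 16))
out = deepnorm_residual(x, sublayer=lambda h: 0.1 * h, num_layers=70)
print(out.shape)  # (4, 16)
```

The design point is the α factor: plain Post-LN (α = 1) destabilizes very deep stacks, while the DeepNorm scaling preserves the post-LN placement that benefits downstream performance.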

As mentioned earlier, GLM-130B achieves record accuracy on the LAMBADA dataset. On the Pile test set, which consists of a collection of language-modeling benchmarks, GLM-130B's performance was on par with the GPT-3 and Jurassic-1 models. The model also performs well on the MMLU benchmark, with few-shot performance comparable to GPT-3's.
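
For context on what the LAMBADA number measures, here is a toy version of the scoring rule: a passage counts as correct only if the predicted final word exactly matches the held-out reference word. The string inputs are illustrative; a real evaluation compares tokenized model continuations.

```python
def lambada_accuracy(predictions, references):
    """LAMBADA-style scoring: exact match on the passage's final word."""
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["lantern", "river", "silence", "door"]
refs  = ["lantern", "river", "quiet",   "door"]
print(lambada_accuracy(preds, refs))  # 0.75
```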

Moreover, on the BIG-bench benchmark, GLM-130B outperformed both GPT-3 and PaLM in zero-shot settings. Even though the model performed strongly, the researchers noticed that its performance growth with respect to few-shot samples is not as great as GPT-3's. They hypothesize several causes, such as the model's bidirectional nature and the lack of a dataset on par with PaLM's in terms of quality and diversity.

The researchers also tested the zero-shot performance of the model on Chinese benchmarks. They concluded that GLM-130B not only outperformed ERNIE Titan 3.0 across more than ten tasks but also performed at least 260% better on two abstractive MRC datasets. This may be attributable to the fact that GLM's pre-training objective, autoregressive blank infilling, is similar to abstractive MRC.

In conclusion, GLM-130B is a powerful, open-source, bilingual pre-trained language model that performs at the level of GPT-3 and PaLM across different benchmarks and even outperforms them on some tasks. Beyond its performance, what sets this model apart is the transparency of its development. The researchers have made the model's training process public, including their experiences of both success and failure. This approach reflects their commitment to fostering open and inclusive research in the field of LLMs.


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.


I am a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.

