
With Quiet-STaR, language models learn to think before speaking


Humans are gifted with the ability to reason: asking “if” and “why,” “reading between the lines,” and inferring unstated information are all critical to our problem-solving capabilities.

Until now, AI models have struggled in this area. But researchers from Stanford University and Notbad AI, Inc. say they have taught AI models to think before they respond to prompts, just as (most) people consider what to say before speaking.

The researchers have introduced Quiet-STaR, an extension of the Self-Taught Reasoner (STaR) technique. It is trained on a wide corpus of internet data and learns to generate rationales at each token to explain future text and improve predictions.

Quiet-STaR was applied to Mistral 7B, showing improvements to zero-shot direct reasoning abilities on the CommonsenseQA question-answering challenge (from 36.3% base to 47.2%) and the GSM8K grade school math word problems dataset (from 5.9% base to 10.9%). These improvements consistently increased with the number of tokens used in the model’s “internal thoughts.”
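To put those benchmark figures in perspective, the short calculation below derives the absolute and relative gains from the four accuracies quoted above. Only those four numbers come from the reported results; the helper name `gains` is purely illustrative.

```python
def gains(base, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)

# Accuracies quoted above for Mistral 7B, before and after Quiet-STaR.
commonsense_qa = gains(36.3, 47.2)
gsm8k = gains(5.9, 10.9)
print("CommonsenseQA:", commonsense_qa)
print("GSM8K:", gsm8k)
```

In other words, the CommonsenseQA gain is about 11 points (roughly 30% relative), while the GSM8K gain is 5 points (roughly 85% relative) from a much lower starting accuracy.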

“Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way,” the researchers write.

Where AI reasoning has so far come up short

Previous methods that have helped language models learn from their reasoning have been narrower and less general: AIs have been trained to solve individual tasks or predefined sets of tasks that rely on carefully curated datasets.

For instance, a pre-trained language model fine-tuned to output human reasoning traces before answering multiple-choice questions outperformed an AI trained directly on answers, the Quiet-STaR developers pointed out. Other models, when provided with “scaffolding,” can generate chain-of-thought solutions without additional supervision. Further, researchers have “forced” models to use chain-of-thought reasoning by preventing them from answering unless completely confident.

“However, once again, these approaches only work for a question-answer dataset,” the Stanford University and Notbad AI, Inc. researchers contend.

STaR, in particular, proved that models could “bootstrap” their reasoning abilities on question-answering datasets. They could sample rationales to attempt to answer questions, train on those rationales if they led to correct answers, and repeat iteratively to solve more and more difficult problems.
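The sample-and-filter half of that loop can be sketched in a few lines. This is a toy illustration, not the authors' code: `toy_sampler` is a hypothetical stand-in for a language model that "reasons" about small addition problems and is sometimes off by one, and the fine-tuning step that would follow each round is left out.

```python
import random

def star_round(questions, sample_fn, n_samples=4):
    """One STaR-style round: keep only rationales that reach the correct answer."""
    kept = []
    for question, gold in questions:
        for _ in range(n_samples):
            rationale, answer = sample_fn(question)
            if answer == gold:
                kept.append((question, rationale, answer))
                break  # one correct rationale per question is enough for this sketch
    return kept

def toy_sampler(rng):
    # Hypothetical "model": adds two numbers but is sometimes off by one.
    def sample(question):
        a, b = question
        guess = a + b + rng.choice([0, 0, 1])
        return f"{a} plus {b} is {guess}", guess
    return sample

rng = random.Random(0)
questions = [((1, 2), 3), ((4, 5), 9), ((7, 8), 15)]
kept = star_round(questions, toy_sampler(rng))
# In STaR, the model would now be fine-tuned on `kept` and the round repeated.
print(f"{len(kept)} of {len(questions)} questions produced a usable rationale")
```

The filter is what ties this approach to question-answering data: without a known correct answer there is nothing to compare against, which is the limitation Quiet-STaR sets out to remove.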

However, the Quiet-STaR researchers point out that training from curated datasets limits the “scale and generalizability” of rationales. High-quality datasets will “inherently only ever cover a subset of reasoning tasks.”

Inferring rationales from few-shot examples in question-answering is a “highly-constrained setting,” the researchers assert. “Ideally, a language model could instead learn to infer unstated rationales in arbitrary text.”

By extending STaR, “we allow the LM to learn from the diverse tasks present in the language. To our knowledge, this is the first work explicitly training LMs to reason generally from text, rather than on curated reasoning tasks or collections of reasoning tasks.”

‘Quietly’ thinking

The Stanford University and Notbad AI, Inc. researchers refer to their technique as Quiet-STaR because it applies STaR “quietly.”

The method generates many inner thoughts in parallel, at every token, to explain future text before responding to a prompt (i.e., the process of “thinking”). When the AI finally answers, it produces a mixture of predictions with and without rationales.

The researchers then apply REINFORCE, a reinforcement learning algorithm that uses samples collected over an episode to update policy parameters. Here it updates the model’s parameters as well as the start-of-thought and end-of-thought embeddings. Researchers explain that this helps increase the likelihood that the AI will accurately predict future text. As part of this, the model also discards incorrect predictions.
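A minimal sketch of a REINFORCE-style update may make this concrete. It assumes a toy policy that chooses among three candidate “thoughts” via softmax logits, and it scores each sampled thought by how likely it makes the future text, relative to the average over all samples. The function names, learning rate, and numbers are illustrative assumptions, not details of the researchers’ implementation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, sampled, future_logliks, lr=0.5):
    """One gradient-ascent step on the logits of a toy categorical policy.

    sampled        : indices of the thoughts that were drawn
    future_logliks : log p(future text | thought) for each drawn thought
    """
    baseline = sum(future_logliks) / len(future_logliks)  # mean as a simple baseline
    probs = softmax(logits)
    grads = [0.0] * len(logits)
    for idx, loglik in zip(sampled, future_logliks):
        reward = loglik - baseline
        for k in range(len(logits)):
            indicator = 1.0 if k == idx else 0.0
            # Gradient of log softmax(logits)[idx] with respect to logits[k].
            grads[k] += reward * (indicator - probs[k])
    return [x + lr * g / len(sampled) for x, g in zip(logits, grads)]

logits = [0.0, 0.0, 0.0]
new_logits = reinforce_step(logits, sampled=[0, 1, 2],
                            future_logliks=[-1.0, -2.0, -3.0])
new_probs = softmax(new_logits)
print(new_probs)
```

After the step, the thought that made the future text most likely has gained probability and the least helpful one has lost it, which is the learning signal the paragraph above describes.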

“By iteratively optimizing these parameters, Quiet-STaR trains the model to generate more useful rationales throughout training,” the researchers write.

Because their goal was generalist reasoning, they used a zero-shot prompt (“Let’s think step by step”) without in-context examples. Quiet-STaR was applied to Mistral 7B using the web text datasets OpenWebMath and Colossal Clean Crawled Corpus.

“Quiet-STaR… allows a model to think quietly at every token, with a distribution trained to be useful,” researchers write.

They add that, “by training on the rich spectrum of reasoning tasks implicit in diverse web text, rather than narrowly specializing for particular datasets, Quiet-STaR points the way to more robust and adaptable language models.”

Closing the gap between model and human reasoning capabilities

Notably, researchers created a parallel sampling algorithm that generates rationales from all tokens in a string. Each thought token is allowed to “pay attention” to itself, to all preceding tokens within the same thought, and to the preceding text. This allows for “continuations of all of the thoughts in parallel,” and each inference call generates an additional token for all tokens.
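The attention rule described above can be written as a small predicate. This is a sketch under stated assumptions: tokens are represented as hypothetical `(kind, position, depth)` tuples, where `position` is the text position a thought branches from and `depth` is the index inside that thought (0 for text tokens). It models only which tokens may see which, not the attention computation itself.

```python
def can_attend(query, key):
    """Return True if `query` may attend to `key` under the parallel-thought rule."""
    q_kind, q_pos, q_depth = query
    k_kind, k_pos, k_depth = key
    if k_kind == "text":
        # Text and thought tokens alike see the text up to their own position.
        return k_pos <= q_pos
    if q_kind == "text":
        # Plain text tokens never look inside a thought.
        return False
    # Thought-to-thought: same thought only, at the same or an earlier depth.
    return k_pos == q_pos and k_depth <= q_depth

query = ("thought", 2, 1)  # second token of the thought that follows text position 2
visible_text = [i for i in range(5) if can_attend(query, ("text", i, 0))]
print(visible_text)
print(can_attend(query, ("thought", 2, 0)))
print(can_attend(query, ("thought", 1, 0)))
```

Because thoughts at different positions never attend to each other, one forward pass can extend every thought by one token at the same time.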

Researchers introduced custom meta-tokens at the beginning and the end of each thought. <|startofthought|> and <|endofthought|> were initialized with the em dash, which is often used to denote a pause.
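Initializing a new token from an existing one usually amounts to copying a row of the embedding table. The sketch below assumes a toy two-entry vocabulary and two-dimensional embeddings; the helper `add_token_like` and all values are hypothetical and only illustrate the idea.

```python
EM_DASH = "\u2014"  # the em dash character

# Toy vocabulary and embedding table (illustrative values only).
vocab = {"the": 0, EM_DASH: 1}
embeddings = [[0.1, 0.2], [0.7, -0.3]]

def add_token_like(vocab, embeddings, new_token, source_token):
    """Append `new_token`, starting from a copy of `source_token`'s embedding."""
    vocab[new_token] = len(embeddings)
    # Copy the row so the new token can drift away from its source during training.
    embeddings.append(list(embeddings[vocab[source_token]]))

add_token_like(vocab, embeddings, "<|startofthought|>", EM_DASH)
add_token_like(vocab, embeddings, "<|endofthought|>", EM_DASH)
print(embeddings[vocab["<|startofthought|>"]])
```

Starting from a token the model already associates with a pause gives the new meta-tokens a sensible initial meaning instead of random noise.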

“Intuitively, the start thought tokens can be understood as putting the model into a ‘thinking mode,’” the researchers explain, “and the end thought token can be understood as telling the model when it’s done thinking.”

The next step incorporated what’s known as a “mixing head,” a “shallow” multilayer perceptron. This helped researchers retrospectively determine how much to incorporate the next-token prediction from a given thought into the current next-token prediction.
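One way to picture the mixing head is as a learned weight between 0 and 1 that interpolates two sets of next-token scores. In the sketch below, a single linear layer followed by a sigmoid stands in for the shallow multilayer perceptron; the feature layout, parameter values, and function names are assumptions made for illustration.

```python
import math

def mixing_weight(hidden_base, hidden_thought, weights, bias):
    """Toy mixing head: linear layer plus sigmoid over concatenated hidden states."""
    features = hidden_base + hidden_thought  # list concatenation
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mix_logits(logits_base, logits_thought, w):
    """Interpolate predictions made without (base) and with a thought."""
    return [(1.0 - w) * b + w * t for b, t in zip(logits_base, logits_thought)]

w = mixing_weight([0.5, -0.2], [0.1, 0.9], weights=[0.3, 0.1, -0.4, 0.8], bias=0.0)
mixed = mix_logits([2.0, 0.0, -1.0], [0.0, 3.0, -1.0], w)
print(w, mixed)
```

With a weight near 0 the thought is ignored and the model falls back on its ordinary prediction, so an unhelpful thought need not hurt the output.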

Finally, researchers optimized parameters to increase the likelihood of more probable future text. Reinforcement techniques provide a “learning signal” to rationales based on their impact on future predictions. To help reduce variance, researchers also introduced a “teacher forcing” trick, which ensures that neural networks stay as close as possible to ground truth sequences.
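Teacher forcing, in general, means scoring each future token while feeding the model the true preceding tokens rather than its own guesses. The sketch below shows that idea with a hypothetical toy bigram table standing in for a language model; the table, its probabilities, and the function name are invented for illustration.

```python
import math

# Hypothetical toy "model": probability of the next token given the previous one.
BIGRAM = {
    "2": {"+": 0.9, "=": 0.1},
    "+": {"2": 1.0},
    "=": {"4": 0.8, "5": 0.2},
}

def teacher_forced_loglik(prefix, future):
    """Log-likelihood of `future`, conditioning every step on ground-truth tokens."""
    context = list(prefix)
    total = 0.0
    for token in future:
        probs = BIGRAM[context[-1]]
        total += math.log(probs[token])
        context.append(token)  # feed the true token, not a sample from the model
    return total

loglik = teacher_forced_loglik(["2", "+", "2"], ["=", "4"])
print(loglik)
```

Because every step is conditioned on the true text, an early wrong guess cannot derail the scoring of later tokens, which keeps the resulting signal less noisy.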

Ultimately, “Quiet-STaR represents a step towards language models that can learn to reason in a general and scalable way,” the researchers conclude. “Future work can build on these insights to further close the gap between language model and human-like reasoning capabilities.”
