Beyond static AI: MIT’s new framework lets models teach themselves


Researchers at MIT have developed a framework called Self-Adapting Language Models (SEAL) that enables large language models (LLMs) to continuously learn and adapt by updating their own internal parameters. SEAL teaches an LLM to generate its own training data and update instructions, allowing it to permanently absorb new knowledge and learn new tasks.

This framework could be useful for enterprise applications, particularly for AI agents that operate in dynamic environments, where they must constantly process new information and adapt their behavior.

The challenge of adapting LLMs

While large language models have shown remarkable abilities, adapting them to specific tasks, integrating new information, or mastering novel reasoning skills remains a significant hurdle.

Currently, when faced with a new task, LLMs typically learn from data “as-is” through methods like finetuning or in-context learning. However, the provided data is not always in an optimal format for the model to learn efficiently. Existing approaches don’t allow the model to develop its own strategies for best transforming and learning from new information.

“Many enterprise use cases demand more than just factual recall—they require deeper, persistent adaptation,” Jyo Pari, PhD student at MIT and co-author of the paper, told VentureBeat. “For example, a coding assistant might need to internalize a company’s specific software framework, or a customer-facing model might need to learn a user’s unique behavior or preferences over time.” 

In such cases, temporary retrieval falls short, and the knowledge needs to be “baked into” the model’s weights so that it influences all future responses. 

Creating self-adapting language models

“As a step towards scalable and efficient adaptation of language models, we propose equipping LLMs with the ability to generate their own training data and finetuning directives for using such data,” the MIT researchers state in their paper.

Overview of SEAL framework (source: arXiv)

The researchers’ solution is SEAL, short for Self-Adapting Language Models. It uses a reinforcement learning (RL) algorithm to train an LLM to generate “self-edits”—natural-language instructions that specify how the model should update its own weights. These self-edits can restructure new information, create synthetic training examples, or even define the technical parameters for the learning process itself.

Intuitively, SEAL teaches a model how to create its own personalized study guide. Instead of just reading a new document (the raw data), the model learns to rewrite and reformat that information into a style it can more easily absorb and internalize. This process brings together several key areas of AI research, including synthetic data generation, reinforcement learning and test-time training (TTT).

The framework operates on a two-loop system. In an “inner loop,” the model uses a self-edit to perform a small, temporary update to its weights. In an “outer loop,” the system evaluates whether that update improved the model’s performance on a target task. If it did, the model receives a positive reward, reinforcing its ability to generate that kind of effective self-edit in the future. Over time, the LLM becomes an expert at teaching itself.
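
The two-loop idea can be illustrated with a toy sketch. The Python below is not the researchers' implementation: the "model" is a single number, the three candidate self-edits (`helpful`, `neutral`, `harmful`) are hypothetical stand-ins for generated text, and the count-based reward rule is an assumption chosen for simplicity. It only shows the shape of the process: an inner loop applies a temporary update, and an outer loop rewards the edits whose update improved the task score.

```python
import random


def evaluate(weight, target=1.0):
    """Toy task score: higher is better, best when weight equals target."""
    return -abs(target - weight)


def inner_loop(weight, self_edit):
    """Return a temporarily updated copy of the weight; the original is untouched."""
    return weight + self_edit["step"]


def seal_toy(rounds=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical candidate self-edits. In SEAL these are generated by the model.
    edits = [
        {"name": "helpful", "step": 0.5},
        {"name": "neutral", "step": 0.0},
        {"name": "harmful", "step": -0.5},
    ]
    prefs = [1.0, 1.0, 1.0]  # unnormalized sampling preferences (the toy "policy")
    base_weight = 0.0

    for _ in range(rounds):
        # Sample a self-edit according to the current preferences.
        i = rng.choices(range(len(edits)), weights=prefs)[0]
        # Inner loop: small, temporary update.
        updated = inner_loop(base_weight, edits[i])
        # Outer loop: reward the edit only if the update improved the score.
        reward = 1.0 if evaluate(updated) > evaluate(base_weight) else 0.0
        prefs[i] += reward

    total = sum(prefs)
    return {e["name"]: p / total for e, p in zip(edits, prefs)}
```

Calling `seal_toy()` returns the normalized preferences. Because only the `helpful` edit ever earns a reward, it ends up with the largest share, which mirrors how the outer loop makes effective self-edits more likely over time.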

In their study, the researchers used a single model for the entire SEAL framework. However, they also note that this process can be decoupled into a “teacher-student” model. A specialized teacher model could be trained to generate effective self-edits for a separate student model, which would then be updated. This approach could allow for more specialized and efficient adaptation pipelines in enterprise settings.

SEAL in action

The researchers tested SEAL in two key domains: knowledge incorporation (the ability to permanently integrate new facts) and few-shot learning (the ability to generalize from a handful of examples).

SEAL in knowledge incorporation (source: arXiv)

For knowledge incorporation, the goal was to see if the model could answer questions about a text passage without having access to the passage during questioning. Finetuning Llama-3.2-1B on the raw text provided only a marginal improvement over the base model. 

However, when the SEAL model created “self-edits” by generating several “implications” from a passage and was trained on this synthetic data, its accuracy jumped to 47%. Notably, this outperformed results from using synthetic data generated by the much larger GPT-4.1, suggesting the model learned to create superior training material for itself.

SEAL in few-shot learning (source: arXiv)

For few-shot learning, the researchers tested SEAL on examples from the Abstract Reasoning Corpus (ARC), where the model must solve visual puzzles. In the self-edit phase, the model had to generate the entire adaptation strategy, including which data augmentations and tools to use and what learning rate to apply. 

SEAL achieved a 72.5% success rate, a dramatic improvement over the 20% rate achieved without RL training and the 0% rate of standard in-context learning.

SEAL (red line) continues to improve across RL cycles (source: arXiv)

Implications for the enterprise

Some experts project that the supply of high-quality, human-generated training data could be exhausted in the coming years. Progress may soon depend on “a model’s capacity to generate its own high-utility training signal,” as the researchers put it. They add, “A natural next step is to meta-train a dedicated SEAL synthetic-data generator model that produces fresh pretraining corpora, allowing future models to scale and achieve greater data efficiency without relying on additional human text.”

For example, the researchers propose that an LLM could ingest complex documents like academic papers or financial reports and autonomously generate thousands of explanations and implications to deepen its understanding. 

“This iterative loop of self-expression and self-refinement could allow models to keep improving on rare or underrepresented topics even in the absence of additional external supervision,” the researchers explain.

This capability is especially promising for building AI agents. Agentic systems must incrementally acquire and retain knowledge as they interact with their environment. SEAL provides a mechanism for this. After an interaction, an agent could synthesize a self-edit to trigger a weight update, allowing it to internalize the lessons learned. This enables the agent to evolve over time, improve its performance based on experience, and reduce its reliance on static programming or repeated human guidance.

“SEAL demonstrates that large language models need not remain static after pretraining,” the researchers write. “By learning to generate their own synthetic self-edit data and to apply it through lightweight weight updates, they can autonomously incorporate new knowledge and adapt to novel tasks.”

Limitations of SEAL

That said, SEAL is not a universal solution. For example, it can suffer from “catastrophic forgetting,” where repeated retraining cycles cause the model to lose knowledge it learned earlier.

“In our current implementation, we encourage a hybrid approach,” Pari said. “Enterprises should be selective about what knowledge is important enough to integrate permanently.” 

Factual and evolving data can remain in external memory through RAG, while long-lasting, behavior-shaping knowledge is better suited for weight-level updates via SEAL. 

“This kind of hybrid memory strategy ensures the right information is persistent without overwhelming the model or introducing unnecessary forgetting,” he said.
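
As a rough illustration of that hybrid split, the sketch below routes each piece of knowledge either to external retrieval or to a weight update. The `KnowledgeItem` fields and the `route` rule are hypothetical names and criteria invented for this example. The article describes the principle only, and a real system would need its own policy.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    text: str
    changes_often: bool    # assumption: flags factual, evolving data
    shapes_behavior: bool  # assumption: flags long-lasting, behavior-shaping knowledge


def route(item):
    """Pick a memory strategy for one item (illustrative rule, not from the paper)."""
    if item.shapes_behavior and not item.changes_often:
        return "weight_update"  # candidate for a SEAL-style self-edit
    return "rag"                # keep in external memory and retrieve on demand


style_guide = KnowledgeItem("Use the in-house logging wrapper.", False, True)
price_list = KnowledgeItem("Current prices for this quarter.", True, False)
decisions = [route(style_guide), route(price_list)]
```

Here the stable coding convention is routed to a weight update, while the frequently changing price list stays in retrieval.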

It is also worth noting that SEAL takes a non-trivial amount of time to tune the self-edit examples and train the model. This makes continuous, real-time editing infeasible in most production settings.

“We envision a more practical deployment model where the system collects data over a period—say, a few hours or a day—and then performs targeted self-edits during scheduled update intervals,” Pari said. “This approach allows enterprises to control the cost of adaptation while still benefiting from SEAL’s ability to internalize new knowledge.”
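
A scheduled deployment of this kind could look something like the sketch below. It is a minimal illustration under stated assumptions: `SelfEditScheduler` and its `apply_update` hook are hypothetical, timestamps are passed in as plain seconds, and the update itself is represented by a callback rather than a real training job.

```python
class SelfEditScheduler:
    """Buffers new data and applies it in batches at a fixed interval."""

    def __init__(self, interval_seconds, apply_update, start_time=0.0):
        self.interval = interval_seconds
        self.apply_update = apply_update  # hypothetical hook for the self-edit and training job
        self.last_update = start_time
        self.buffer = []

    def record(self, item):
        """Collect data between updates without touching the model."""
        self.buffer.append(item)

    def maybe_update(self, now):
        """Run one batched update if the interval has elapsed and data is waiting."""
        due = now - self.last_update >= self.interval
        if not (due and self.buffer):
            return False
        batch, self.buffer = self.buffer, []
        self.apply_update(batch)
        self.last_update = now
        return True


applied = []
scheduler = SelfEditScheduler(interval_seconds=3600, apply_update=applied.append)
scheduler.record("ticket 1")
scheduler.record("ticket 2")
early = scheduler.maybe_update(now=600)     # too soon, so nothing is applied
on_time = scheduler.maybe_update(now=3600)  # interval elapsed, so one batch is applied
```

Batching this way keeps the number of expensive updates bounded by the schedule rather than by the volume of incoming data.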


