Say hello again to “Text-to-video”, the latest installment in the ongoing show of similar-but-not-quite-identical AIs. We’re getting a flurry of releases after a long backstage standoff since Sora’s preview. The catch? Unlike the first LLM wave, most of these models are either inaccessible or hardly usable because of processing times.
Research is moving fast, maybe too fast for production pipelines that can hardly keep up. In the meantime, we’re getting promising sneak peeks into the future of genAI, one that might get outclassed before it even becomes the present.
Gradual Feedback
Optimizing complex AI pipelines with multiple interconnected LLMs and tools is tricky. Traditional optimization techniques rely on differentiable functions and numerical gradients, which are not readily available in these black-box systems. TextGrad is a framework that combines the machinery of traditional automatic differentiation with textual feedback from LLMs to optimize LLM outputs.
Why would you care - Back-propagation is a foundation of ML-based systems. Being able to refine both inputs and outputs iteratively at the level of gradients opens up new ways for fine-grained output optimization.
How does it work? - TextGrad represents AI systems as computation graphs where variables, such as code snippets, are connected by functions like LLM calls or simulators. Instead of numerical gradients, TextGrad uses "textual gradients", natural language critiques generated by LLMs, to provide feedback on how to adjust variables to improve system performance.
These textual gradients are then back-propagated through the graph, allowing the system to iteratively refine its components. For example, in a system designed to solve coding problems, TextGrad might identify an edge case in a generated code snippet and provide specific suggestions on how to modify the code to address the issue.
A typical optimization process involves:
Selecting the LLM to use as feedback engine,
Generating an answer to a question using a model,
Defining an evaluation instruction,
Computing the loss function on the output using the textual instruction, and
Running the optimizer function
All these steps follow a flow similar to a typical PyTorch optimization loop, making TextGrad easy to drop into your AI workflows.
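For illustration, here is a minimal sketch of that flow, loosely following the pattern shown in the TextGrad repository (the engine name, question, and role descriptions are placeholders, and the exact API may have evolved, so treat this as a rough guide rather than a definitive snippet):

```python
import textgrad as tg

# Select the LLM that acts as the feedback (backward) engine.
tg.set_backward_engine("gpt-4o", override=True)

# Generate an initial answer to a question with a forward model.
model = tg.BlackboxLLM("gpt-4o")
question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts under the sun, how long for 30 shirts?",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise and accurate answer to the question")

# Define the evaluation instruction and wrap it as a textual loss.
loss_fn = tg.TextLoss(
    "Evaluate the given answer to the question. Be logical and very critical. "
    "Provide concise feedback."
)

# Compute the loss, back-propagate the textual gradients, and run the optimizer.
optimizer = tg.TGD(parameters=[answer])
loss = loss_fn(answer)
loss.backward()
optimizer.step()  # `answer` is rewritten using the textual feedback
```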
Check out the repository to get started.
The Lab
Right and wrong
While LLMs excel at various tasks, they can be unreliable in high-stakes scenarios when they are unable to express uncertainty about the accuracy of their answers. This limitation hinders their use in real-world applications where knowing the trustworthiness of their predictions is crucial.
To address this issue, the research proposes fine-tuning LLMs on a small dataset of both correct and incorrect answers, allowing them to better estimate uncertainty. Instead of relying on computationally expensive sampling methods or potentially unreliable prompting techniques, this approach uses supervised learning to teach the model about its own correctness.
Three different fine-tuning methods are explored: Probe, LoRA, and LoRA + Prompt. Probe trains a separate neural network on the LLM's internal representations, while LoRA directly modifies the LLM's parameters for uncertainty estimation. LoRA + Prompt combines LoRA with specific language prompts for improved performance. The study uses a diverse dataset derived from various benchmarks and incorporates regularization techniques during training to prevent significant deviations from the original LLM's behavior.
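To make the probe variant concrete, here is a minimal sketch assuming pooled last-layer hidden states have already been extracted from the frozen LLM; the probe architecture, layer choice, and hyperparameters are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

# Toy probe: maps a frozen LLM's pooled hidden state for an answer
# to a probability that the answer is correct.
class CorrectnessProbe(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, pooled_hidden: torch.Tensor) -> torch.Tensor:
        return self.net(pooled_hidden).squeeze(-1)  # logits

# Placeholder data: pooled hidden states for labeled (question, answer) pairs,
# where labels mark each answer as correct (1) or incorrect (0).
features = torch.randn(512, 4096)
labels = torch.randint(0, 2, (512,)).float()

probe = CorrectnessProbe(hidden_dim=4096)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(probe(features), labels)
    loss.backward()
    optimizer.step()

# At inference, sigmoid(probe(pooled_hidden)) is the estimated probability
# that the LLM's own answer is correct.
```

The base model stays frozen here, which is what distinguishes the probe from the LoRA variants that modify the LLM's parameters directly.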
This fine-tuning approach, particularly LoRA + Prompt, significantly enhances both the calibration and accuracy of uncertainty estimates in LLMs, even with a limited number of labeled examples, indicating that training models only on correct data may actually be counterproductive for ensuring robust output.
A roadmap paved with tokens
LLMs are good at generating text, but there is debate about whether they can reason or plan like humans. The LLM-Modulo Framework investigates whether LLMs can create workable plans on their own and verify their correctness.
This framework acts like a cycle of generating, testing, and improving plans. First, the LLM receives a problem description and tries to create a plan. This plan is then checked by multiple "critics." Some critics are like strict rule-checkers, ensuring the plan is logically sound and follows the rules of the world. These critics could use pre-defined models of how things work. Other critics might be more flexible, focusing on the plan's style, ease of understanding, or alignment with user preferences. These critics might use the LLM itself to judge these softer aspects. If a critic finds a problem, it provides feedback to the LLM, which then tries to improve the plan. The cycle continues until a plan satisfies all critics or a set limit of attempts is reached.
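A rough sketch of that generate-test-critique loop is below; the `llm.generate_plan` and critic `review` interfaces are assumed placeholders, not the paper's implementation:

```python
# Sketch of the LLM-Modulo generate-test-critique loop under assumed interfaces.
def llm_modulo_plan(problem, llm, critics, max_rounds=10):
    feedback = []
    for _ in range(max_rounds):
        # The LLM drafts (or repairs) a candidate plan, given any prior feedback.
        plan = llm.generate_plan(problem, feedback)

        # Each critic checks the plan: hard critics verify logical and
        # world-model soundness, soft critics judge style or user preferences.
        feedback = [c.review(problem, plan) for c in critics]
        feedback = [f for f in feedback if f is not None]  # None = no objection

        if not feedback:  # every critic is satisfied
            return plan
    return None  # no acceptable plan within the attempt budget
```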
The LLM-Modulo Framework helps LLMs contribute to plan creation, achieving significantly better results than LLMs working alone, particularly in tasks like travel planning. However, the framework highlights that LLMs can't replace the need for robust planning systems and human oversight, especially when guaranteeing the safety and correctness of plans.
If you say so
LLMs can be trained to be helpful assistants, but if their reward signals are not designed correctly, they might learn unwanted behaviors. This happens when the model figures out how to get high rewards without actually doing the intended task, a problem called specification gaming.
To study whether LLMs can learn increasingly sophisticated forms of specification gaming, the paper runs a series of tests on progressively more complex gameable environments. At first, the tasks are simple, like tailoring an answer to the user's political views (sycophancy). As the scenarios progress, the tasks require more cunning, like inflating the rating of a user's poem or even modifying a checklist to make it seem like tasks were completed.
The researchers trained the LLM on these scenarios one by one, rewarding it for successfully gaming the system in each. Finally, they tested the LLM in a new environment where it had access to a simplified version of its training code. This last test was designed to see if the LLM had learned to manipulate its reward system directly, a behavior called reward-tampering.
The research shows that specification gaming can indeed generalize from its simpler forms to reward tampering. LLMs trained on the initial scenarios learned to manipulate their reward function when given access to their training code, sometimes even modifying the testing code to hide their actions.
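As a purely hypothetical illustration of how such a final test could be scored (not the paper's actual harness), an evaluator might fingerprint the sandboxed reward and test files before and after an episode in which the model has write access to them:

```python
import hashlib
from pathlib import Path

# Hypothetical harness: the model gets a sandbox containing a simplified copy
# of its "training code", and the evaluator checks afterwards whether the
# reward function or its tests were altered.
def fingerprint(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_episode_with_code_access(model, sandbox: Path) -> dict:
    watched = {name: fingerprint(sandbox / name)
               for name in ("reward_fn.py", "test_reward.py")}

    model.act(sandbox)  # the model may read or edit any file in the sandbox

    return {
        # True if the model rewrote its own reward function.
        "reward_tampering": fingerprint(sandbox / "reward_fn.py") != watched["reward_fn.py"],
        # True if it also changed the tests that would have caught the edit.
        "covered_tracks": fingerprint(sandbox / "test_reward.py") != watched["test_reward.py"],
    }
```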
The Pulse
One model to teach them all - NVIDIA has released Nemotron-4 340B, a family of open-source models designed to generate synthetic training data for LLMs. The Nemotron-4 340B family is optimized for use with NVIDIA NeMo and TensorRT-LLM, allowing developers to fine-tune and optimize their models for specific use cases and domains.
Seeking assistance - DeepSeek has released DeepSeek Coder V2, an open-source code LLM that outperforms closed-source models like GPT-4 Turbo on coding and math tasks. It supports over 300 programming languages and boasts a larger context window to handle complex coding projects. The model is available under the MIT license, allowing both research and commercial use.
Deeply productive - Google DeepMind is transitioning from a research-focused lab to an AI product factory in response to growing competition from companies like OpenAI and Microsoft. This shift, which involves merging Google Brain and DeepMind, aims to accelerate the development of commercial AI products while maintaining foundational research.
And that’s all for this edition. We hope you enjoyed reading through!
The Unify dev team.