Everyone was waiting for new hardware from NVIDIA, AMD and co. when Groq casually dropped their latest engine, stunning the industry with efficient, near-instant inference flirting with 600 TOPS at peak performance. Breakthroughs like this pop up just when you start to think the industry is evolving at a steady pace, and yet some folks still manage to believe AI has hit a plateau 🤷
Speaking of performance, we took a dive into SliceGPT, a cutting-edge approach to LLM compression, during our weekly paper reading session. Give it a look here if you missed it!
Counting Epochs
Prompting can yield widely fluctuating outcomes across models and configurations, and while various prompting techniques achieve decent results in steering LLM outputs, reproducibility across LLM workflows remains an issue. DataDreamer provides a standardized, adaptable interface for various workflows including synthetic data generation, fine-tuning, instruction-tuning, and alignment.
Why would you care? - DataDreamer helps you set up systematic, reproducible LLM workflows, so if you're constantly tinkering with training pipelines, this one could be a nice addition to your arsenal.
How does it work? - Different LLM workflows present their own set of challenges in terms of implementation and reproduction, such as prompt sensitivity, model degradation, and sharing exact data and hyperparameters. DataDreamer simplifies the implementation of these workflows through a single Python library that provides unique features like task orchestration and support for multi-GPU training.
DataDreamer workflows are built on top of the following building blocks:
In DataDreamer, code is organized into Sessions, using a Python context manager to automate the organization of files and folders created by various tasks within the session.
The fundamental unit of work in a session is a Step, which transforms input datasets into output datasets. Users can chain multiple steps together and use built-in functions for data manipulation.
Trainers enable fine-tuning, instruction tuning, reinforcement learning with human feedback (RLHF), distillation, classification, and embedding model training.
To optimize performance and resource usage, workflows can be cached at both disk and model levels, enabling the sharing of workflows and cached results. With caching enabled, users can resume interrupted workflows and selectively recompute portions affected by subsequent modifications. Finally, datasets and trained models can be published locally or on Hugging Face Hub, with automatic documentation, live demos, and relevant metadata.
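To make the session/step/caching pattern concrete, here is a minimal sketch in plain Python. This is not DataDreamer's actual API — the `Session` class and its `step` method are hypothetical stand-ins — but it shows the core mechanic: each step's output is cached on disk under a key derived from the step name and its inputs, so an interrupted workflow resumes from the last completed step and only modified steps get recomputed.

```python
import hashlib
import json
import pickle
from pathlib import Path

class Session:
    """Hypothetical sketch of a session-style workflow manager: each
    step's output is cached on disk, keyed by the step name and its
    inputs, so interrupted workflows resume without recomputation."""

    def __init__(self, output_dir):
        self.dir = Path(output_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def step(self, name, fn, inputs):
        # The cache key covers the step name and its inputs: changing
        # either invalidates only this step, not the whole pipeline.
        key = hashlib.sha256(
            json.dumps([name, inputs], sort_keys=True, default=str).encode()
        ).hexdigest()
        cache = self.dir / f"{name}-{key[:16]}.pkl"
        if cache.exists():
            return pickle.loads(cache.read_bytes())
        result = fn(inputs)
        cache.write_bytes(pickle.dumps(result))
        return result

# Chain two steps: "generate" synthetic examples, then filter them.
with Session("./session_cache") as s:
    raw = s.step("generate", lambda xs: [x.upper() for x in xs], ["a", "b", "c"])
    kept = s.step("filter", lambda xs: [x for x in xs if x != "B"], raw)
```

Running the script a second time hits the on-disk cache for both steps; editing only the filter lambda would recompute just that step, since the generate step's key is unchanged.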
Check out the repository to get started.
The Lab
Unique experiences
User interactions represent a rich source of behavioral data that can be leveraged via LLM finetuning. However, this data is often complex, sparse, and noisy, which hinders the process of learning relevant patterns.
USER-LLM is a two-stage approach for contextualizing LLMs by integrating user embeddings. In the first stage, a pretrained user encoder generates user embeddings from ID-based feature sequences. The embeddings are then concatenated and processed by a transformer decoder, and the output is projected back to the original feature spaces. In the second stage, the output embeddings are integrated into the LLM via cross-attention along with intermediate text representations. Further, Perceiver units are incorporated into the projection layers to compress the user embeddings into a more compact format, reducing the number of tokens needed to represent the user history and freeing up the LLM's context window.
USER-LLM allows language models to dynamically adjust to different user interaction contexts, and provides performance improvements compared to both non-LLM baselines and prompt-based personalization methods.
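The cross-attention step above can be sketched in a few lines of NumPy. This is an illustrative single-head toy, not USER-LLM's actual implementation: text-token states act as queries while a handful of compressed user embeddings supply the keys and values, so an arbitrarily long interaction history costs only a few latent "tokens" of context. All names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(text_states, user_embeddings, Wq, Wk, Wv):
    """Text tokens attend over compressed user embeddings, conditioning
    each token's representation on the user's interaction history."""
    q = text_states @ Wq          # (T, d) queries from text tokens
    k = user_embeddings @ Wk      # (U, d) keys from user latents
    v = user_embeddings @ Wv      # (U, d) values from user latents
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v    # (T, d) user-conditioned states

rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(8, d))   # 8 text-token hidden states
user = rng.normal(size=(4, d))   # 4 Perceiver-compressed user latents
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = cross_attend(text, user, Wq, Wk, Wv)
print(out.shape)  # (8, 16)
```

Note the asymmetry: however long the raw history, the attention cost scales with the 4 compressed latents rather than the full sequence, which is the point of the Perceiver compression.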
Metadaptation
Training supervised classification models for large-scale or real-time applications can be costly and time-consuming given the need to frequently retrain on new data.
The key idea behind HyperFast is to use a hypernetwork to predict the weights of a main network instead of going through the traditional time-consuming training process. To do this, HyperFast uses a series of initial transformation layers to project datasets of varying dimensionality to fixed-sized, feature-permutation invariant representations. It then uses random features and principal component analysis (PCA) to approximate a kernel between the support dataset and query samples. The resulting transformed data is then passed through the generated linear layers of the main network. The hypernetwork itself consists of multiple modules, each responsible for generating the weights for a particular layer of the main network.
HyperFast allows faster adaptation to new datasets and greater flexibility in handling datasets of varying dimensionality.
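The core trick — one network emitting the weights of another — can be shown in miniature. The sketch below is a drastic simplification and uses hypothetical, untrained hypernetwork weights: the real HyperFast includes random features, PCA, and per-layer modules, none of which appear here. What it does show is the mechanic of producing a classifier's weight matrix from a fixed-size, permutation-invariant summary of the support set in a single forward pass, with no gradient-based training loop.

```python
import numpy as np

rng = np.random.default_rng(1)

def hyper_layer(support_x, support_y, n_classes, hidden=32):
    """Toy hypernetwork: maps a permutation-invariant summary of the
    support set to the weight matrix of a linear classifier."""
    d = support_x.shape[1]
    # Permutation-invariant summary: per-class mean feature vectors,
    # flattened to a fixed-size vector regardless of support-set size.
    summary = np.stack([support_x[support_y == c].mean(axis=0)
                        for c in range(n_classes)]).ravel()
    # One hidden layer emits the main network's weights directly.
    W1 = rng.normal(scale=0.1, size=(summary.size, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, d * n_classes))
    weights = np.tanh(summary @ W1) @ W2
    return weights.reshape(d, n_classes)

# Toy 2-class support set; query samples are classified directly by
# the generated layer, with no training loop for the main network.
X = rng.normal(size=(40, 8))
y = (X[:, 0] > 0).astype(int)
W = hyper_layer(X, y, n_classes=2)
preds = (rng.normal(size=(5, 8)) @ W).argmax(axis=1)
print(preds.shape)  # (5,)
```

In HyperFast the hypernetwork is itself trained across many datasets, which is what makes the generated weights useful; here they are random purely to keep the sketch self-contained.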
More cutting-edge research
Slice 'n dice - LoRETTA is a Parameter-Efficient Fine-Tuning (PEFT) framework for LLMs that uses Tensor-Train (TT) tensorization to reduce the number of trainable parameters required for fine-tuning. LoRETTA performs similarly to or outperforms other popular PEFT techniques while using up to 100x fewer parameters for models like Llama-2-7B.
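To get a feel for where the parameter savings come from, here is the counting arithmetic of a TT factorization (this only illustrates the math of TT parameter counts, not LoRETTA's actual adapter structure, and the shapes chosen are arbitrary examples):

```python
def tt_params(dims, rank):
    """Parameter count of a Tensor-Train factorization of a weight
    tensor reshaped to the given mode dimensions, assuming a uniform
    TT-rank: core i has shape (r_i, dims[i], r_{i+1})."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return sum(ranks[i] * dims[i] * ranks[i + 1] for i in range(len(dims)))

# A 4096x4096 weight matrix reshaped into an 8-way tensor of shape
# (8, 8, 8, 8, 8, 8, 8, 8): with TT-rank 4, the trainable parameter
# count collapses from ~16.8M dense entries to a few hundred.
dense = 4096 * 4096
tt = tt_params([8] * 8, rank=4)
print(dense, tt)  # 16777216 832
```

The compression ratio grows with the number of modes and shrinks with the TT-rank, which is the knob trading parameter count against expressiveness.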
Wholesome backprop - Information loss as inputs pass through successive layers is a common issue with modern DNNs. Programmable Gradient Information (PGI) addresses this by combining the gradients returned from different prediction heads through an integration network, allowing the main branch, used at inference time, to retain complete information for learning predictions. PGI is used in YOLOv9 for improved information preservation in object detection tasks.
Bound to each other - Image segmentation struggles with reasoning about relationships between multiple objects. DeiSAM segments objects in an image based on deictic prompts. It converts the image into an entity-relationship representation then into factual prompts, passes the prompts into an LLM to derive rules that condition entity interactions, and uses these rules to identify and segment the desired object. DeiSAM allows describing objects by their relations to other items, and outperforms comparable neural baselines.
The Pulse
iComplete - Apple is planning to introduce new AI tools aimed at developers and app testing processes. One of these initiatives includes an AI coding assistant that will rival GitHub Copilot. The specific details have not been disclosed yet, but it's expected to be integrated into Xcode, which serves as Apple's primary development environment for macOS, iOS, iPadOS, watchOS, tvOS, and SwiftUI applications.
Chip buddies - Microsoft has chosen Intel to manufacture one of their upcoming home-grown processors. The partnership involves producing two distinct types of chips — a computer processor and an AI accelerator. This represents a significant milestone for Intel's ambition to expand its presence in the made-to-order chip sector, while allowing Microsoft to enhance its self-developed semiconductors.
The master painter learns to write - Stability AI has announced Stable Diffusion 3, described as their most advanced text-to-image model with significant enhancements such as improved performance with multi-subject prompts, increased image quality, and enhanced ability to accurately represent text within generated images.
And that’s all for this edition. We hope you enjoyed reading through!
The Unify Dev Team.