AI / ML / LLM / Transformer Models Timeline and List
This is a collection of important papers in the area of Large Language Models and Transformer models. It focuses on recent developments, especially from mid-2022 onwards, and in no way claims to be exhaustive. It is actively updated.
See also the related work section, which covers surveys as well as other approaches and tools for keeping an up-to-date overview of the models and their relationships.
Something missing or wrong? Do you want to give feedback? Feel free to email me!
🔥 Latest entries
Published | Name |
---|---|
2023-12-12 | Phi-2 |
2023-12-11 | Mixtral 8x7B |
2023-12-06 | Gemini |
2023-11-23 | Yi |
2023-11-06 | Whisper v3 |
2023-10-25 | Zephyr |
2023-10-19 | DALL-E 3 |
2023-09-27 | Mistral 7B |
2023-09-11 | Phi-1.5 |
2023-08-24 | Code Llama |
Curated List of Large Language Models and Models based on Transformers (Index)
This list contains common models, methods, and analyses of Large Language Models (LLMs) and other (Seq2Seq) models that use Transformers. The list is not exhaustive and is mostly limited to causal models.
Anything missing? Feel free to email me! I am more than happy to update the list!
Name | Published | Paper Name / Blog Post Name |
---|---|---|
Alpaca | 2023-03-13 | Alpaca: A Strong, Replicable Instruction-Following Model |
AmbiEnt | 2023-04-27 | We're Afraid Language Models Aren't Modeling Ambiguity |
Attention / Transformers | 2017-06-12 | Attention Is All You Need |
AudioLDM | 2023-01-29 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models |
Baize | 2023-04-03 | Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data |
Bard | 2023-03-21 | An important next step on our AI journey |
BERT | 2018-10-11 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
BLIP-2 | 2023-01-30 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
BLOOM | 2022-11-09 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
BloombergGPT | 2023-03-30 | BloombergGPT: A Large Language Model for Finance |
BLOOMZ | 2022-11-03 | Crosslingual Generalization through Multitask Finetuning |
ChatGPT | 2022-11-30 | Introducing ChatGPT |
Chinchilla | 2022-03-29 | Training Compute-Optimal Large Language Models |
CLIP | 2021-02-26 | Learning Transferable Visual Models From Natural Language Supervision |
Code Llama | 2023-08-24 | Code Llama: Open Foundation Models for Code |
CodeT5 | 2021-09-03 | CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation |
CodeT5+ | 2023-05-20 | CodeT5+: Open Code Large Language Models for Code Understanding and Generation |
OpenAI Codex | 2021-07-07 | Evaluating Large Language Models Trained on Code |
ControlNet | 2023-02-10 | Adding Conditional Control to Text-to-Image Diffusion Models |
CoT | 2022-01-28 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
DALL-E | 2021-01 | |
DALL-E 2 | 2022-04-13 | Hierarchical Text-Conditional Image Generation with CLIP Latents |
DALL-E 3 | 2023-10-19 | Improving Image Generation with Better Captions |
DeepFloyd IF | 2023-04-28 | Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images |
Denoising Diffusion | 2020-06-19 | Denoising Diffusion Probabilistic Models |
Dolly | 2023-03-24 | Hello Dolly: Democratizing the magic of ChatGPT with open models |
databricks-dolly-15k | 2023-04-12 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM |
Dolly 2.0 | 2023-04-12 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM |
DPO | 2023-05-29 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
Falcon | 2023-05-24 | |
Flamingo | 2022-04-29 | Flamingo: a Visual Language Model for Few-Shot Learning |
Flan-T5 | 2022-10-20 | Scaling Instruction-Finetuned Language Models |
Flan-UL2 | 2023-03-03 | A New Open Source Flan 20B with UL2 |
FlashAttention | 2022-05-27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
FTD | 2019-11-06 | Fast Transformer Decoding: One Write-Head is All You Need |
Gemini | 2023-12-06 | Gemini: A Family of Highly Capable Multimodal Models |
Generative Agents | 2023-04-07 | Generative Agents: Interactive Simulacra of Human Behavior |
GPT | 2018-06-11 | Improving Language Understanding by Generative Pre-Training |
GPT-J | 2021-06-04 | GPT-J-6B: 6B JAX-Based Transformer |
GPT-JT | 2022-11-29 | Releasing GPT-JT powered by open-source AI |
GPT-NeoX | 2021-08 | |
GPT-2 | 2019-02-14 | Language Models are Unsupervised Multitask Learners |
GPT-3 | 2020-05-28 | Language Models are Few-Shot Learners |
GPT-3.5 | 2022-03 | |
GPT-4 | 2023-03-15 | GPT-4 Technical Report |
GPT4All | 2023-03 | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo |
GPT4All-J | 2023-04 | GPT4All-J: An Apache-2 Licensed Assistant-Style Chatbot |
Survey on ChatGPT and Beyond | 2023-04-26 | Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond |
Helpful and Harmless | 2022-04-12 | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |
Imagen | 2022-05-22 | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding |
InstructGPT | 2022-01-27 | Training language models to follow instructions with human feedback |
Koala | 2023-04-03 | Koala: A Dialogue Model for Academic Research |
LAION-5B | 2022-06 | LAION-5B: An open large-scale dataset for training next generation image-text models |
LaMDA | 2021-05-18 | LaMDA: our breakthrough conversation technology |
Latent Diffusion Models | 2021-12-20 | High-Resolution Image Synthesis with Latent Diffusion Models |
LLaMA | 2023-02-27 | LLaMA: Open and Efficient Foundation Language Models |
Llama 2 | 2023-07-18 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
LLaMA-Adapter | 2023-03-28 | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention |
LSST | 2019-04-23 | Generating Long Sequences with Sparse Transformers |
Longformer | 2020-04-10 | Longformer: The Long-Document Transformer |
LoRA | 2021-06-17 | LoRA: Low-Rank Adaptation of Large Language Models |
Megatron-LM | 2019-09-17 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
Mesh Transformer JAX | 2021-06-04 | GPT-J-6B: 6B JAX-Based Transformer |
MetaGPT | 2023-08-01 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework |
MiniGPT-4 | 2023-04-17 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models |
Mistral 7B | 2023-09-27 | Announcing Mistral 7B |
Mixtral 8x7B | 2023-12-11 | Mixtral of experts |
MLCopilot | 2023-04-28 | MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks |
MPT-30B | 2023-06-22 | MPT-30B: Raising the bar for open-source foundation models |
MPT-7B Base | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-Chat | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-Instruct | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-StoryWriter-65k+ | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MTF | 2021-10-15 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
NaturalSpeech 2 | 2023-04-18 | Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers |
OASST1 | 2023-04-15 | OpenAssistant Conversations - Democratizing Large Language Model Alignment |
OpenAssistant | 2023-04-15 | OpenAssistant Conversations - Democratizing Large Language Model Alignment |
OpenLLaMA | 2023-05-02 | |
OpenOrca | 2023-06-29 | |
OpenOrca-Preview1-13B | 2023-07-13 | |
Orca | 2023-06-05 | Orca: Progressive Learning from Complex Explanation Traces of GPT-4 |
PaLM | 2022-04-04 | Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance |
PaLM 2 | 2023-05-10 | PaLM 2 Technical Report |
Phi-1.5 | 2023-09-11 | Textbooks Are All You Need II: phi-1.5 technical report |
Phi-2 | 2023-12-12 | Phi-2: The surprising power of small language models |
The Pile | 2020-12 | The Pile: An 800GB Dataset of Diverse Text for Language Modeling |
Pythia | 2023-04-03 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
RAG | 2020-05-22 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |
RedPajama-Data-1T | 2023-04-17 | RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens |
RedPajama-INCITE-Base | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RedPajama-INCITE-Chat | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RedPajama-INCITE-Instruct | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RLHF | 2022-04-04 | Training language models to follow instructions with human feedback |
ROOTS | 2023-03-07 | The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset |
RWKV | 2023-05-22 | RWKV: Reinventing RNNs for the Transformer Era |
ScienceQA | 2022-09-20 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
Stable Diffusion XL 0.9 | 2023-06-22 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis |
SDXL 1.0 | 2023-07-26 | Announcing SDXL 1.0 |
Self-Instruct | 2022-12-20 | Self-Instruct: Aligning Language Model with Self Generated Instructions |
Sparks of AGI | 2023-03-22 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 |
Stable Diffusion | 2022-08-22 | |
StableLM | 2023-04-19 | Stability AI Launches the First of its StableLM Suite of Language Models |
StableVicuna | 2023-04-28 | Stability AI releases StableVicuna, the AI World's First Open Source RLHF LLM Chatbot |
StarCoder | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
StarCoderBase | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
StarCoderData | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
T5 | 2019-10-23 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
TANGO | 2023-04-24 | Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model |
The Stack | 2022-11-20 | The Stack: 3 TB of permissively licensed source code |
UL2 | 2022-05-10 | UL2: Unifying Language Learning Paradigms |
Vicuna | 2023-03-30 | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality |
VideoLDM | 2023-04-18 | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models |
Whisper | 2022-09-21 | Robust Speech Recognition via Large-Scale Weak Supervision |
Whisper v3 | 2023-11-06 | New models and developer products announced at DevDay |
XGen-7B | 2023-06-28 | Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length |
Yi | 2023-11-23 | |
Zephyr | 2023-10-25 | Zephyr: Direct Distillation of LM Alignment |
Related work

- Xavier Amatriain presented a comprehensive survey of LLMs and Transformer models in a blog post and paper in January 2023. It includes a chronological timeline and a classification (family tree) of many important and famous models. This timeline, however, additionally models the dependencies between methods, datasets, and models, and acts as a living document that is frequently updated to reflect the current state of research. It also focuses more on causal than on masked models.
- Rishi Bommasani, Thomas Liao, and Percy Liang proposed Ecosystem Graphs (paper) as a documentation framework to centralize knowledge about foundation models (via). They model datasets, models, and applications, as well as their technical and social dependencies, in a knowledge graph. This LLM / Transformer Models timeline focuses on temporal relations (the timeline) and on dependencies between specific methods and techniques such as Attention, LDM, or RLHF. As an example, it can visualize that StableVicuna (released April 2023) is built upon Vicuna but incorporates the RLHF method (published April 2022) that also enabled InstructGPT (presented in January 2022).
- Jonathan Jeon maintains a GitHub repository with a tabular timeline that covers paper publications, press releases, and other related news. It differs from this timeline in its tabular form and in containing general entries such as news, since there are no dependency connections between its entries.
- Special domains:
  - Tristan Behrens maintains a GitHub repository with research resources on music modelling and generation with deep learning, dating back to 1959.
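The dependency modelling discussed above can be sketched as a small directed graph. The following Python snippet is only an illustration of the idea, not the site's actual data model; the entries and edges are taken from the StableVicuna example in this section:

```python
# Nodes are models, methods, or datasets; edges point to what an entry
# builds upon. Illustrative subset only, based on the example above.
deps = {
    "StableVicuna": ["Vicuna", "RLHF"],
    "Vicuna": ["LLaMA"],
    "InstructGPT": ["GPT-3", "RLHF"],
    "RLHF": [],
    "LLaMA": [],
    "GPT-3": [],
}

def ancestry(node, graph):
    """Return the set of all transitive dependencies of a node."""
    seen = set()
    stack = [node]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(ancestry("StableVicuna", deps)))
# → ['LLaMA', 'RLHF', 'Vicuna']
```

Such a traversal is what lets a timeline graph answer questions like "which earlier methods does StableVicuna ultimately build on?" across several hops.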