AI / ML / LLM / Transformer Models Timeline and List
This is a collection of important papers in the area of Large Language Models and Transformer models. It focuses on recent developments, especially from mid-2022 onwards, and in no way claims to be exhaustive. It is actively updated.
See also the related work section, which covers surveys as well as other approaches and tools for keeping an up-to-date overview of the models and their relationships.
Something missing or wrong? Do you want to give feedback? Feel free to email me!
🔥 Latest entries
Published | Name |
---|---|
2023-12-12 | Phi-2 |
2023-12-11 | Mixtral 8x7B |
2023-12-06 | Gemini |
2023-11-23 | Yi |
2023-11-06 | Whisper v3 |
2023-10-25 | Zephyr |
2023-10-19 | DALL-E 3 |
2023-09-27 | Mistral 7B |
2023-09-11 | Phi-1.5 |
2023-08-24 | Code Llama |
Curated List of Large Language Models and Models based on Transformers (Index)
This list contains common models, methods, and analyses of Large Language Models (LLMs) and other (Seq2Seq) models that use Transformers. The list is not exhaustive and is mostly limited to causal models.
Anything missing? Feel free to email me! I am more than happy to update the list!
Name | Published | Paper Name / Blog Post Name |
---|---|---|
Alpaca | 2023-03-13 | Alpaca: A Strong, Replicable Instruction-Following Model |
AmbiEnt | 2023-04-27 | We're Afraid Language Models Aren't Modeling Ambiguity |
Attention / Transformers | 2017-06-12 | Attention Is All You Need |
AudioLDM | 2023-01-29 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models |
Baize | 2023-04-03 | Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data |
Bard | 2023-03-21 | An important next step on our AI journey |
BERT | 2018-10-11 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
BLIP-2 | 2023-01-30 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
BLOOM | 2022-11-09 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
BloombergGPT | 2023-03-30 | BloombergGPT: A Large Language Model for Finance |
BLOOMZ | 2022-11-03 | Crosslingual Generalization through Multitask Finetuning |
ChatGPT | 2022-11-30 | Introducing ChatGPT |
Chinchilla | 2022-03-29 | Training Compute-Optimal Large Language Models |
CLIP | 2021-02-26 | Learning Transferable Visual Models From Natural Language Supervision |
Code Llama | 2023-08-24 | Code Llama: Open Foundation Models for Code |
CodeT5 | 2021-09-03 | CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation |
CodeT5+ | 2023-05-20 | CodeT5+: Open Code Large Language Models for Code Understanding and Generation |
OpenAI Codex | 2021-07-07 | Evaluating Large Language Models Trained on Code |
ControlNet | 2023-02-10 | Adding Conditional Control to Text-to-Image Diffusion Models |
CoT | 2022-01-28 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
DALL-E | 2021-01 | |
DALL-E 2 | 2022-04-13 | Hierarchical Text-Conditional Image Generation with CLIP Latents |
DALL-E 3 | 2023-10-19 | Improving Image Generation with Better Captions |
DeepFloyd IF | 2023-04-28 | Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images |
Denoising Diffusion | 2020-06-19 | Denoising Diffusion Probabilistic Models |
Dolly | 2023-03-24 | Hello Dolly: Democratizing the magic of ChatGPT with open models |
databricks-dolly-15k | 2023-04-12 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM |
Dolly 2.0 | 2023-04-12 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM |
DPO | 2023-05-29 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
Falcon | 2023-05-24 | |
Flamingo | 2022-04-29 | Flamingo: a Visual Language Model for Few-Shot Learning |
Flan-T5 | 2022-10-20 | Scaling Instruction-Finetuned Language Models |
Flan-UL2 | 2023-03-03 | A New Open Source Flan 20B with UL2 |
FlashAttention | 2022-05-27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
FTD | 2019-11-06 | Fast Transformer Decoding: One Write-Head is All You Need |
Gemini | 2023-12-06 | Gemini: A Family of Highly Capable Multimodal Models |
Generative Agents | 2023-04-07 | Generative Agents: Interactive Simulacra of Human Behavior |
GPT | 2018-06-11 | Improving Language Understanding by Generative Pre-Training |
GPT-J | 2021-06-04 | GPT-J-6B: 6B JAX-Based Transformer |
GPT-JT | 2022-11-29 | Releasing GPT-JT powered by open-source AI |
GPT-NeoX | 2021-08 | |
GPT-2 | 2019-02-14 | Language Models are Unsupervised Multitask Learners |
GPT-3 | 2020-05-28 | Language Models are Few-Shot Learners |
GPT-3.5 | 2022-03 | |
GPT-4 | 2023-03-15 | GPT-4 Technical Report |
GPT4All | 2023-03 | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo |
GPT4All-J | 2023-04 | GPT4All-J: An Apache-2 Licensed Assistant-Style Chatbot |
Survey on ChatGPT and Beyond | 2023-04-26 | Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond |
Helpful and Harmless | 2022-04-12 | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |
Imagen | 2022-05-22 | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding |
InstructGPT | 2022-01-27 | Training language models to follow instructions with human feedback |
Koala | 2023-04-03 | Koala: A Dialogue Model for Academic Research |
LAION-5B | 2022-06 | LAION-5B: An open large-scale dataset for training next generation image-text models |
LaMDA | 2021-05-18 | LaMDA: our breakthrough conversation technology |
Latent Diffusion Models | 2021-12-20 | High-Resolution Image Synthesis with Latent Diffusion Models |
LLaMA | 2023-02-27 | LLaMA: Open and Efficient Foundation Language Models |
Llama 2 | 2023-07-18 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
LLaMA-Adapter | 2023-03-28 | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention |
LSST | 2019-04-23 | Generating Long Sequences with Sparse Transformers |
Longformer | 2020-04-10 | Longformer: The Long-Document Transformer |
LoRA | 2021-06-17 | LoRA: Low-Rank Adaptation of Large Language Models |
Megatron-LM | 2019-09-17 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
Mesh Transformer JAX | 2021-06-04 | GPT-J-6B: 6B JAX-Based Transformer |
MetaGPT | 2023-08-01 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework |
MiniGPT-4 | 2023-04-17 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models |
Mistral 7B | 2023-09-27 | Announcing Mistral 7B |
Mixtral 8x7B | 2023-12-11 | Mixtral of experts |
MLCopilot | 2023-04-28 | MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks |
MPT-30B | 2023-06-22 | MPT-30B: Raising the bar for open-source foundation models |
MPT-7B Base | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-Chat | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-Instruct | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MPT-7B-StoryWriter-65k+ | 2023-05-05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
MTF | 2021-10-15 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
NaturalSpeech 2 | 2023-04-18 | Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers |
OASST1 | 2023-04-15 | OpenAssistant Conversations - Democratizing Large Language Model Alignment |
OpenAssistant | 2023-04-15 | OpenAssistant Conversations - Democratizing Large Language Model Alignment |
OpenLLaMA | 2023-05-02 | |
OpenOrca | 2023-06-29 | |
OpenOrca-Preview1-13B | 2023-07-13 | |
Orca | 2023-06-05 | Orca: Progressive Learning from Complex Explanation Traces of GPT-4 |
PaLM | 2022-04-04 | Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance |
PaLM 2 | 2023-05-10 | PaLM 2 Technical Report |
Phi-1.5 | 2023-09-11 | Textbooks Are All You Need II: phi-1.5 technical report |
Phi-2 | 2023-12-12 | Phi-2: The surprising power of small language models |
The Pile | 2020-12 | The Pile: An 800GB Dataset of Diverse Text for Language Modeling |
Pythia | 2023-04-03 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
RAG | 2020-05-22 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |
RedPajama-Data-1T | 2023-04-17 | RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens |
RedPajama-INCITE-Base | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RedPajama-INCITE-Chat | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RedPajama-INCITE-Instruct | 2023-05-05 | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
RLHF | 2022-04-04 | Training language models to follow instructions with human feedback |
ROOTS | 2023-03-07 | The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset |
RWKV | 2023-05-22 | RWKV: Reinventing RNNs for the Transformer Era |
ScienceQA | 2022-09-20 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
Stable Diffusion XL 0.9 | 2023-06-22 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis |
SDXL 1.0 | 2023-07-26 | Announcing SDXL 1.0 |
Self-Instruct | 2022-12-20 | Self-Instruct: Aligning Language Model with Self Generated Instructions |
Sparks of AGI | 2023-03-22 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 |
Stable Diffusion | 2022-08-22 | |
StableLM | 2023-04-19 | Stability AI Launches the First of its StableLM Suite of Language Models |
StableVicuna | 2023-04-28 | Stability AI releases StableVicuna, the AI World's First Open Source RLHF LLM Chatbot |
StarCoder | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
StarCoderBase | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
StarCoderData | 2023-05-04 | StarCoder: A State-of-the-Art LLM for Code |
T5 | 2019-10-23 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
TANGO | 2023-04-24 | Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model |
The Stack | 2022-11-20 | The Stack: 3 TB of permissively licensed source code |
UL2 | 2022-05-10 | UL2: Unifying Language Learning Paradigms |
Vicuna | 2023-03-30 | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality |
VideoLDM | 2023-04-18 | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models |
Whisper | 2022-09-21 | Robust Speech Recognition via Large-Scale Weak Supervision |
Whisper v3 | 2023-11-06 | New models and developer products announced at DevDay |
XGen-7B | 2023-06-28 | Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length |
Yi | 2023-11-23 | |
Zephyr | 2023-10-25 | Zephyr: Direct Distillation of LM Alignment |
Related work

- Xavier Amatriain presented a comprehensive survey of LLMs and Transformer models in a blog post and paper in January 2023. It includes a chronological timeline and a classification (family tree) of many important and famous models. This timeline, however, additionally models the dependencies between methods, datasets, and models, and acts as a living document that is frequently updated to reflect the current state of research. It also focuses more on causal than on masked models.
- Rishi Bommasani, Thomas Liao, and Percy Liang proposed Ecosystem Graphs (paper) as a documentation framework to centralize knowledge about foundation models (via). They model datasets, models, and applications, as well as their technical and social dependencies, in a knowledge graph. This LLM / Transformer Models timeline focuses on temporal relations (the timeline) and on dependencies between specific methods and techniques such as Attention, LDM, or RLHF. As an example, it can visualize that StableVicuna (released April 2023) is built upon Vicuna but incorporates the RLHF method (published April 2022) that also enabled InstructGPT (presented in January 2022).
- Jonathan Jeon maintains a GitHub repository with a tabular timeline that covers paper publications, press releases, and other related news. It differs from this timeline in its tabular form and in containing general entries such as news, since there are no dependency connections between its entries.
- Special domains:
  - Tristan Behrens maintains a GitHub repository with research resources on music modelling and generation with deep learning, dating back to 1959.
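The dependency modelling discussed above can be sketched as a small directed graph. The following Python snippet is only an illustration of the idea, not the site's actual data model; the entries and edges are taken from the StableVicuna example in this section:

```python
# Nodes are models, methods, or datasets; edges point to what an entry
# builds upon. Illustrative subset only, based on the example above.
deps = {
    "StableVicuna": ["Vicuna", "RLHF"],
    "Vicuna": ["LLaMA"],
    "InstructGPT": ["GPT-3", "RLHF"],
    "RLHF": [],
    "LLaMA": [],
    "GPT-3": [],
}

def ancestry(node, graph):
    """Return the set of all transitive dependencies of a node."""
    seen = set()
    stack = [node]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(ancestry("StableVicuna", deps)))
# → ['LLaMA', 'RLHF', 'Vicuna']
```

Such a traversal is what lets a timeline graph answer questions like "which earlier methods does StableVicuna ultimately build on?" across several hops.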