AI / ML / LLM / Transformer Models Timeline and List

Viktor Garske @vemgar, Last update: Mon May 29 20:03:28 2023

This is a collection of important papers in the area of Large Language Models and Transformer Models. It focuses on recent developments, especially from mid-2022 onwards, and in no way claims to be exhaustive. It is actively updated.

See also the related work section that covers surveys and other approaches and tools to keep an up-to-date overview of the models and their relationships.

Something missing or wrong? Do you want to give feedback? Feel free to email me!

Legend: node types are Model, Method, Dataset, and Application / Analysis. Edges: → origin (architecture, idea or model); ⇢ (dashed) weaker origin (e.g. code); ⇢ (dotted) related work.

[Figure: timeline graph covering 01/2017 to 05/2023. Nodes are the models, methods, datasets and applications / analyses listed in the index below; edges mark origin relationships, e.g. Attention / Transformers → GPT → GPT-2 → GPT-3 → InstructGPT, and LLaMA → Alpaca → Vicuna.]
timeline (c) 2023 Viktor Garske <info@v-gar.de>, CC-BY-SA 4.0, https://ai.v-gar.de/, Last update: Mon May 29 20:02:53 UTC 2023
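The origin relationships in the timeline form a directed graph, so questions like "what does Vicuna build on?" reduce to a graph traversal. The following is only a minimal sketch in Python, not how the timeline itself is generated: it copies a handful of edges from the graph, and the names `EDGES`, `build_parents` and `ancestors` are hypothetical helpers introduced for illustration. The distinction between origin, weaker origin and related work is not modelled here.

```python
from collections import defaultdict

# A few origin edges copied from the timeline graph (origin -> derived work).
# This is a small illustrative subset, not the full graph.
EDGES = [
    ("Attention / Transformers", "GPT"),
    ("GPT", "GPT-2"),
    ("GPT-2", "GPT-3"),
    ("GPT-3", "InstructGPT"),
    ("GPT-3", "LLaMA"),
    ("LLaMA", "Alpaca"),
    ("Self-Instruct", "Alpaca"),
    ("Alpaca", "Vicuna"),
    ("LoRA", "Vicuna"),
]


def build_parents(edges):
    """Map each entry to the set of entries it directly originates from."""
    parents = defaultdict(set)
    for origin, derived in edges:
        parents[derived].add(origin)
    return parents


def ancestors(name, parents):
    """Return every direct or indirect origin of `name` (depth-first walk)."""
    seen = set()
    stack = [name]
    while stack:
        node = stack.pop()
        for origin in parents.get(node, ()):
            if origin not in seen:
                seen.add(origin)
                stack.append(origin)
    return seen


if __name__ == "__main__":
    parents = build_parents(EDGES)
    for origin in sorted(ancestors("Vicuna", parents)):
        print(origin)
```

Storing the edges in the "derived → origins" direction keeps the lineage lookup simple; reversing the mapping would answer the opposite question (what was derived from a given entry).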

🔥 Latest entries

Published Name
2023-05-24 Falcon
2023-05-22 RWKV (Receptance Weighted Key Value)
2023-05-20 CodeT5+
2023-05-10 PaLM 2 (Pathways Language Model 2)
2023-05-05 MPT-7B Base
2023-05-05 MPT-7B-Chat
2023-05-05 MPT-7B-Instruct
2023-05-05 MPT-7B-StoryWriter-65k+
2023-05-05 RedPajama-INCITE-Base
2023-05-05 RedPajama-INCITE-Chat

Curated List of Large Language Models and Models based on Transformers (Index)

This list contains common models, methods and analyses of Large Language Models (LLMs) or other (Seq2Seq) models that use Transformers. The list is not exhaustive and is mostly limited to causal models.

Anything missing? Feel free to email me! I am more than happy to update the list!

Name Published Paper Name / Blog Post Name
Alpaca 2023-03-13 Alpaca: A Strong, Replicable Instruction-Following Model
AmbiEnt 2023-04-27 We're Afraid Language Models Aren't Modeling Ambiguity
Attention / Transformers 2017-06-12 Attention Is All You Need
AudioLDM 2023-01-29 AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Baize 2023-04-03 Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
Bard 2023-03-21 An important next step on our AI journey
BERT 2018-10-11 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BLIP-2 2023-01-30 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
BLOOM 2022-11-09 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BloombergGPT 2023-03-30 BloombergGPT: A Large Language Model for Finance
BLOOMZ 2022-11-03 Crosslingual Generalization through Multitask Finetuning
ChatGPT 2022-11-30 Introducing ChatGPT
Chinchilla 2022-03-29 Training Compute-Optimal Large Language Models
CLIP 2021-02-26 Learning Transferable Visual Models From Natural Language Supervision
CodeT5 2021-09-03 CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
CodeT5+ 2023-05-20 CodeT5+: Open Code Large Language Models for Code Understanding and Generation
OpenAI Codex 2021-07-07 Evaluating Large Language Models Trained on Code
ControlNet 2023-02-10 Adding Conditional Control to Text-to-Image Diffusion Models
CoT 2022-01-28 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
DALL-E 2021-01
DALL-E 2 2022-04-13 Hierarchical Text-Conditional Image Generation with CLIP Latents
DeepFloyd IF 2023-04-28 Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images
Denoising Diffusion 2020-06-19 Denoising Diffusion Probabilistic Models
Dolly 2023-03-24 Hello Dolly: Democratizing the magic of ChatGPT with open models
databricks-dolly-15k 2023-04-12 Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
Dolly 2.0 2023-04-12 Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
Falcon 2023-05-24
Flamingo 2022-04-29 Flamingo: a Visual Language Model for Few-Shot Learning
Flan-T5 2022-10-20 Scaling Instruction-Finetuned Language Models
Flan-UL2 2023-03-03 A New Open Source Flan 20B with UL2
FlashAttention 2022-05-27 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FTD 2019-11-06 Fast Transformer Decoding: One Write-Head is All You Need
Generative Agents 2023-04-07 Generative Agents: Interactive Simulacra of Human Behavior
GPT 2018-06-11 Improving Language Understanding by Generative Pre-Training
GPT-J 2021-06-04 GPT-J-6B: 6B JAX-Based Transformer
GPT-JT 2022-11-29 Releasing GPT-JT powered by open-source AI
GPT-NeoX 2021-08
GPT-2 2019-02-14 Language Models are Unsupervised Multitask Learners
GPT-3 2020-05-28 Language Models are Few-Shot Learners
GPT-3.5 2022-03
GPT-4 2023-03-15 GPT-4 Technical Report
GPT4All 2023-03 GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
GPT4All-J 2023-04 GPT4All-J: An Apache-2 Licensed Assistant-Style Chatbot
Survey on ChatGPT and Beyond 2023-04-26 Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Helpful and Harmless 2022-04-12 Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Imagen 2022-05-22 Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
InstructGPT 2022-01-27 Training language models to follow instructions with human feedback
Koala 2023-04-03 Koala: A Dialogue Model for Academic Research
LAION-5B 2022-06 LAION-5B: An open large-scale dataset for training next generation image-text models
LaMDA 2021-05-18 LaMDA: our breakthrough conversation technology
Latent Diffusion Models 2021-12-20 High-Resolution Image Synthesis with Latent Diffusion Models
LLaMA 2023-02-27 LLaMA: Open and Efficient Foundation Language Models
LLaMA-Adapter 2023-03-28 LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LoRA 2021-06-17 LoRA: Low-Rank Adaptation of Large Language Models
Megatron-LM 2019-09-17 Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mesh Transformer JAX 2021-06-04 GPT-J-6B: 6B JAX-Based Transformer
MiniGPT-4 2023-04-17 MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
MLCopilot 2023-04-28 MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
MPT-7B Base 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-Chat 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-Instruct 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-StoryWriter-65k+ 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MTF 2021-10-15 Multitask Prompted Training Enables Zero-Shot Task Generalization
NaturalSpeech 2 2023-04-18 Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
OASST1 2023-04-15 OpenAssistant Conversations - Democratizing Large Language Model Alignment
OpenAssistant 2023-04-15 OpenAssistant Conversations - Democratizing Large Language Model Alignment
OpenLLaMA 2023-05-02
PaLM 2022-04-04 Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
PaLM 2 2023-05-10 PaLM 2 Technical Report
The Pile 2020-12 The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Pythia 2023-04-03 Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
RedPajama-Data-1T 2023-04-17 RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens
RedPajama-INCITE-Base 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RedPajama-INCITE-Chat 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RedPajama-INCITE-Instruct 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RLHF 2022-04-04 Training language models to follow instructions with human feedback
ROOTS 2023-03-07 The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
RWKV 2023-05-22 RWKV: Reinventing RNNs for the Transformer Era
ScienceQA 2022-09-20 Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Self-Instruct 2022-12-20 Self-Instruct: Aligning Language Model with Self Generated Instructions
Sparks of AGI 2023-03-22 Sparks of Artificial General Intelligence: Early experiments with GPT-4
Stable Diffusion 2022-08-22
StableLM 2023-04-19 Stability AI Launches the First of its StableLM Suite of Language Models
StableVicuna 2023-04-28 Stability AI releases StableVicuna, the AI World's First Open Source RLHF LLM Chatbot
StarCoder 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
StarCoderBase 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
StarCoderData 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
T5 2019-10-23 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TANGO 2023-04-24 Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The Stack 2022-11-20 The Stack: 3 TB of permissively licensed source code
UL2 2022-05-10 UL2: Unifying Language Learning Paradigms
Vicuna 2023-03-30 Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
VideoLDM 2023-04-18 Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Whisper 2022-09-21 Robust Speech Recognition via Large-Scale Weak Supervision

Related work