AI / ML / LLM / Transformer Models Timeline and List

Viktor Garske @vemgar, Last update: Tue Dec 26 15:23:35 2023

This is a collection of important papers on Large Language Models and Transformer models. It focuses on recent developments, especially from mid-2022 onwards, and does not claim to be exhaustive. It is actively updated.

See also the related work section, which covers surveys as well as other approaches and tools for keeping an up-to-date overview of the models and their relationships.

Something missing or wrong? Do you want to give feedback? Feel free to email me!

Legend (node types): Model, Method, Dataset, Application / Analysis
Legend (edges): →: Origin (architecture, idea or model); ⇢ (dashed): (weaker) origin (e.g. code); ⇢ (dotted): Related work


[Figure: timeline graph covering 01/2017 to 12/2023, with one node per model, method, dataset or application / analysis listed in the index, connected by the origin and related-work edges described in the legend.]

timeline (c) 2023 Viktor Garske <info@v-gar.de>, CC-BY-SA 4.0, https://ai.v-gar.de/, Last update: Tue Dec 26 15:22:55 UTC 2023
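The origin relationships shown in the timeline graph can also be explored programmatically. The following is a minimal sketch, assuming a small hand-picked subset of the graph's edges around the GPT family. The names `ORIGIN_EDGES`, `build_parents` and `ancestors` are hypothetical and not part of this site, and because the edge type (solid, dashed or dotted) is not modelled here, all edges are treated alike.

```python
from collections import defaultdict

# Hand-picked subset of edges (origin -> derived entry) taken from the timeline graph.
# Assumption: edge types (solid / dashed / dotted) are ignored in this sketch.
ORIGIN_EDGES = [
    ("Attention / Transformers", "GPT"),
    ("GPT", "GPT-2"),
    ("GPT-2", "GPT-3"),
    ("GPT-3", "InstructGPT"),
    ("GPT-3", "GPT-3.5"),
    ("InstructGPT", "GPT-3.5"),
    ("GPT-3.5", "ChatGPT"),
    ("GPT-3", "GPT-4"),
    ("GPT-4", "ChatGPT"),
    ("RLHF", "InstructGPT"),
]


def build_parents(edges):
    """Map each entry to the set of entries it directly originates from."""
    parents = defaultdict(set)
    for origin, derived in edges:
        parents[derived].add(origin)
    return parents


def ancestors(node, parents):
    """Return every entry reachable by following edges backwards from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    parents = build_parents(ORIGIN_EDGES)
    # Print the full lineage of ChatGPT within this subset of the graph.
    for name in sorted(ancestors("ChatGPT", parents)):
        print(name)
```

Storing the edges reversed (derived entry to its origins) keeps the lineage query a plain depth-first walk. The `seen` set ensures that entries reachable by several paths, such as GPT-3, are visited only once.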

🔥 Latest entries

Published Name
2023-12-12 Phi-2
2023-12-11 Mixtral 8x7B
2023-12-06 Gemini
2023-11-23 Yi
2023-11-06 Whisper v3
2023-10-25 Zephyr
2023-10-19 DALL-E 3
2023-09-27 Mistral 7B
2023-09-11 Phi-1.5
2023-08-24 Code Llama

Curated List of Large Language Models and Models based on Transformers (Index)

This list contains common models, methods and analyses of Large Language Models (LLMs) and other (Seq2Seq) models that use Transformers. The list is not exhaustive and is mostly limited to causal models.

Anything missing? Feel free to email me! I am more than happy to update the list!

Name Published Paper Name / Blog Post Name
Alpaca 2023-03-13 Alpaca: A Strong, Replicable Instruction-Following Model
AmbiEnt 2023-04-27 We're Afraid Language Models Aren't Modeling Ambiguity
Attention / Transformers 2017-06-12 Attention Is All You Need
AudioLDM 2023-01-29 AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Baize 2023-04-03 Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
Bard 2023-03-21 An important next step on our AI journey
BERT 2018-10-11 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BLIP-2 2023-01-30 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
BLOOM 2022-11-09 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BloombergGPT 2023-03-30 BloombergGPT: A Large Language Model for Finance
BLOOMZ 2022-11-03 Crosslingual Generalization through Multitask Finetuning
ChatGPT 2022-11-30 Introducing ChatGPT
Chinchilla 2022-03-29 Training Compute-Optimal Large Language Models
CLIP 2021-02-26 Learning Transferable Visual Models From Natural Language Supervision
Code Llama 2023-08-24 Code Llama: Open Foundation Models for Code
CodeT5 2021-09-03 CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
CodeT5+ 2023-05-20 CodeT5+: Open Code Large Language Models for Code Understanding and Generation
OpenAI Codex 2021-07-07 Evaluating Large Language Models Trained on Code
ControlNet 2023-02-10 Adding Conditional Control to Text-to-Image Diffusion Models
CoT 2022-01-28 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
DALL-E 2021-01
DALL-E 2 2022-04-13 Hierarchical Text-Conditional Image Generation with CLIP Latents
DALL-E 3 2023-10-19 Improving Image Generation with Better Captions
DeepFloyd IF 2023-04-28 Stability AI releases DeepFloyd IF, a powerful text-to-image model that can smartly integrate text into images
Denoising Diffusion 2020-06-19 Denoising Diffusion Probabilistic Models
Dolly 2023-03-24 Hello Dolly: Democratizing the magic of ChatGPT with open models
databricks-dolly-15k 2023-04-12 Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
Dolly 2.0 2023-04-12 Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
DPO 2023-05-29 Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Falcon 2023-05-24
Flamingo 2022-04-29 Flamingo: a Visual Language Model for Few-Shot Learning
Flan-T5 2022-10-20 Scaling Instruction-Finetuned Language Models
Flan-UL2 2023-03-03 A New Open Source Flan 20B with UL2
FlashAttention 2022-05-27 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FTD 2019-11-06 Fast Transformer Decoding: One Write-Head is All You Need
Gemini 2023-12-06 Gemini: A Family of Highly Capable Multimodal Models
Generative Agents 2023-04-07 Generative Agents: Interactive Simulacra of Human Behavior
GPT 2018-06-11 Improving Language Understanding by Generative Pre-Training
GPT-J 2021-06-04 GPT-J-6B: 6B JAX-Based Transformer
GPT-JT 2022-11-29 Releasing GPT-JT powered by open-source AI
GPT-NeoX 2021-08
GPT-2 2019-02-14 Language Models are Unsupervised Multitask Learners
GPT-3 2020-05-28 Language Models are Few-Shot Learners
GPT-3.5 2022-03
GPT-4 2023-03-15 GPT-4 Technical Report
GPT4All 2023-03 GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
GPT4All-J 2023-04 GPT4All-J: An Apache-2 Licensed Assistant-Style Chatbot
Survey on ChatGPT and Beyond 2023-04-26 Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Helpful and Harmless 2022-04-12 Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Imagen 2022-05-22 Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
InstructGPT 2022-01-27 Training language models to follow instructions with human feedback
Koala 2023-04-03 Koala: A Dialogue Model for Academic Research
LAION-5B 2022-06 LAION-5B: An open large-scale dataset for training next generation image-text models
LaMDA 2021-05-18 LaMDA: our breakthrough conversation technology
Latent Diffusion Models 2021-12-20 High-Resolution Image Synthesis with Latent Diffusion Models
LLaMA 2023-02-27 LLaMA: Open and Efficient Foundation Language Models
Llama 2 2023-07-18 Llama 2: Open Foundation and Fine-Tuned Chat Models
LLaMA-Adapter 2023-03-28 LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LSST 2019-04-23 Generating Long Sequences with Sparse Transformers
Longformer 2020-04-10 Longformer: The Long-Document Transformer
LoRA 2021-06-17 LoRA: Low-Rank Adaptation of Large Language Models
Megatron-LM 2019-09-17 Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mesh Transformer JAX 2021-06-04 GPT-J-6B: 6B JAX-Based Transformer
MetaGPT 2023-08-01 MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
MiniGPT-4 2023-04-17 MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Mistral 7B 2023-09-27 Announcing Mistral 7B
Mixtral 8x7B 2023-12-11 Mixtral of experts
MLCopilot 2023-04-28 MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
MPT-30B 2023-06-22 MPT-30B: Raising the bar for open-source foundation models
MPT-7B Base 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-Chat 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-Instruct 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-7B-StoryWriter-65k+ 2023-05-05 Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MTF 2021-10-15 Multitask Prompted Training Enables Zero-Shot Task Generalization
NaturalSpeech 2 2023-04-18 Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
OASST1 2023-04-15 OpenAssistant Conversations - Democratizing Large Language Model Alignment
OpenAssistant 2023-04-15 OpenAssistant Conversations - Democratizing Large Language Model Alignment
OpenLLaMA 2023-05-02
OpenOrca 2023-06-29
OpenOrca-Preview1-13B 2023-07-13
Orca 2023-06-05 Orca: Progressive Learning from Complex Explanation Traces of GPT-4
PaLM 2022-04-04 Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
PaLM 2 2023-05-10 PaLM 2 Technical Report
Phi-1.5 2023-09-11 Textbooks Are All You Need II: phi-1.5 technical report
Phi-2 2023-12-12 Phi-2: The surprising power of small language models
The Pile 2020-12 The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Pythia 2023-04-03 Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
RAG 2020-05-22 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
RedPajama-Data-1T 2023-04-17 RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens
RedPajama-INCITE-Base 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RedPajama-INCITE-Chat 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RedPajama-INCITE-Instruct 2023-05-05 Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
RLHF 2022-04-04 Training language models to follow instructions with human feedback
ROOTS 2023-03-07 The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
RWKV 2023-05-22 RWKV: Reinventing RNNs for the Transformer Era
ScienceQA 2022-09-20 Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Stable Diffusion XL 0.9 2023-06-22 SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL 1.0 2023-07-26 Announcing SDXL 1.0
Self-Instruct 2022-12-20 Self-Instruct: Aligning Language Model with Self Generated Instructions
Sparks of AGI 2023-03-22 Sparks of Artificial General Intelligence: Early experiments with GPT-4
Stable Diffusion 2022-08-22
StableLM 2023-04-19 Stability AI Launches the First of its StableLM Suite of Language Models
StableVicuna 2023-04-28 Stability AI releases StableVicuna, the AI World's First Open Source RLHF LLM Chatbot
StarCoder 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
StarCoderBase 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
StarCoderData 2023-05-04 StarCoder: A State-of-the-Art LLM for Code
T5 2019-10-23 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TANGO 2023-04-24 Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The Stack 2022-11-20 The Stack: 3 TB of permissively licensed source code
UL2 2022-05-10 UL2: Unifying Language Learning Paradigms
Vicuna 2023-03-30 Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
VideoLDM 2023-04-18 Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Whisper 2022-09-21 Robust Speech Recognition via Large-Scale Weak Supervision
Whisper v3 2023-11-06 New models and developer products announced at DevDay
XGen-7B 2023-06-28 Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length
Yi 2023-11-23
Zephyr 2023-10-25 Zephyr: Direct Distillation of LM Alignment

Related work