AI / ML / LLM / Transformer Models Timeline Details

Viktor Garske @vemgar, Last update: Tue Dec 26 15:23:35 2023

RLHF (Reinforcement Learning from Human Feedback)

[Figure: timeline graph. Time axis: 01/2022, 04/2022, 04/2023, 05/2023. RLHF links to InstructGPT, StableVicuna, and DPO.]
Type: Training, Method
Paper name: Training language models to follow instructions with human feedback
Paper authors: Ouyang et al.
Paper link: https://arxiv.org/abs/2203.02155
Publish date: 2022-04-04
Affiliation: OpenAI