AI / ML / LLM / Transformer Models Timeline Details
Viktor Garske (@vemgar), last update: Tue Dec 26 15:23:35 2023
RLHF (Reinforcement Learning from Human Feedback)
Graph: timeline from 01/2022 to 05/2023; RLHF links to InstructGPT, StableVicuna, and DPO.
Type: Training, Method
Paper name: Training language models to follow instructions with human feedback
Paper authors: Ouyang et al.
Paper link: https://arxiv.org/abs/2203.02155
Publish date: 2022-04-04
Affiliation: OpenAI