AI / ML / LLM / Transformer Models Timeline Details

Viktor Garske @vemgar, Last update: Tue Dec 26 15:23:35 2023

DPO (Direct Preference Optimization)

[Timeline graph: RLHF (04/2022) -> DPO (05/2023)]
Type: Method
Paper name: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper authors: Rafailov et al.
Paper link: https://arxiv.org/abs/2305.18290
Publish date: 2023-05-29
Affiliation: Stanford