AI / ML / LLM / Transformer Models Timeline Details

Viktor Garske @vemgar, Last update: Tue Dec 26 15:23:35 2023

DPO (Direct Preference Optimization)

Type: Method

Paper name: Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper authors: Rafailov et al.

Paper link: https://arxiv.org/abs/2305.18290

Publish date: 2023-05-29

Affiliation: Stanford