Linked pages
- [2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/abs/2305.18290 8 comments
- GitHub - BlackSamorez/tensor_parallel: Automatically split your PyTorch models on multiple GPUs for training & inference https://github.com/BlackSamorez/tensor_parallel 0 comments