PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
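We present a benchmark for assessing the role-playing abilities of language models through dynamic conversations. It uses three kinds of models: a player model that takes on a character, an interrogator model that emulates a user, and judge models that evaluate the resulting conversation. We validate the benchmark with experiments comparing automated evaluations against human evaluations.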
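The description names three roles (player, interrogator, judge) but not how they are wired together. The following is a minimal, hypothetical sketch of one way such a loop could work. It assumes each model is a plain function from a prompt string to a reply, that the interrogator and player alternate for a fixed number of turns, and that "multi-model evaluation" means averaging the scores of several judges. The function names, prompts, and averaging scheme are illustrative assumptions, not the paper's actual method.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a chat model maps a prompt to a reply, a judge maps a transcript to a score.
ChatModel = Callable[[str], str]
Judge = Callable[[str], float]
Turn = Tuple[str, str]  # (speaker, message)


def format_transcript(history: List[Turn]) -> str:
    """Render the conversation so far as 'speaker: message' lines."""
    return "\n".join(f"{speaker}: {message}" for speaker, message in history)


def run_dialogue(player: ChatModel, interrogator: ChatModel,
                 character: str, n_turns: int) -> List[Turn]:
    """Alternate emulated-user (interrogator) and in-character (player) turns."""
    history: List[Turn] = []
    for _ in range(n_turns):
        # The interrogator plays the user and writes the next message.
        question = interrogator(
            f"You are chatting with {character}.\n{format_transcript(history)}")
        history.append(("user", question))
        # The player replies while staying in character.
        answer = player(
            f"Stay in character as {character}.\n{format_transcript(history)}")
        history.append((character, answer))
    return history


def score_dialogue(judges: List[Judge], history: List[Turn]) -> float:
    """Average the scores that several judge models give the full transcript."""
    transcript = format_transcript(history)
    return mean(judge(transcript) for judge in judges)


if __name__ == "__main__":
    # Stub functions standing in for real language models.
    interrogator = lambda prompt: "What is your favourite weapon?"
    player = lambda prompt: "My sword, of course."
    judges = [lambda t: 4.0, lambda t: 5.0]
    dialogue = run_dialogue(player, interrogator, "a knight", n_turns=2)
    print(format_transcript(dialogue))
    print(score_dialogue(judges, dialogue))  # 4.5
```

Keeping the models as plain callables makes it easy to swap real API clients in for the stubs; averaging is only the simplest way to combine judges, and a real benchmark might weight judges or score separate criteria instead.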
https://arxiv.org/abs/2409.06820
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support