MITTR  |  Artificial intelligence

OpenAI has trained its LLM to confess to bad behavior

2025-12-03

OpenAI sees confessions as one step toward that goal. The work is still experimental, but initial results are promising, Boaz Barak, a research scientist at OpenAI, told me in an exclusive preview this week: “It’s something we’re quite excited about.”
