attention_mask 🫥 The standard attention_mask for encoder-only and decoder-only models By Pengpeng Wu Posted on May 9, 2025 🚅 I looked at how Hugging Face officially builds the standard attention_mask for encoder-only and decoder-only models, and the implementation is quite interesting. [Read More] Tags: huggingface
tie_weights 😪 Back to work! By Pengpeng Wu Posted on May 6, 2025 🙂 A day of working like crazy! I looked at how transformers implements weight tying. [Read More] Tags: huggingface
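self.loss_function 😢 The holiday is over By Pengpeng Wu Posted on May 6, 2025 🙂 While reading the Qwen3 model code recently, I noticed that transformers calls self.loss_function directly in the model's forward, even though the attribute is never explicitly set in __init__. My guess was that the Hugging Face team had unified and wrapped the loss computation, so I dug into the details. [Read More] Tags: huggingface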
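PPO 🤖 RLHF By Pengpeng Wu Posted on May 1, 2025 🎈 Continuing from the previous post: after SFT and Reward Model training, the next stage for a large language model is PPO reinforcement learning, which completes the RLHF pipeline. This post walks through how trl implements PPO from an engineering perspective. [Read More] Tags: huggingface RL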
Train Reward Model ✈️ Stay passionate and head for the mountains and the sea ~ By Pengpeng Wu Posted on April 27, 2025 🎈 This post walks through how to train a Reward Model with trl. [Read More] Tags: huggingface RL