Rs' Log

Hi, this is Rs. I picked this name years ago, back when I was obsessed with Bob Dylan’s "Like a Rolling Stone". I’m currently an LLM engineer at ByteDance. Before that, I did both my undergrad and master’s at USTC, where I spent some time working on NLG. I keep my learning notes here, and from time to time I also write about things I’ve been through, random thoughts, or the occasional rant.

P-GenRM: Personalized Generative Reward Model

21 March 2026

Attention Residual

19 March 2026

Self-Distillation as Privileged-Context Distillation

18 March 2026

KL Regularization Analysis

5 January 2026

From OneRec to RL

30 December 2025

Multi-Teacher On-Policy Distillation

19 December 2025

Conversational Rewards

13 December 2025

Knowledge Distillation

1 November 2025

AI Coding & Web Design

14 September 2025

LLM Post-Training Methods: Reinforcement Learning

19 March 2025

GRPO From Scratch

5 March 2025

A Reading of the DeepSeek-V3 Technical Report

29 January 2025

A Reading of the DeepSeek-R1 Technical Report

27 January 2025

RAG Roadmap

8 January 2025

Reinforcement Learning Notes

21 November 2024

DeepSpeed Multi-Node, Multi-GPU Training & Code Details

30 October 2024

LLM Post-Training Methods

9 October 2024