Rs' Log

Hi, this is Rs. I picked this name years ago, back when I was obsessed with Bob Dylan’s Like a Rolling Stone. Currently, I’m an LLM engineer at ByteDance. Before that, I did both my undergrad and master’s at USTC, where I spent some time working on NLG. I keep my learning notes here, and from time to time I also write about things I’ve been through, random thoughts, or just the occasional rant. If something here resonates, feel free to drop me a line.

Agentic RL Notes

Tau-Bench

Auto Skills Survey

Composer 2: Training a Real-World Coding Agent

Meta-Harness: End-to-End Search Over Model Harnesses

Analysis of Codex & Claude Code

OpenAI & Anthropic Blogs (2026.01.01-2026.03.28)

Understanding Codex: From Context and Tools to Harness and Runtime

Self-Evolution of MiniMax-M2.7

Agent Harness

CharacterFlywheel

P-GenRM: Personalized Generative Reward Model

Attention Residual

Self-Distillation as Privileged-Context Distillation

KL Regularization Analysis

From OneRec to RL

Multi-Teacher On-Policy Distillation

Conversational Rewards

Knowledge Distillation

AI Coding & Web Design