
[Stanford CS336] Assignment 5: Alignment and Reasoning Reinforcement Learning

1 Assignment Overview

In this assignment, you will gain hands-on experience in training language models to reason when solving math problems.

What to Implement

  1. Implement a zero-shot prompting baseline for the MATH competition dataset proposed by Hendrycks et al. [2021].
  2. Perform supervised fine-tuning (SFT) using reasoning traces from a stronger reasoning model (DeepSeek R1, DeepSeek-AI et al. [2025]).
  3. Use Expert Iteration to improve reasoning performance through verification rewards.
  4. Use Group Relative Policy Optimization (GRPO) to improve reasoning performance through verification rewards (a minimal sketch follows this list).
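To make the GRPO objective concrete, here is a minimal PyTorch sketch of its two core pieces: group-relative advantages (each response's reward normalized against the mean and standard deviation of its sampling group) and a PPO-style clipped surrogate loss. The function names and tensor shapes are my own illustration, not the assignment's scaffolding.

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: [n_prompts, group_size]; normalize each reward against
    # the mean/std of its own group of sampled responses.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_clip_loss(logp_new: torch.Tensor,
                   logp_old: torch.Tensor,
                   advantages: torch.Tensor,
                   clip_eps: float = 0.2) -> torch.Tensor:
    # logp_new/logp_old: per-token log-probs under the current and the
    # sampling policy; advantages broadcast over each response's tokens.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

Because the advantage is computed relative to the group rather than a learned value function, GRPO needs no separate critic model, which is what makes it attractive for reasoning RL at scale.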

For interested students, we will release an optional part of the assignment in the coming days: aligning language models with human preferences.

[Stanford CS336] Assignment 1: Building a Transformer Language Model

Why Should Systems Enthusiasts Learn Large Language Models?

In today’s AI technology wave, a working knowledge of large language models has become an essential skill for systems developers. By taking Stanford’s CS336 course, I began my journey of building large models from scratch. This course is likely to become a landmark systems course over the next few years, much as CMU 15-445 has become for databases.

High-Performance BPE Tokenizer Optimization: From 10 Minutes to 1 Second

This article is supplementary reading for CS336 Assignment 1, providing a detailed introduction to the optimized implementation of the BPE tokenizer.

Background

The cppyy recommended in the documentation has issues on both macOS and Linux. To pursue high performance, I used pybind11 to bind C++ code: pre-tokenization stays in Python, while the BPE merge loop is delegated to C++. In practice, the biggest bottleneck is still pre-tokenization, which can be parallelized with chunked processing based on the provided pretokenization_example.py (8 cores: ~100 s → 16 cores: ~30 s). A sketch of that chunked approach follows.
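Here is a minimal Python sketch of the chunked parallelism, assuming the input has already been split at document boundaries (e.g., on <|endoftext|>) so no pre-token straddles a chunk, as pretokenization_example.py does; the regex is a GPT-2-style pattern, and the function names are my own.

import regex as re
from collections import Counter
from multiprocessing import Pool

# GPT-2-style pre-tokenization pattern.
PAT = re.compile(r"""'(?:[sdmt]|ll|ve|re)| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""")

def count_chunk(text: str) -> Counter:
    # Pre-tokenize one chunk and count occurrences of each pre-token.
    return Counter(m.group() for m in PAT.finditer(text))

def parallel_pretokenize(chunks: list[str], workers: int = 16) -> Counter:
    # Fan the chunks out across processes and merge the partial counts;
    # the merged table is what the (C++) BPE merge loop consumes.
    total: Counter = Counter()
    with Pool(workers) as pool:
        for counts in pool.imap_unordered(count_chunk, chunks):
            total.update(counts)
    return total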

Apply for the 1Password Open Source Plan to Get a Free Teams Subscription

Introduction

Today, most platforms have gradually transitioned from “one-time purchase” to “subscription-based” pricing, and 1Password 8 is no exception. It’s understandable that companies make such moves to maintain operations and fund continued development.

As of March 2025, 1Password 8 Individual costs USD $35.88 per year, and Teams Starter (10 users) costs USD $239.40 per year.

(Screenshot: 1Password pricing page)

Once you’re approved for 1Password for Open Source, you can get a permanent 1Password Teams subscription for free, which is truly generous.

Build Doris on MacBook M1

Install Environment Dependencies

brew install automake autoconf libtool pkg-config texinfo coreutils gnu-getopt \
python@3 cmake ninja ccache bison byacc gettext wget pcre maven llvm@16 openjdk@17 npm

The Doris master branch currently supports only JDK 17.

Environment variables that need to be set:

export JAVA_HOME="/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home"
export PATH=$JAVA_HOME/bin:$PATH
export PATH="/opt/homebrew/opt/openjdk@17/bin:$PATH"
export PATH="/opt/homebrew/opt/texinfo/bin:$PATH"

Clone Your Code

  1. Clone the repository