gpt-oss-120B marks a milestone for open-weight models. It delivers frontier-level reasoning performance and is likely to see mainstream adoption, serving as the new baseline for benchmarking and the de facto production choice. I recommend it for workloads that demand top-tier intelligence and high serving throughput.
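If you want to try it yourself, here is a minimal inference sketch via Hugging Face transformers. It assumes the weights are hosted under the repo id openai/gpt-oss-120b and that you have enough GPU memory to shard a 120B-parameter model; treat it as a starting point, not a production serving setup.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",  # assumed Hugging Face repo id
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`; shards across available GPUs
)

messages = [
    {"role": "user",
     "content": "Summarize mixture-of-experts routing in two sentences."},
]
out = generator(messages, max_new_tokens=256)
# With chat-style input, the pipeline returns the full conversation;
# the assistant's reply is the last message.
print(out[0]["generated_text"][-1]["content"])
```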
The OpenAI team built a new benchmark dataset called SimpleQA that evaluates the ability of large language models (LLMs) to answer short factual questions. A particularly intriguing aspect of the paper is how, in this era of LLMs, the researchers leverage LLMs in their own workflow to design, iterate on, and analyze a new dataset.
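One concrete way LLMs enter such a workflow is as automated graders. The sketch below shows an LLM-as-grader pattern in that spirit: a grader model classifies a predicted answer against the gold answer as CORRECT, INCORRECT, or NOT_ATTEMPTED. The prompt wording and the gpt-4o-mini grader choice are illustrative assumptions, not the paper's exact setup.

```python
from openai import OpenAI

client = OpenAI()

GRADER_PROMPT = """You are grading a factual question-answering task.
Question: {question}
Gold answer: {gold}
Model answer: {predicted}
Reply with exactly one word: CORRECT, INCORRECT, or NOT_ATTEMPTED."""

def grade(question: str, gold: str, predicted: str) -> str:
    # Ask the grader LLM to compare the predicted answer to the gold answer.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative grader model, not the paper's
        messages=[{
            "role": "user",
            "content": GRADER_PROMPT.format(
                question=question, gold=gold, predicted=predicted
            ),
        }],
        temperature=0,  # deterministic grading
    )
    return response.choices[0].message.content.strip()

print(grade("Who wrote 'Middlemarch'?", "George Eliot", "George Eliot"))
```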
Thought Preference Optimization (TPO): prompt the model to generate an internal thought process before its final response, then train on preference pairs scored by a judge model that sees only the responses. TPO demonstrates significant performance gains in non-reasoning categories such as translation, marketing, and health; reasoning categories like math and analysis also show improvements.
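A minimal sketch of how such preference pairs could be constructed is shown below. The generate and judge_score callables are hypothetical stand-ins for the policy model and judge model, and the output format matches what a standard DPO-style preference trainer expects; this is a sketch of the data loop under those assumptions, not the paper's implementation.

```python
import random
from typing import Callable

def build_preference_pair(
    prompt: str,
    generate: Callable[[str], tuple[str, str]],  # hypothetical: -> (thought, response)
    judge_score: Callable[[str, str], float],    # hypothetical: (prompt, response) -> score
    num_samples: int = 8,
) -> dict:
    # Sample several thought+response completions for the same prompt.
    samples = [generate(prompt) for _ in range(num_samples)]
    # The judge scores only the visible response, never the hidden thought.
    scored = sorted(
        ((judge_score(prompt, resp), thought, resp) for thought, resp in samples),
        key=lambda item: item[0],
        reverse=True,
    )
    best, worst = scored[0], scored[-1]
    # Chosen/rejected texts keep the thought, so preference training also
    # shapes the thoughts even though the judge never sees them.
    return {
        "prompt": prompt,
        "chosen": f"{best[1]}\n{best[2]}",
        "rejected": f"{worst[1]}\n{worst[2]}",
    }

# Dummy stand-ins so the sketch runs end to end.
pair = build_preference_pair(
    "Translate 'good morning' to French.",
    generate=lambda p: ("<thought>pick a natural register</thought>", "Bonjour."),
    judge_score=lambda p, r: random.random(),
)
print(pair["chosen"])
```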