I’m a fifth-year PhD student in the Stanford CS department, where I work on machine learning under real-world incentives.

Much in the way that machine learning is bound by hardware and software, it is also bound by the incentives of real-world actors, such as workers, firms, and states. Often, these incentives are unstated, immutable, and in conflict with one another. Borrowing from fields like economics, I work on formalizing these incentives and creating tools, platforms, and algorithms that incorporate them, so that the progress we make on paper translates to the real world.

Highlights:

  • SHP, the first large-scale public dataset of human preferences over text (5M examples)
  • Archangel, the largest suite of human feedback-aligned LLMs
  • Dynaboard, an evaluation-as-a-service platform used to host Dynabench, BabyLM, and others
  • HALOs, a framework for creating prospect-theoretic losses for alignment

I have received an ICML 2022 Outstanding Paper award, a Facebook Fellowship, and an NSERC PGS-D during my PhD. Prior to Stanford, I was a National Scholar at the University of Toronto.

I am on the job market for 2024.

Recent Work (full list)

Disincentives in Data Collection
Dataset creation is a principal-agent problem, one that often results in datasets being much simpler than the tasks they purport to reflect. I design frameworks that help principals (e.g., researchers) build high-quality datasets by discovering mistakes and unstated assumptions made by agents (e.g., crowdworkers). For example, I created Stanford Human Preferences (SHP), the first large-scale dataset of human preferences over text. SHP is one of the few datasets to have been used by Amazon AWS (for reranking generations), Microsoft DeepSpeed Chat (to train LLMs), and Llama-2 (one of the most widely used LLMs).
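
To make the format concrete, here is a minimal sketch of loading and inspecting one SHP preference pair. It assumes the Hugging Face Hub identifier stanfordnlp/SHP and its published fields (history, human_ref_A, human_ref_B, labels); check the dataset card if these differ.

    # Minimal sketch: inspect one SHP preference pair (assumes the Hub id
    # "stanfordnlp/SHP" and fields history / human_ref_A / human_ref_B / labels).
    from datasets import load_dataset

    shp = load_dataset("stanfordnlp/SHP", split="train")
    ex = shp[0]

    post = ex["history"]  # the Reddit post or question
    # labels == 1 means human_ref_A was preferred over human_ref_B
    preferred, other = (
        (ex["human_ref_A"], ex["human_ref_B"])
        if ex["labels"] == 1
        else (ex["human_ref_B"], ex["human_ref_A"])
    )

    print("POST:", post[:200])
    print("PREFERRED:", preferred[:200])
    print("OTHER:", other[:200])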

Pluralistic Model Alignment
I discovered that methods for aligning LLMs with human feedback (e.g., RLHF, DPO) work in part because they implicitly model human biases in decision-making. This means that there is not one single objective for alignment, but rather a family of human-aware losses (HALOs). I then designed an alignment objective based on Kahneman & Tversky's prospect theory, called KTO, which can align LLMs using binary feedback alone. KTO is therefore far easier to use in the real world, where preferences are scarce and expensive to collect.
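
For intuition, here is a simplified, hedged sketch of what a KTO-style loss could look like in PyTorch. It assumes per-example policy and reference log-probabilities are precomputed, and it approximates the prospect-theoretic reference point with a detached batch mean; the estimator in the paper is more careful, and all names below are illustrative rather than the released implementation.

    import torch

    def kto_style_loss(policy_logps, ref_logps, is_desirable,
                       beta=0.1, lambda_d=1.0, lambda_u=1.0):
        """Simplified KTO-style objective (illustrative sketch).

        policy_logps, ref_logps: (B,) log-probs of each completion under the
        policy and the frozen reference model; is_desirable: (B,) bool labels.
        """
        rewards = policy_logps - ref_logps                 # implied reward r(x, y)
        z0 = rewards.mean().clamp(min=0).detach()          # crude reference point
        gains = lambda_d * torch.sigmoid(beta * (rewards - z0))   # desirable side
        losses = lambda_u * torch.sigmoid(beta * (z0 - rewards))  # undesirable side
        value = torch.where(is_desirable, gains, losses)
        weight = torch.where(is_desirable,
                             torch.full_like(rewards, lambda_d),
                             torch.full_like(rewards, lambda_u))
        return (weight - value).mean()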

Utility-Driven Evaluation
Organizations care not only about the upside of using a model but also about its costs (fairness, memory, etc.), which are often ignored in a research setting. Working with researchers at Meta, we developed Dynaboard, a holistic evaluation-as-a-service platform for hosting benchmarks. Dynaboard has been used to host many challenges, including DADC (Dynamic Adversarial Data Collection), DataPerf, BabyLM, and Flores. The concept of utility-driven evaluation has since gained wide acceptance and underlies many benchmarks, such as Stanford's HELM.
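
As a toy illustration of utility-driven scoring, the sketch below combines a model's upside with its costs using organization-specific weights. This is not Dynaboard's actual Dynascore (which uses a marginal-rate-of-substitution formulation); the metric names and numbers are made up.

    def utility_score(metrics: dict, weights: dict) -> float:
        """Weighted average of higher-is-better metrics normalized to [0, 1]."""
        total = sum(weights.values())
        return sum(weights[m] * metrics[m] for m in weights) / total

    # Hypothetical models: A is more accurate, B is cheaper to run.
    model_a = {"accuracy": 0.91, "throughput": 0.40, "memory": 0.55, "fairness": 0.80}
    model_b = {"accuracy": 0.88, "throughput": 0.85, "memory": 0.70, "fairness": 0.82}

    # An organization that weighs costs heavily may prefer B despite lower accuracy.
    prefs = {"accuracy": 4.0, "throughput": 2.0, "memory": 1.0, "fairness": 3.0}
    print(utility_score(model_a, prefs), utility_score(model_b, prefs))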