I’m a fifth-year PhD student in the Stanford CS department, where I work on machine learning under real-world incentives.

Much in the way that machine learning is bound by hardware and software, it is also bound by the incentives of real-world actors, such as workers, firms, and states. Often, these incentives are unstated, immutable, and in conflict with one another. Borrowing from fields like economics, I work on formalizing these incentives and creating tools, platforms, and algorithms that incorporate them, so that the progress we make on paper translates to the real world.

Highlights:

  • SHP, the first large-scale public dataset of human preferences over text (5M examples)
  • Archangel, the largest suite of human feedback-aligned LLMs
  • Dynaboard, an evaluation-as-a-service platform used to host Dynabench, BabyLM, and others
  • HALOs, a framework for creating prospect-theoretic losses for alignment

I have received an ICML 2022 Outstanding Paper award, a Facebook Fellowship, and an NSERC PGS-D during my PhD. Prior to Stanford, I was a National Scholar at the University of Toronto.

I am on the job market for 2024.

Recent Work (full list)

Disincentives in Data Collection
Dataset creation is a principal-agent problem, one that often results in datasets being much simpler than the tasks they purport to reflect. I design frameworks that help principals (e.g., researchers) build high-quality datasets by discovering mistakes and unstated assumptions made by agents (e.g., crowdworkers). For example, I created Stanford Human Preferences (SHP), the first large-scale dataset of human preferences over text. SHP is one of the few datasets to have been used by Amazon AWS (for reranking generations), Microsoft DeepSpeed Chat (to train LLMs), and Llama-2 (one of the most widely used LLMs).
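
To make the format concrete, here is a minimal sketch of loading and inspecting one SHP preference pair. It assumes the Hugging Face Hub identifier stanfordnlp/SHP and its published fields (history, human_ref_A, human_ref_B, labels); check the dataset card if these differ.

    # Minimal sketch: inspect one SHP preference pair (assumes the Hub id
    # "stanfordnlp/SHP" and fields history / human_ref_A / human_ref_B / labels).
    from datasets import load_dataset

    shp = load_dataset("stanfordnlp/SHP", split="train")
    ex = shp[0]

    post = ex["history"]  # the Reddit post or question
    # labels == 1 means human_ref_A was preferred over human_ref_B
    preferred, other = (
        (ex["human_ref_A"], ex["human_ref_B"])
        if ex["labels"] == 1
        else (ex["human_ref_B"], ex["human_ref_A"])
    )

    print("POST:", post[:200])
    print("PREFERRED:", preferred[:200])
    print("OTHER:", other[:200])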

Pluralistic Model Alignment
I discovered that methods for aligning LLMs with human feedback (e.g., RLHF, DPO) work in part because they implicitly model human biases in decision-making. This means that there is not one single objective for alignment, but rather a family of human-aware losses (HALOs). I then designed an alignment objective based on Kahneman & Tversky's prospect theory, called KTO, which can align LLMs using binary feedback alone. KTO is therefore far easier to use in the real world, where preferences are scarce and expensive to collect.
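
For intuition, here is a simplified, hedged sketch of what a KTO-style loss could look like in PyTorch. It assumes per-example policy and reference log-probabilities are precomputed, and it approximates the prospect-theoretic reference point with a detached batch mean; the estimator in the paper is more careful, and all names below are illustrative rather than the released implementation.

    import torch

    def kto_style_loss(policy_logps, ref_logps, is_desirable,
                       beta=0.1, lambda_d=1.0, lambda_u=1.0):
        """Simplified KTO-style objective (illustrative sketch).

        policy_logps, ref_logps: (B,) log-probs of each completion under the
        policy and the frozen reference model; is_desirable: (B,) bool labels.
        """
        rewards = policy_logps - ref_logps                 # implied reward r(x, y)
        z0 = rewards.mean().clamp(min=0).detach()          # crude reference point
        gains = lambda_d * torch.sigmoid(beta * (rewards - z0))   # desirable side
        losses = lambda_u * torch.sigmoid(beta * (z0 - rewards))  # undesirable side
        value = torch.where(is_desirable, gains, losses)
        weight = torch.where(is_desirable,
                             torch.full_like(rewards, lambda_d),
                             torch.full_like(rewards, lambda_u))
        return (weight - value).mean()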

Utility-Driven Evaluation
Organizations care not only about the upside of using a model but also about its costs (fairness, memory, etc.), which are often ignored in a research setting. Working with researchers at Meta, we developed Dynaboard, a holistic evaluation-as-a-service platform for hosting benchmarks. Dynaboard has been used to host many challenges, including DADC (Dynamic Adversarial Data Collection), DataPerf, BabyLM, and Flores. The concept of utility-driven evaluation has since gained wide acceptance and underlies many benchmarks, such as Stanford's HELM.
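
As a toy illustration of utility-driven scoring, the sketch below combines a model's upside with its costs using organization-specific weights. This is not Dynaboard's actual Dynascore (which uses a marginal-rate-of-substitution formulation); the metric names and numbers are made up.

    def utility_score(metrics: dict, weights: dict) -> float:
        """Weighted average of higher-is-better metrics normalized to [0, 1]."""
        total = sum(weights.values())
        return sum(weights[m] * metrics[m] for m in weights) / total

    # Hypothetical models: A is more accurate, B is cheaper to run.
    model_a = {"accuracy": 0.91, "throughput": 0.40, "memory": 0.55, "fairness": 0.80}
    model_b = {"accuracy": 0.88, "throughput": 0.85, "memory": 0.70, "fairness": 0.82}

    # An organization that weighs costs heavily may prefer B despite lower accuracy.
    prefs = {"accuracy": 4.0, "throughput": 2.0, "memory": 1.0, "fairness": 3.0}
    print(utility_score(model_a, prefs), utility_score(model_b, prefs))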