I am an Assistant Professor of Applied AI and Kathryn and Grant Swick Faculty Scholar at UChicago Booth, where I work on behavior-bound machine learning.
Machine learning is not a sterile industrial process: just as it is hardware-bound and software-bound, it is also bound by the behavior of real-world actors such as workers, firms, and states.
By borrowing from fields like economics, my work tries to formalize this behavior and create algorithms, tools, and systems that are compatible with actual actors, not just idealized ones.
My work has received an ICML 2022 Outstanding Paper award, a Facebook Fellowship, and an NSERC PGS-D.
Prior to UChicago, I graduated with a PhD in Computer Science from Stanford University, where I was advised by Dan Jurafsky.
I have also spent time at Princeton Language & Intelligence (post-doc) and the University of Toronto (MSc, BSc).
Notable work:
-
A new framework for understanding dataset difficulty, based on the notion of V-usable information.
This led to the development of Stanford Human Preferences (SHP), the first large-scale dataset of human preferences over natural text, based on data from Reddit.
SHP was the only academic dataset used to post-train Llama-2, one of the most downloaded LLMs ever; the other datasets were from the likes of OpenAI and Meta itself.
Within a year of SHP's release, Reddit started licensing its data for AI training in deals valued at 200M+ USD.
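For the curious, the core quantity can be stated in one line. This is a sketch of the standard definition of V-usable information (where V is a model family, such as a pretrained LM, and H_V denotes predictive V-entropy); the notation here is illustrative shorthand, not copied from the paper:

```latex
% V-usable information: how much information about label Y the model
% family V can extract from input X. A dataset is harder for V when
% this quantity is smaller.
I_V(X \to Y) = H_V(Y) - H_V(Y \mid X)
```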
-
Understanding Dataset Difficulty with V-Usable Information.
Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta.
ICML 2022 (outstanding paper - top 10 of 1233 accepted).
paper
tweet 1
tweet 2
code
dataset 1
dataset 2
-
Post-training objectives (PPO, DPO) capture biases in how humans perceive random variables, such as loss aversion.
Surprisingly, these biases make post-training more efficient and performant.
By more deeply integrating these biases, we can thus create post-training techniques that are even more efficient, performant, and flexible:
KTO is the industry standard for aligning LLMs on offline binary feedback; humanline variants erase the gap between offline and online alignment.
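To make "loss aversion" concrete, here is a minimal sketch of the Kahneman-Tversky value function from prospect theory, the perceptual model that motivates KTO. The parameter values (alpha, lam) are the classic estimates from Tversky and Kahneman's 1992 study, not anything specific to the KTO objective itself:

```python
# Sketch of the prospect-theoretic value function (illustrative only;
# parameters are the classic Tversky & Kahneman 1992 estimates, not
# values taken from the KTO paper).

def value(z: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Perceived value of a gain or loss z relative to a reference point.

    Gains are perceived concavely (diminishing sensitivity); losses are
    scaled by lam > 1, so losses loom larger than equal-sized gains.
    """
    if z >= 0:
        return z ** alpha
    return -lam * (-z) ** alpha

# Loss aversion: a loss of 1 hurts more than a gain of 1 helps.
print(value(1.0))   # 1.0
print(value(-1.0))  # -2.25
```

The asymmetry between the two outputs is exactly the bias that, per the paper, post-training objectives implicitly encode.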
-
Model Alignment as Prospect Theoretic Optimization.
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela.
ICML 2024 (spotlight - top 3.5% of accepted).
paper
tweet
code
press
-
Online Alignment as Perceptual Loss.
Sijia Liu, Niklas Muennighoff, and Kawin Ethayarajh.
ICLR 2026.
paper
tweet
code
-
Contextual representations from LLMs are anisotropic: they occupy a narrow cone in vector space, making cosine similarity unreliable as a measure of semantic relatedness.
This paper identified and named the phenomenon, which has since become a foundational concept in NLP, cited as core motivation for contrastive learning approaches like SimCSE, embedding post-processing methods like whitening, and a broad literature on improving text representations.
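A toy illustration of the failure mode (not code from the paper): when all vectors share a large common component, i.e. sit in a narrow cone, pairwise cosine similarity saturates near 1 regardless of what the vectors represent:

```python
# Illustrative sketch of anisotropy: synthetic vectors, not real LLM
# embeddings. A shared bias direction makes cosine similarity saturate.
import math
import random

random.seed(0)
DIM = 64

def rand_vec(scale: float) -> list[float]:
    return [random.gauss(0.0, scale) for _ in range(DIM)]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_cosine(vecs: list[list[float]]) -> float:
    sims = [cosine(u, v) for i, u in enumerate(vecs) for v in vecs[i + 1:]]
    return sum(sims) / len(sims)

# Isotropic: directions spread over the whole space.
isotropic = [rand_vec(1.0) for _ in range(50)]

# Anisotropic: a shared bias direction dominates the per-vector noise,
# so all vectors occupy a narrow cone.
bias = [5.0] * DIM
anisotropic = [[b + n for b, n in zip(bias, rand_vec(1.0))] for _ in range(50)]

print(mean_pairwise_cosine(isotropic))    # near 0: cosine is informative
print(mean_pairwise_cosine(anisotropic))  # near 1: cosine saturates
```

In the anisotropic case, high cosine similarity tells you almost nothing about semantic relatedness, which is the unreliability the paper documents in contextual representations.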
-
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings.
Kawin Ethayarajh.
EMNLP 2019 (oral).
paper
tweet
code