I am an Assistant Professor of Applied AI and Kathryn and Grant Swick Faculty Scholar at UChicago Booth, where I work on behavior-bound machine learning.
Machine learning is not a sterile industrial process; just as it is hardware-bound and software-bound, it is bound by the behavior of real-world actors such as workers, firms, and states.
I create machines that are compatible with actual actors, not just idealized ones.
My work has received an ICML 2022 Outstanding Paper award, a Facebook Fellowship, and an NSERC PGS-D.
Prior to UChicago, I graduated with a PhD in Computer Science from Stanford University, where I was advised by Dan Jurafsky.
I have also spent time at Princeton Language & Intelligence (postdoc) and the University of Toronto (MSc, BSc).
Notable work:
-
We propose framing dataset difficulty as the lack of V-usable information.
This led to the development of Stanford Human Preferences (SHP), the first large-scale dataset of human preferences over natural text, based on data from Reddit.
SHP was the only academic dataset used to post-train Llama-2, one of the most downloaded LLMs ever; the other datasets were from the likes of OpenAI and Meta itself.
Within a year of SHP's release, Reddit started licensing its data for AI training in deals valued at 200M+ USD.
-
Understanding Dataset Difficulty with V-Usable Information.
Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta.
ICML 2022 (outstanding paper - top 10 of 1233 accepted).
paper
tweet 1
tweet 2
code
dataset 1
dataset 2
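For readers unfamiliar with the framework, the core quantity can be sketched as follows; notation loosely follows the predictive V-information literature and may differ in detail from the paper:

```latex
% V-usable information: how much information about the label Y a model
% family V can actually extract from the input X. A dataset is "hard"
% for V to the extent that this quantity is low.
\mathcal{I}_{\mathcal{V}}(X \to Y)
  \;=\; H_{\mathcal{V}}(Y) \,-\, H_{\mathcal{V}}(Y \mid X),
\qquad
H_{\mathcal{V}}(Y \mid X)
  \;=\; \inf_{f \in \mathcal{V}} \; \mathbb{E}\!\left[ -\log_2 f[X](Y) \right]
```

Unlike Shannon information, this is relative to the model family V, so the same dataset can be difficult for one family and easy for another.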
-
We prove that the top post-training objectives (PPO, DPO) capture perceptual biases in how humans perceive random variables.
Surprisingly, these biases—like loss aversion—make post-training more efficient and performant than it otherwise would be.
By more intentionally integrating these biases, we create new post-training techniques:
KTO is the industry standard for aligning LLMs on offline binary feedback; humanline variants erase the gap between offline and online alignment.
-
KTO: Model Alignment as Prospect Theoretic Optimization.
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela.
ICML 2024 (spotlight - top 3.5% of accepted).
paper
tweet
code
press
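To illustrate the prospect-theoretic shape of the objective, here is a minimal numpy sketch of a KTO-style loss for a single example. This is not the paper's implementation: the batch-level KL reference point `z0` is fixed to 0 for simplicity, and the hyperparameter names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kto_loss(r, desirable, beta=0.1, lam_d=1.0, lam_u=1.0):
    """Sketch of a KTO-style loss for one example.

    r: implied reward, log pi_theta(y|x) - log pi_ref(y|x)
    desirable: True if y is a desired output, False if undesired
    z0: the reference point (a batch KL estimate in the paper),
        fixed to 0 here for simplicity
    """
    z0 = 0.0
    if desirable:
        # value of a gain, saturating as r exceeds the reference point
        value = lam_d * sigmoid(beta * (r - z0))
        return lam_d - value
    else:
        # value of a loss, weighted separately (loss aversion)
        value = lam_u * sigmoid(beta * (z0 - r))
        return lam_u - value
```

Because the sigmoid value function saturates, extreme examples contribute vanishing gradient, and the separate `lam_d` / `lam_u` weights let desirable and undesirable feedback be weighted asymmetrically, mirroring loss aversion.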
-
Humanline: Online Alignment as Perceptual Loss.
Sijia Liu, Niklas Muennighoff, and Kawin Ethayarajh.
ICLR 2026.
paper
tweet
code
-
Contextual representations from LLMs are anisotropic: they occupy a narrow cone in vector space, making cosine similarity unreliable as a measure of semantic relatedness. This paper identified and named the phenomenon, which has since become a foundational concept in NLP—cited as core motivation for contrastive learning approaches like SimCSE, embedding post-processing methods like whitening, and a broad literature on improving text representations.
-
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings.
Kawin Ethayarajh.
EMNLP 2019 (oral).
paper
tweet
code
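The effect is easy to reproduce synthetically: a shared offset squeezes otherwise-random vectors into a narrow cone, inflating cosine similarity even between unrelated points. A minimal numpy sketch on toy data (not actual BERT/ELMo/GPT-2 embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_cosine(X):
    """Average pairwise cosine similarity between the rows of X."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    # exclude the diagonal (each vector's self-similarity of 1)
    return (sims.sum() - n) / (n * (n - 1))

# Isotropic vectors: directions spread uniformly, mean cosine near 0
iso = rng.normal(size=(500, 64))

# Anisotropic vectors: a large shared offset forces them into a narrow
# cone, so even unrelated vectors look "similar" under cosine
aniso = iso + 5.0

print(mean_cosine(iso))    # near 0
print(mean_cosine(aniso))  # near 1
```

In the anisotropic case, cosine similarity mostly measures the shared offset rather than anything about the individual vectors, which is why the paper argues it is unreliable as a semantic-relatedness measure for raw contextual embeddings.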