I am an Assistant Professor of Applied AI and Kathryn and Grant Swick Faculty Scholar at UChicago Booth, where I work on behavior-bound machine learning.

Machine learning is not a sterile industrial process: just as it is hardware-bound and software-bound, it is also bound by the behavior of real-world actors such as workers, firms, and states. I create machines that are compatible with actual actors, not just idealized ones.

My work has received an ICML 2022 Outstanding Paper award, a Facebook Fellowship, and an NSERC PGS-D. Before joining UChicago, I received my PhD in Computer Science from Stanford University, where I was advised by Dan Jurafsky. I have also spent time at Princeton Language & Intelligence (post-doc) and the University of Toronto (MSc, BSc).

Notable work:

  • We propose framing dataset difficulty as a lack of V-usable information with respect to a given model family. This led to the development of Stanford Human Preferences (SHP), the first large-scale dataset of human preferences over natural text, built from Reddit data. SHP was the only academic dataset used to post-train Llama-2, one of the most downloaded LLMs ever; the other datasets came from the likes of OpenAI and Meta itself. Within a year of SHP's release, Reddit began licensing its data for AI training in deals valued at 200M+ USD.
    • Understanding Dataset Difficulty with V-Usable Information.
      Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta.
      ICML 2022 (outstanding paper - top 10 of 1233 accepted).
      paper tweet 1 tweet 2 code dataset 1 dataset 2
  • We prove that the top post-training objectives (PPO, DPO) capture perceptual biases in how humans perceive random variables. Surprisingly, these biases—like loss aversion—make post-training more efficient and performant than it otherwise would be. By integrating these biases more intentionally, we create new post-training techniques: KTO is the industry standard for aligning LLMs on offline binary feedback; humanline variants erase the gap between offline and online alignment.
    • KTO: Model Alignment as Prospect Theoretic Optimization.
      Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela.
      ICML 2024 (spotlight - top 3.5% of accepted).
      paper tweet code press
    • Humanline: Online Alignment as Perceptual Loss.
      Sijia Liu, Niklas Muennighoff, and Kawin Ethayarajh.
      ICLR 2026.
      paper tweet code
  • Contextual representations from LLMs are anisotropic: they occupy a narrow cone in vector space, making cosine similarity unreliable as a measure of semantic relatedness. This paper identified and named the phenomenon, which has since become a foundational concept in NLP—cited as core motivation for contrastive learning approaches like SimCSE, embedding post-processing methods like whitening, and a broad literature on improving text representations.
    • How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings.
      Kawin Ethayarajh.
      EMNLP 2019 (oral).
      paper tweet code
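For readers curious about the dataset-difficulty framing above, here is a one-line sketch of V-usable information as I would summarize it (notation follows the V-information framework the paper builds on; see the paper for the precise conditions):

```latex
% V-usable information: how much information X carries about label Y
% that a model family V (e.g., BERT-sized models) can actually exploit.
\mathcal{I}_{\mathcal{V}}(X \to Y) \;=\; H_{\mathcal{V}}(Y) \;-\; H_{\mathcal{V}}(Y \mid X),
\qquad
H_{\mathcal{V}}(Y \mid X) \;=\; \inf_{f \in \mathcal{V}} \, \mathbb{E}\!\left[\, -\log f[X](Y) \,\right]
```

Under this framing, a harder dataset is one whose inputs carry less V-usable information about the labels for the model family at hand, so difficulty is relative to the models doing the learning rather than absolute.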
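As background for the prospect-theory connection in the KTO work above, the classic Kahneman–Tversky value function captures the perceptual biases in question: outcomes are valued relative to a reference point, with diminishing sensitivity and a steeper slope for losses (loss aversion). A standard textbook form (Tversky & Kahneman, 1992; this is context, not the KTO objective itself) is:

```latex
% Value is relative to a reference point z_ref, concave in gains,
% convex in losses, and steeper for losses (lambda > 1: loss aversion).
v(z) \;=\;
\begin{cases}
\;(z - z_{\mathrm{ref}})^{\alpha} & \text{if } z \ge z_{\mathrm{ref}} \\[2pt]
\;-\lambda \,(z_{\mathrm{ref}} - z)^{\beta} & \text{if } z < z_{\mathrm{ref}}
\end{cases}
\qquad \alpha \approx \beta \approx 0.88,\;\; \lambda \approx 2.25
```

KTO builds a post-training loss around a value function with this shape, applied to a model-based reward relative to a reference point; the paper gives the exact objective.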
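The anisotropy finding above is easy to reproduce in miniature. The toy simulation below (hypothetical data, not real LLM embeddings) shows why a narrow cone breaks cosine similarity: vectors that share one dominant direction all look alike under cosine, even when everything else about them is unrelated noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768                                      # typical embedding width
shared = 5.0 * rng.normal(size=d)            # dominant shared direction
aniso = shared + rng.normal(size=(100, d))   # "anisotropic" embeddings: a narrow cone
iso = rng.normal(size=(100, d))              # isotropic baseline

def mean_cos(X):
    """Mean pairwise cosine similarity over all distinct pairs of rows."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    return sims[np.triu_indices(len(X), k=1)].mean()

print(mean_cos(aniso))  # close to 1, despite the vectors being unrelated noise
print(mean_cos(iso))    # close to 0: directions spread over the whole space
```

In the anisotropic case, high cosine similarity reflects the shared direction rather than any semantic relatedness, which is exactly why the paper's finding motivated post-processing and contrastive fixes.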