I'm a final-year PhD student at Stanford NLP, advised by Dan Jurafsky and supported by a Facebook Fellowship.

Evaluation in NLP is often reduced to a convenient one-dimensional metric, like test accuracy on a fixed dataset. My research takes a more expansive view, contending that NLP—and AI more broadly—should be evaluated along multiple dimensions and at every point in the pipeline. Some directions I’ve explored are:

  • Evaluating Data: Does the training data represent the task it purports to reflect?
  • Evaluating Metrics: Is the metric we optimize the right tool for the job?
  • Evaluating Utility: How can we measure how useful our systems are to real people?

I've received an Outstanding Paper Award at ICML 2022 and a Best Paper Award at RepL4NLP @ ACL 2018. Prior to Stanford, I received an M.Sc. and B.Sc. from the University of Toronto, where I was advised by Graeme Hirst and worked with David Duvenaud and Frank Rudzicz.

Research Highlights