I am a first-year PhD student at Boston University Computing & Data Sciences. My research uses methods from mechanistic interpretability to look inside large language models
and better understand how they learn and represent social concepts, such as racial and gender biases or political perspectives. I am currently working with Prof. Mark Crovella and Prof. Ngozi Okidegbe, and I am supported by a CDS Wexler Fellowship.
Interpretability matters to responsible natural language processing because it gives us principled ways to understand and control the behavior of complex models. I aim to use my investigation of social concepts in LLMs to develop new strategies for evaluating bias in models and to build techniques for steering language generation to be safer and fairer.
I am also broadly interested in AI ethics. What will be the impacts of deploying AI systems in new domains across our society? How can we best structure democratic participation in AI governance? By providing us with deeper knowledge of AI systems, interpretability can give us a foothold on these questions alongside other interdisciplinary research.
Before BU, I graduated from Washington University in St. Louis with a double major in data science and English. At the Zhang Translational Genomics Laboratory, advised by Prof. Jin Zhang, I developed machine learning methods for predicting cervical cancer recurrence.
In my free time, I coach high school cross country and track & field for Boston Public Schools and compete in local distance running races for Battle Road Track Club. At WashU, I was a member of the varsity cross country and track teams.
If you have any questions, thoughts, or ideas for collaborations, please don’t hesitate to reach out!