Student NLP Researcher
UC Berkeley
Oct 2025 - Present
Investigated gender bias in LLMs by analyzing model responses to gender-blind résumés, using prompt-sensitivity testing and a cross-model comparison between DeepSeek-R1 and DeepSeek-V3.
Designed a methodological framework to isolate the effect of alignment post-training, revealing that gendered framing persists as latent encoding in representation space despite surface-level behavioral constraints.
Demonstrated that bias is reinforced through linguistic form, yielding insights into the limitations of alignment training and into stereotype consolidation in language models.
Read the paper here.
NLP Analysis
AI Ethics
Post-Training Algorithms
LLM Probing & Tracing