About
I am a fourth-year Machine Learning PhD student at Georgia Tech, advised by Prof. Zsolt Kira. My research interests lie at the intersection of computer vision and natural language processing.
                
 I am currently working on pre-training and post-training for multimodal LLMs and long-form video understanding.
 
I interned at Amazon Science in Summer 2025, working on multimodal video retrieval using LLMs. At Georgia Tech, I have previously worked on multimodal representation learning and continual learning. Before that, I worked on generalization and robustness at USC and Meta, and on embodied AI at CMU.
Grounding Descriptions in Images informs Zero-shot Visual Recognition
Shaunak Halbe* et al.
 
 Under Review
 preprint
We propose a new pretraining strategy for CLIP to learn fine-grained visual representations that exhibit strong zero-shot transfer performance.

Continual Adaptation of Vision Transformers for Federated Learning
Shaunak Halbe*, James Smith, Junjiao Tian, Zsolt Kira
 
 Transactions on Machine Learning Research (TMLR) 2024
 Short Version: FL@FM Workshop, NeurIPS 2023 (Oral)
paper / talk
We propose a novel prompt learning and aggregation scheme for distributed training of foundation models.

Robustness through Data Augmentation Loss Consistency
 Tianjian Huang*, Shaunak Halbe*, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
 
 Transactions on Machine Learning Research (TMLR) 2022
 paper
We introduce a novel loss-level regularizer to improve robustness to spurious correlations in generative models.

A Closer Look at Rehearsal-Free Continual Learning
James Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira
 
 CVPR-W 2023
 paper
We introduce knowledge distillation and regularization baselines using foundation models for rehearsal-free continual learning.

Reason & Act: A Modular Approach to Explanation Driven Agents for Vision and Language Navigation
 Shaunak Halbe, Ingrid Navarro, Jean Oh
 
 CMU Robotics Institute Working Papers Journal
paper / poster / talk
We present a modular agent for navigation with improved cross-modal grounding and semantic reasoning.

Exploring Weaknesses of VQA Models through Attribution Driven Insights
 Shaunak Halbe
 
 ACL-W 2020
 Short Version: CVPR-W 2020
paper / talk
We present a consistency analysis of VQA models through the lens of attribution to evaluate adversarial robustness.
  
    
       Service & Teaching
     
Graduate Teaching Assistant: CS 7643 Deep Learning, Fall 2023
 Reviewer: CVPR 2023, NeurIPS-W 2023, CMU RI Working Papers Journal 2021
Volunteer: NeurIPS 2023, CoRL 2023, NAACL 2021, ACL 2020
 
 Website template cloned from here!
              