Shaunak Halbe

ML PhD student
Georgia Institute of Technology


Email  /  CV  /  LinkedIn  /  GitHub  /  Google Scholar

About

I am a fourth-year Machine Learning PhD student at Georgia Tech, advised by Prof. Zsolt Kira. My research interests lie at the intersection of computer vision and natural language processing.

I am currently working on pre-training and post-training for multimodal LLMs and long-form video understanding.

I interned at Amazon Science in Summer 2025, working on multimodal video retrieval using LLMs. At Georgia Tech, I have worked on multimodal representation learning and continual learning. Before that, I worked on generalization/robustness at USC+Meta and on embodied AI at CMU.

Research
VIRTUE: Versatile Video Retrieval Through Unified Embeddings

Shaunak Halbe, et al. (hidden for anonymity)

Under Review

preprint (coming soon)

VIRTUE unifies video search, composed retrieval, and moment localization in a single MLLM-based framework.

Grounding Descriptions in Images informs Zero-shot Visual Recognition

Shaunak Halbe, Junjiao Tian, K J Joseph, James Smith, Katherine Stevo, Vineeth N Balasubramanian, Zsolt Kira

Winter Conference on Applications of Computer Vision (WACV) 2026

preprint

We propose a new VLM pretraining strategy to learn fine-grained representations that exhibit strong zero-shot transfer.

Continual Adaptation of Vision Transformers for Federated Learning

Shaunak Halbe, James Smith, Junjiao Tian, Zsolt Kira

Transactions on Machine Learning Research (TMLR) 2024
Short Version: FL@FM Workshop, NeurIPS 2023 (Oral)
paper / talk

We propose a novel prompt learning and aggregation scheme for distributed training of foundation models.

Robustness through Data Augmentation Loss Consistency

Tianjian Huang*, Shaunak Halbe*, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami

Transactions on Machine Learning Research (TMLR) 2022
paper

We introduce a novel loss-level regularizer to improve robustness in generative models.

A Closer Look at Rehearsal-Free Continual Learning

James Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

CVPR-W 2023
paper

We introduce distillation and regularization baselines for continually training foundation models.

Reason & Act : A Modular Approach to Explanation Driven Agents for Vision and Language Navigation

Shaunak Halbe, Ingrid Navarro, Jean Oh

CMU Robotics Institute Working Papers Journal
paper / poster / talk

We present a modular agent for navigation with improved cross-modal grounding and semantic reasoning.

Exploring Weaknesses of VQA Models through Attribution Driven Insights

Shaunak Halbe

ACL-W 2020
Short Version: CVPR-W 2020
paper / talk

We present a consistency analysis of VQA models through the lens of attribution to evaluate adversarial robustness.

Awards

Service & Teaching

  • Graduate Teaching Assistant: CS 7643 Deep Learning, Fall 2023

  • Reviewer: CVPR 2023, NeurIPS-W 2023, CMU RI Working Papers Journal 2021

  • Volunteer: NeurIPS 2023, CoRL 2023, NAACL 2021, ACL 2020

