About
I am a fourth-year Machine Learning PhD student at Georgia Tech, advised by Prof. Zsolt Kira. My research interests lie at the intersection of computer vision and natural language processing.
                
 I am currently working on pre-training and post-training for multimodal LLMs and long-form video understanding.
 
I interned at Amazon Science in Summer 2025, working on multimodal video retrieval using LLMs. At Georgia Tech, I have previously worked on multimodal representation learning and continual learning. Before that, I worked on generalization and robustness at USC and Meta, and on embodied AI at CMU.
Grounding Descriptions in Images informs Zero-shot Visual Recognition
Shaunak Halbe* et al.
 
 Under Review
 preprint
We propose a new pretraining strategy for CLIP to learn fine-grained visual representations that exhibit strong zero-shot transfer performance.

Continual Adaptation of Vision Transformers for Federated Learning
Shaunak Halbe*, James Smith, Junjiao Tian, Zsolt Kira
 
 Transactions on Machine Learning Research (TMLR) 2024
 Short Version: FL@FM Workshop, NeurIPS 2023 (Oral)
paper / talk
We propose a novel prompt learning and aggregation scheme for distributed training of foundation models.

Robustness through Data Augmentation Loss Consistency
 Tianjian Huang*, Shaunak Halbe*, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
 
 Transactions on Machine Learning Research (TMLR) 2022
 paper
We introduce a novel loss-level regularizer to improve robustness to spurious correlations in generative models.

A Closer Look at Rehearsal-Free Continual Learning
James Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira
 
 CVPR-W 2023
 paper
We introduce knowledge distillation and regularization baselines using foundation models for rehearsal-free continual learning.

Reason & Act: A Modular Approach to Explanation Driven Agents for Vision and Language Navigation
 Shaunak Halbe, Ingrid Navarro, Jean Oh
 
 CMU Robotics Institute Working Papers Journal
paper / poster / talk
We present a modular agent for navigation with improved cross-modal grounding and semantic reasoning.

Exploring Weaknesses of VQA Models through Attribution Driven Insights
 Shaunak Halbe
 
 ACL-W 2020
 Short Version: CVPR-W 2020
paper / talk
We present a consistency analysis of VQA models through the lens of attribution to evaluate adversarial robustness.
  
    
       Service & Teaching
     
Graduate Teaching Assistant: CS 7643 Deep Learning, Fall 2023
 Reviewer: CVPR 2023, NeurIPS-W 2023, CMU RI Working Papers Journal 2021
Volunteer: NeurIPS 2023, CoRL 2023, NAACL 2021, ACL 2020
 
 Website template cloned from here!
              