We are a group of researchers who study, analyze, and build various aspects of AI and social systems. Our work spans several areas: Applied Machine Learning, Responsible and Safe AI, Natural Language Processing, and Social Network Analysis. By understanding and measuring AI systems, we aim to develop solutions that contribute to the greater good of society.
ICML
Great Models Think Alike and this Undermines AI Oversight
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both of these tasks, which we refer to as "AI Oversight". We study how model similarity affects both aspects of AI oversight by proposing a probabilistic metric for LM similarity based on overlap in model mistakes. Using this metric, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find that complementary knowledge between the weak supervisor and the strong student model plays a crucial role in gains from "weak-to-strong generalization". As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend: model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.
@article{goel2025greatmodelsthinkalike,title={Great Models Think Alike and this Undermines AI Oversight},author={Goel, Shashwat and Struber, Joschka and Auzina, Ilze Amanda and Chandra, Karuna K and Kumaraguru, Ponnurangam and Kiela, Douwe and Prabhu, Ameya and Bethge, Matthias and Geiping, Jonas},year={2025},journal={Forty-Second International Conference on Machine Learning},}
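To make the similarity idea concrete, here is a minimal sketch of scoring two models by the overlap in their mistakes, with a kappa-style correction for the overlap expected by chance. It illustrates the general idea only; the function name and normalization are our own choices, not the paper's exact probabilistic metric.

```python
import numpy as np

def mistake_overlap_similarity(preds_a, preds_b, labels):
    """Illustrative similarity score based on overlap in model mistakes.

    Counts how often two models are wrong on the same examples and corrects
    for the overlap expected by chance given each model's error rate
    (a kappa-style adjustment). Not the paper's exact metric.
    """
    preds_a, preds_b, labels = map(np.asarray, (preds_a, preds_b, labels))
    wrong_a = preds_a != labels
    wrong_b = preds_b != labels

    # Observed rate of shared mistakes vs. the rate expected if the two
    # models' errors were independent.
    observed = np.mean(wrong_a & wrong_b)
    expected = wrong_a.mean() * wrong_b.mean()

    # Normalize so a score of 1 means one model's mistakes are a subset
    # of the other's; 0 means no more overlap than chance.
    denom = min(wrong_a.mean(), wrong_b.mean()) - expected
    return 0.0 if denom == 0 else (observed - expected) / denom

# Toy usage: two models graded on the same five questions.
labels  = [0, 1, 1, 0, 2]
model_a = [0, 1, 0, 1, 2]   # mistakes on items 3 and 4
model_b = [0, 1, 0, 0, 1]   # mistakes on items 3 and 5
print(mistake_overlap_similarity(model_a, model_b, labels))
```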
ICML
A Cognac shot to forget bad memories: Corrective Unlearning in GNNs
Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. Because graph data does not satisfy the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, degrading the model’s performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified.
@article{kolipaka2024cognacshotforgetbad,title={A Cognac shot to forget bad memories: Corrective Unlearning in GNNs},author={Kolipaka, Varshita and Sinha, Akshit and Mishra, Debangan and Kumar, Sumit and Arun, Arvindh and Goel, Shashwat and Kumaraguru, Ponnurangam},journal={Forty-Second International Conference on Machine Learning},year={2025},}
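As a rough illustration of the corrective unlearning setup studied above (not the Cognac method itself), the sketch below flips the labels of a manipulated training subset, reveals only 5% of that subset to the unlearner, and compares a corrupted model against a naive retrain-without-known-manipulations baseline. The graph and GNN details are omitted for brevity, and all sizes, names, and the logistic-regression stand-in are illustrative assumptions.

```python
# Minimal sketch of the corrective-unlearning evaluation protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

train, test = np.arange(1000), np.arange(1000, 1200)
y_train = y[train].copy()

# Adversary flips the labels of a manipulated subset of the training set.
manipulated = rng.choice(train, size=200, replace=False)
y_train[manipulated] = 1 - y_train[manipulated]

# Only 5% of the manipulated set is later identified by the developer.
known = rng.choice(manipulated, size=int(0.05 * len(manipulated)), replace=False)

def retrain_without(known_bad):
    # Naive baseline: simply drop the known-bad samples and retrain.
    keep = np.setdiff1d(train, known_bad)
    return LogisticRegression(max_iter=500).fit(X[keep], y_train[keep])

corrupted = LogisticRegression(max_iter=500).fit(X[train], y_train)
unlearned = retrain_without(known)

# With only 5% of the manipulations removed, the naive baseline barely
# improves over the corrupted model, which is why dedicated corrective
# unlearning methods are needed.
print("clean test acc, corrupted model :", corrupted.score(X[test], y[test]))
print("clean test acc, naive unlearning:", unlearned.score(X[test], y[test]))
```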
EASE
Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
Ishan Kavathekar, Raghav Donakanti, Ponnurangam Kumaraguru, and Karthik Vaidhyanathan
International Conference on Evaluation and Assessment in Software Engineering (EASE), 2025
Function calling is a complex task with widespread applications in domains such as information retrieval, software engineering, and automation. For example, a query to book the shortest flight from New York to London on January 15 requires identifying the correct parameters to generate accurate function calls. Large Language Models (LLMs) can automate this process but are computationally expensive and impractical in resource-constrained settings. In contrast, Small Language Models (SLMs) can operate efficiently, offering faster response times and lower computational demands, making them potential candidates for function calling on edge devices. In this exploratory empirical study, we evaluate the efficacy of SLMs in generating function calls across diverse domains using zero-shot, few-shot, and fine-tuning approaches, both with and without prompt injection, while also providing the fine-tuned models to facilitate future applications. Furthermore, we analyze the model responses across a range of metrics, capturing various aspects of function call generation. Additionally, we perform experiments on an edge device to evaluate their performance in terms of latency and memory usage, providing useful insights into their practical applicability. Our findings show that while SLMs improve from zero-shot to few-shot and perform best with fine-tuning, they struggle significantly with adhering to the given output format. Prompt injection experiments further indicate that the models are generally robust and exhibit only a slight decline in performance. While SLMs demonstrate potential for the function call generation task, our results also highlight areas that need further refinement for real-time use.
@article{kavathekar2025smallmodelsbigtasks,title={Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling},author={Kavathekar, Ishan and Donakanti, Raghav and Kumaraguru, Ponnurangam and Vaidhyanathan, Karthik},year={2025},journal={International Conference on Evaluation and Assessment in Software Engineering (EASE)},eprint={2504.19277},archiveprefix={arXiv},primaryclass={cs.AI},url={https://arxiv.org/abs/2504.19277}}
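Since a key finding above is that SLMs struggle to adhere to the required output format, the sketch below shows one simple way such adherence can be checked: parsing a generated call as JSON and validating it against a tool schema. The schema, function name, and field names are hypothetical illustrations, not the benchmark's actual format.

```python
import json

# Hypothetical tool schema in the style of common function-calling formats.
FLIGHT_TOOL = {
    "name": "book_flight",
    "required": {"origin": str, "destination": str, "date": str},
}

def check_function_call(model_output: str, tool=FLIGHT_TOOL):
    """Return (is_valid, reason) for a model-generated function call.

    A call counts as format-adherent only if it is valid JSON, names the
    right function, and supplies every required argument with the right type.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if call.get("name") != tool["name"]:
        return False, "wrong or missing function name"
    args = call.get("arguments", {})
    for arg, typ in tool["required"].items():
        if arg not in args:
            return False, f"missing argument: {arg}"
        if not isinstance(args[arg], typ):
            return False, f"wrong type for: {arg}"
    return True, "ok"

# Example: one well-formed and one malformed generation for the
# "shortest flight from New York to London on January 15" query.
good = '{"name": "book_flight", "arguments": {"origin": "New York", "destination": "London", "date": "2025-01-15"}}'
bad  = 'book_flight(origin="New York", destination="London")'
print(check_function_call(good))  # (True, 'ok')
print(check_function_call(bad))   # (False, 'not valid JSON')
```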
NeurIPS
Random Representations Outperform Online Continually Learned Representations
Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip Torr, Ozan Sener, and Puneet K. Dokania
In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Continual learning has primarily focused on the issue of catastrophic forgetting and the associated stability-plasticity tradeoffs. However, little attention has been paid to the efficacy of continually learned representations, as representations are learned alongside classifiers throughout the learning process. Our primary contribution is empirically demonstrating that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform. Our approach projects raw pixels using a fixed random transform, approximating an RBF kernel initialized before any data is seen. We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting. This method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all standard online continual learning benchmarks. Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios. Extending our investigation to popular exemplar-free scenarios with pretrained models, we find that training only a linear classifier on top of pretrained representations surpasses most continual fine-tuning and prompt-tuning strategies. Overall, our investigation challenges the prevailing assumptions about effective representation learning in online continual learning. Our code is available at https://github.com/drimpossible/RanDumb.
@inproceedings{prabhu2024random,title={Random Representations Outperform Online Continually Learned Representations},author={Prabhu, Ameya and Sinha, Shiven and Kumaraguru, Ponnurangam and Torr, Philip and Sener, Ozan and Dokania, Puneet K.},year={2024},booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},}
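Below is a minimal sketch of the recipe described above, assuming random Fourier features as the fixed RBF-kernel approximation and plain softmax SGD as the linear classifier trained one sample at a time with no stored exemplars. The class name, dimensions, and hyperparameters are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

class RandomFeatureOnlineClassifier:
    def __init__(self, in_dim, num_classes, feat_dim=2048, sigma=1.0, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection chosen before seeing any data:
        # phi(x) = sqrt(2/D) * cos(Wx + b) approximates an RBF kernel.
        self.W = rng.normal(scale=1.0 / sigma, size=(in_dim, feat_dim))
        self.b = rng.uniform(0, 2 * np.pi, size=feat_dim)
        self.weights = np.zeros((feat_dim, num_classes))
        self.lr = lr

    def _embed(self, x):
        return np.sqrt(2.0 / self.W.shape[1]) * np.cos(x @ self.W + self.b)

    def partial_fit(self, x, y):
        """Single online SGD step on one (sample, label) pair; no replay buffer."""
        z = self._embed(x)
        logits = z @ self.weights
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        probs[y] -= 1.0                      # gradient of softmax cross-entropy
        self.weights -= self.lr * np.outer(z, probs)

    def predict(self, x):
        return int(np.argmax(self._embed(x) @ self.weights))

# Toy stream: samples arrive one at a time, as in online continual learning.
rng = np.random.default_rng(1)
clf = RandomFeatureOnlineClassifier(in_dim=32, num_classes=3)
for _ in range(3000):
    y = rng.integers(3)
    x = rng.normal(size=32) + 2.0 * y        # class-dependent mean shift
    clf.partial_fit(x, y)
x_test = rng.normal(size=32) + 2.0 * 2
print(clf.predict(x_test))                   # most likely class 2
```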
Thank you for your interest in joining our team! We are always looking for talented and motivated individuals. If you are interested in working with us, please apply here.