
Exploring the frontiers of knowledge at Precog

I am Shashwat Singh; I was a CLD student at IIIT. I am about to join NYU CDS as a PhD student to research the science of foundation models. This is a very exciting time to be in the space: no one really knows what they're doing, and any work that sheds light on how to best model intelligence has a chance to be very useful and impactful for the foreseeable future. Precog is one of the best places in India to do foundational work in AI, because PK has designed the lab culture to be one where wild ideas and motivated people flourish.

Joining  

One of the first conversations I had with PK was about what kind of work one could pursue at Precog. I was sure that I wanted to work on core NLP topics; the move from traditional computational linguistics to deep learning was very interesting. The previous paradigm believed in rule-based specification of language, while the new one approached it as an incomprehensible modeling problem. I was not satisfied with this formulation, and wanted to crack open the black box of foundation models.

I got together with Shashwat Goel (we thought it would be funny to collaborate) and started looking into topics of relevance. I spent a good amount of time trying to figure out how these models worked, what the field cares about, and what the leading theories about deep learning and representations are. I suppose the most important finding at the end of it was that most people don't "know" how to best do things: most empirical findings were overturned in follow-up papers, and theoretical ones did not always apply to real settings (with few exceptions). Regardless, we found something interesting that we felt not enough people were talking about: negation.

Negation  

Language models at the time couldn't deal with negation. This was deeply unsatisfactory, because negation is such an essential part of language. Why would language models trained on so much data suck at such a fundamental thing?

With this, we started a project to explain the failure of negation in modern language models, and how to fix it.
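To make the failure concrete, here is a minimal negation probe in the style of masked-LM cloze tests; the model and prompts are illustrative placeholders, not our exact setup:

```python
# Minimal negation probe for a masked language model.
# Illustrative only: model choice and prompts are placeholders.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    top = fill(prompt, top_k=3)
    print(prompt, "->", [(p["token_str"], round(p["score"], 3)) for p in top])

# The classic failure mode: both prompts rank "bird" highly,
# i.e. the model largely ignores the "not".
```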

The general game plan was: figure out what's wrong with negation, then fix it. Pretty simple, except we took about 1.5 years for the first part, and OpenAI dealt with the second part (without actually caring about it). During this time, other projects started and got published. PK would ask us about the end timeline, but wouldn't pressure us to kill the project or pick something else. He understood that sometimes things take time despite best efforts.

This time was instrumental in my development as a researcher of deep learning. We looked at every tiny aspect of language modeling and transformer-based language models to come up with a reasonable and verifiable hypothesis for why negation does not work in language models. Discussions as part of this project ranged from the nature of deep learning to philosophical characterizations of semantics. Finally, we came up with a characterization that seemed coherent, which we published at RepL4NLP, an ACL workshop.

At some point, I reached out to Dr. Danish Pruthi at IISc to work with him; he was moving to India at the time and was interested in interpretability. We decided it was best for me to spend a semester at IISc, set up as a collaboration between Precog and Danish.

Knowledge editing at IISc  

In Bangalore, I was looking into ways to formalize and benchmark knowledge-editing methods. There was some existing work on editing the knowledge of language models, but there wasn't a coherent way to talk about how far-reaching these edits were. We set out to design an evaluation system that formally categorizes the levels of effects an edit can have, and decided to ground it in knowledge graphs. A few months in, it turned out that a paper was out with the exact idea and implementation. The slate had been wiped clean.
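For a sense of what grounding in knowledge graphs means here, a toy sketch (all triples and helper names below are hypothetical): an edit is a (subject, relation, object) triple, and the evaluation asks which neighbouring facts the edit should ripple out to.

```python
# Toy sketch of grounding edit evaluation in a knowledge graph.
# All triples and names here are hypothetical, for illustration only.
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# A tiny knowledge graph before the edit.
kg: Set[Triple] = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
}

def ripple_facts(kg: Set[Triple], edit: Triple) -> Set[Triple]:
    """Facts one hop from the edited subject/object: the ones an edit
    should (or should not) change, depending on the level tested."""
    s, _, o = edit
    return {t for t in kg if s in (t[0], t[2]) or o in (t[0], t[2])}

edit = ("Paris", "capital_of", "Italy")  # a counterfactual edit
print(ripple_facts(kg, edit))
# An evaluation then queries the edited model on each ripple fact
# and checks consistency with the edited graph.
```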

The real test of an advisor is how they react to failures, and PK was understanding and accepted that this is the cost of doing business.

During that time I started experimenting with erasure methods as a way into steering. The jump was to a bigger vantage point: knowledge editing is about changing specific facts in the model, while steering and representation-level control are about influencing model outputs on the basis of larger concepts or statistical trends.
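The simplest flavour of erasure is linear: project hidden states onto the subspace orthogonal to a learned concept direction. A minimal sketch on dummy data (real methods such as INLP or LEACE estimate the subspace far more carefully):

```python
# Minimal sketch of linear concept erasure: remove the component of
# hidden states along a single concept direction w.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 64))      # hidden states (n, d), dummy data
w = rng.normal(size=64)             # concept direction, e.g. from a probe
w = w / np.linalg.norm(w)

P = np.eye(64) - np.outer(w, w)     # projection onto w's orthogonal complement
H_erased = H @ P

# After erasure, hidden states carry no linear signal along w:
print(np.abs(H_erased @ w).max())   # ~0 up to float error
```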

Steering and Collaborations  

Those of us who study the mechanisms of pretrained models yearn for causal evidence. So many previous endeavours were based on studying concepts that can be disentangled from hidden states; the natural next step was to ask whether we can exert control over LM outputs.

I started by looking into the literature on concept erasure and how it affects things, along with a lot of the theoretical work in the domain, but very little of it was about steering. I tried out variations and small edits on these algorithms in a rather unprincipled fashion, and in doing so my intuition developed. Finally, after a lot of experimentation, I found an algorithm that seemed to reliably steer things the way I wanted, but I didn't know how to make sense of my findings and contextualize them.

Hence, I reached out to Shauli Ravfogel, who had done pioneering work in the area of erasure and is just a very nice guy. Together, we explored optimization proxies that, when solved, would explain and improve results. Our work resulted in a formulation for steering functions, which we published at ICML 2024. I learnt a lot from him and his advisor, Ryan Cotterell, in this collaboration.
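To give a flavour of what a steering function looks like, here is a toy mean-shift version on dummy data: shift each hidden state by the difference of concept means. This is only the simplest affine instance, not the exact formulation from the paper:

```python
# Toy steering function: shift hidden states by the difference of
# class means. The simplest affine steering, shown on dummy data.
import numpy as np

rng = np.random.default_rng(0)
d = 64
H_src = rng.normal(loc=0.0, size=(200, d))   # states with the source concept
H_tgt = rng.normal(loc=0.5, size=(200, d))   # states with the target concept

v = H_tgt.mean(axis=0) - H_src.mean(axis=0)  # steering vector

def steer(h: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Additive steering: push h toward the target concept."""
    return h + alpha * v

steered = steer(H_src)
print(steered.mean(axis=0)[:3])   # now matches H_tgt.mean(axis=0)[:3]
```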

Trying out industry 

The following summer (2024), I decided to intern at a quant finance company. I wanted to explore the space, and PK warned me that I wouldn't like it, but he let me explore anyway. He turned out to be right: I came out of the internship with stronger convictions about wanting to join a PhD program.

Teaching 

Teaching has been a big part of my life at Precog and is also an essential part of the culture. More formally, I had the opportunity to help design the curriculum of the Responsible and Safe AI course and conduct tutorials.

Explorations  

I wanted to look into image models; while my work has mostly been in language, I think of myself more as a researcher of intelligence. At Precog, I started with Sreeram Vennam: we were interested in studying whether image models like the CLIP image encoder learn to process text inside an image. Our findings indicated that, for whatever reason, the contrastive task on internet-scale data imbues the image encoder with a level of textual semantics. This was a cool finding, but we didn't have the time to take it further; we published it at the UniReps workshop at NeurIPS 2024. Perhaps we will pick it up again.
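The probe itself is easy to sketch: render a word as an image, embed it with the image encoder, and check whether it aligns with the text embedding of the same word. The model choice and rendering below are illustrative, not our exact setup:

```python
# Sketch: does CLIP's image encoder "read" text rendered in an image?
# Model choice and rendering details are illustrative placeholders.
import torch
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def render(word: str) -> Image.Image:
    """Draw the word in black on a white 224x224 canvas."""
    img = Image.new("RGB", (224, 224), "white")
    ImageDraw.Draw(img).text((60, 100), word, fill="black")
    return img

words = ["dog", "car", "apple"]
inputs = processor(text=words, images=[render(w) for w in words],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Rows: rendered-word images; columns: words. A strong diagonal
# suggests the image encoder picks up the text inside the image.
print(out.logits_per_image.softmax(dim=-1))
```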

I also worked with Dr. Makarand Tapaswi on extracting text-image features from generative diffusion models. Our experimentation revealed a few negative findings; one that struck me was that the image encoders used for Stable Diffusion are just very bad encoders. This collaboration broadened my horizons.

PhD Applications

Not much to say here; it was intense and demoralizing, but PK was involved through it all. He would assure me that I shouldn't be worried and that things would pan out, and they did. I got some good offers and callbacks from the University of Tübingen (and/or Max Planck), the University of Copenhagen, and NYU CDS. Lots of considerations went into picking NYU: there is the obvious con of it being a longer program, but the professors were more actively interested in the science of intelligence and foundation models, so it made more sense, and now I'm headed there.

Parting thoughts 

Precog is among the best places to be: there is a solid culture, a supportive advisor, and an amazing lab space. During my time here, other than my aforementioned collaborators, I had the pleasure of working alongside the most competent and inspirational labmates and friends: Varshita Kolipaka, Priyanshul Govil, Anirudh Govil, Abhinav Menon, Pratyaksh Gautam, Lakshmanan Lakshmanan, Karuna Chandra, Akshit Sinha, Vamshi Krishna, and many others.

Everyone before me has praised the lab space, and I will too. It is great: the floor space is amazing and there is a sunroof (see pics below). It is very important to get plenty of sun and plenty of walks.

Precog is a group of motivated students who work on foundational and applied problems.

I plan to be reachable over email, which will be available on my website.

Interior view of the building from the lab
With PK at Convocation
View from Lab!