The detection of controversial content in political discussions on the Internet is a critical challenge for maintaining healthy digital discourse. Unlike much of the existing literature, which relies on synthetically balanced data, our work preserves the natural distribution of controversial and non-controversial posts, a real-world imbalance that must be addressed for practical deployment. We re-evaluate well-established methods for detecting controversial content on a dataset we curate for the Indian political context, in which only 12.9% of posts are controversial. This disparity reflects the true imbalance of real-world political discussions and exposes a critical limitation of existing evaluation methods: benchmarking on datasets that model data imbalance is vital for real-world applicability. Thus, in this work, (i) we release our dataset, with an emphasis on class imbalance, focused on the Indian political context; (ii) we evaluate existing methods from this domain on this dataset and demonstrate their limitations in the imbalanced setting; (iii) we introduce an intuitive metric to measure a model’s robustness to class imbalance; and (iv) we incorporate ideas from the domain of Topological Data Analysis, specifically Persistent Homology, to curate features that provide richer representations of the data. Furthermore, we benchmark models trained with topological features against established baselines.
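A minimal sketch, assuming the `ripser` package, of how persistent-homology summary features might be extracted from a point cloud of post embeddings. The paper's actual topological feature construction is not reproduced here; the statistics below (total, max, and mean persistence, plus feature count) are illustrative choices.

```python
# Illustrative sketch (not the paper's exact pipeline): extract simple
# topological summary features via persistent homology using `ripser`.
import numpy as np
from ripser import ripser

def topological_features(embeddings: np.ndarray, maxdim: int = 1) -> np.ndarray:
    """Summarise persistence diagrams as a fixed-length feature vector."""
    diagrams = ripser(embeddings, maxdim=maxdim)["dgms"]
    feats = []
    for dgm in diagrams:
        # Drop points with infinite death times before summarising.
        finite = dgm[np.isfinite(dgm[:, 1])]
        lifetimes = (finite[:, 1] - finite[:, 0]) if len(finite) else np.array([0.0])
        feats += [lifetimes.sum(), lifetimes.max(), lifetimes.mean(), float(len(finite))]
    return np.array(feats)

# Example: features for 100 posts embedded in 32 dimensions.
X = np.random.rand(100, 32)
print(topological_features(X))
```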
@article{arun2025topo,title={Topo Goes Political: TDA-Based Controversy Detection in Imbalanced Reddit Political Data},author={Arun, Arvindh and Chandra, Karuna K and Sinha, Akshit and Velayutham, Balakumar and Arora, Jashn and Jain, Manish and Kumaraguru, Ponnurangam},year={2025},journal={5th International Workshop on Computational Methods for Online Discourse Analysis (BeyondFacts’25) Collocated with The Web Conference 2025},}
SSI-FM @ ICLR
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, and Ameya Prabhu
Scaling Self-Improving Foundation Models Workshop at ICLR ’25, 2025
There is growing excitement about the potential of Language Models (LMs) to accelerate scientific discovery. Falsifying hypotheses is key to scientific progress, as it allows claims to be iteratively refined over time. This process requires significant researcher effort, reasoning, and ingenuity. Yet current benchmarks for LMs predominantly assess their ability to generate solutions rather than challenge them. We advocate for developing benchmarks that evaluate this inverse capability - creating counterexamples for subtly incorrect solutions. To demonstrate this approach, we start with the domain of algorithmic problem solving, where counterexamples can be evaluated automatically using code execution. Specifically, we introduce REFUTE, a dynamically updating benchmark that includes recent problems and incorrect submissions from programming competitions, where human experts successfully identified counterexamples. Our analysis finds that the best reasoning agents, even OpenAI o3-mini (high) with code execution feedback, can create counterexamples for only <9% of incorrect solutions in REFUTE, even though ratings indicate its ability to solve up to 48% of these problems from scratch. We hope our work spurs progress in evaluating and enhancing LMs’ ability to falsify incorrect solutions - a capability that is crucial for both accelerating research and making models self-improve through reliable reflective reasoning.
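The automatic evaluation the abstract describes, checking a candidate counterexample by code execution, can be illustrated with a short sketch: run a trusted reference solution and the incorrect submission on the same input and compare outputs. The commands and input below are placeholders, not REFUTE's actual harness.

```python
# Minimal sketch of counterexample validation by code execution.
import subprocess

def is_counterexample(candidate_input: str,
                      correct_cmd: list[str],
                      incorrect_cmd: list[str],
                      timeout: float = 5.0) -> bool:
    """True if the two programs disagree on this input."""
    def run(cmd: list[str]) -> str:
        out = subprocess.run(cmd, input=candidate_input, capture_output=True,
                             text=True, timeout=timeout)
        return out.stdout.strip()
    return run(correct_cmd) != run(incorrect_cmd)

# Example usage with two hypothetical solution files:
# is_counterexample("3\n1 2 3\n", ["python", "correct.py"], ["python", "wrong.py"])
```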
@article{sinha2025falsify,title={Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation},author={Sinha, Shiven and Goel, Shashwat and Kumaraguru, Ponnurangam and Geiping, Jonas and Bethge, Matthias and Prabhu, Ameya},year={2025},journal={Scaling Self-Improving Foundation Models Workshop at ICLR '25},}
IJDSA
Deep learning and transfer learning to understand emotions: a PoliEMO dataset and multi-label classification in Indian elections
Anuradha Surolia, Shikha Mehta, and Ponnurangam Kumaraguru
International Journal of Data Science and Analytics, 2025
Understanding user emotions to identify user opinion, sentiment, stance, and preferences has become a hot topic of research in the last few years. Many studies and datasets have been designed for user emotion analysis, covering sources such as news websites, blogs, and user tweets. However, there has been little exploration of political emotions in the Indian context for multi-label emotion detection. This paper presents the PoliEMO dataset—a novel benchmark corpus of political tweets in a multi-label setup for Indian elections, consisting of 3512 manually annotated tweets. In this work, 6792 labels were generated for six emotion categories: anger, insult, joy, neutral, sadness, and shameful. Next, the PoliEMO dataset is used to understand emotions in a multi-label context using state-of-the-art machine learning algorithms with multi-label classifiers (binary relevance (BR), label powerset (LP), classifier chain (CC), and multi-label k-nearest neighbors (MkNN)) and deep learning models like convolutional neural network (CNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and a transfer learning model, i.e., bidirectional encoder representations from transformers (BERT). Experiments and results show Bi-LSTM performs better, with a micro-averaged F1 score of 0.81, macro-averaged F1 score of 0.78, and accuracy of 0.68, as compared to state-of-the-art approaches.
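As a concrete illustration of the binary relevance (BR) setup named above, here is a minimal scikit-learn sketch with placeholder tweets and labels; the paper's stronger models (e.g., Bi-LSTM, BERT) are more elaborate.

```python
# Binary relevance for multi-label emotion detection: one binary
# classifier per emotion, via scikit-learn's one-vs-rest wrapper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

texts = ["example tweet one", "example tweet two"]   # placeholder tweets
labels = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0]]    # anger..shameful indicators

clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(texts, labels)
preds = clf.predict(texts)
print(f1_score(labels, preds, average="micro"),
      f1_score(labels, preds, average="macro"))
```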
@article{surolia2025deeplearning,title={Deep learning and transfer learning to understand emotions: a PoliEMO dataset and multi-label classification in Indian elections},author={Surolia, Anuradha and Mehta, Shikha and Kumaraguru, Ponnurangam},year={2025},journal={International Journal of Data Science and Analytics},pages={1--15},}
WebSci
COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models
Priyanshul Govil, Hemang Jain, Vamshi Bonagiri, Aman Chadha, Ponnurangam Kumaraguru, Manas Gaur, and Sanorita Dey
In Proceedings of the 17th ACM Web Science Conference 2025, 2025
Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices. Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets. These benchmarks measure bias by observing an LLM’s behavior on biased statements. However, these statements lack contextual considerations of the situations they try to present. To address this, we introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear. We develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to measure a biased statement’s reliability in detecting bias based on the variance in model behavior across different contexts. To evaluate the metric, we augment 2,291 stereotyped statements from two existing benchmark datasets by adding contextual information. We show that COBIAS aligns with human judgment on the contextual reliability of biased statements and can be used to create reliable datasets, which would assist bias-mitigation efforts.
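The exact COBIAS formula is defined in the paper; the sketch below only illustrates its core ingredient as described above: score the same statement under several contexts and measure the variance in model behavior. `model_score` is a stand-in for any scalar behavioral measure (e.g., the statement's log-likelihood under the model).

```python
# Illustrative sketch of "variance in model behavior across contexts".
import statistics

def contextual_variance(statement: str, contexts: list[str], model_score) -> float:
    """Population variance of a behavioral score across contextualised inputs."""
    scores = [model_score(f"{ctx} {statement}") for ctx in contexts]
    return statistics.pvariance(scores)

# Demo with a dummy scorer (string length stands in for a real model score):
ctxs = ["In a history class,", "As an insult,", "In a news report,"]
print(contextual_variance("they are good at math", ctxs, model_score=len))
```

Higher variance across contexts suggests the bare statement, on its own, is a less reliable probe of bias.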
@inproceedings{govil2025cobias,title={COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models},author={Govil, Priyanshul and Jain, Hemang and Bonagiri, Vamshi and Chadha, Aman and Kumaraguru, Ponnurangam and Gaur, Manas and Dey, Sanorita},year={2025},booktitle={Proceedings of the 17th ACM Web Science Conference 2025},}
WebSci
Framing the Fray: Conflict Framing in Indian Election News Coverage
Tejasvi Chebrolu, Rohan Modepalle, Harsha Vardhan, Ashwin Rajadesingan, and Ponnurangam Kumaraguru
In Proceedings of the 17th ACM Conference on Web Science, 2025
In covering elections, journalists often use conflict frames, which depict events and issues as adversarial, often highlighting confrontations between opposing parties. Although conflict frames result in more citizen engagement, they may distract from substantive policy discussion. In this work, we analyze the use of conflict frames in online English-language news articles by seven major news outlets in the 2014 and 2019 Indian general elections. We find that the use of conflict frames is not linked to the news outlets’ ideological biases but is associated with TV-based (rather than print-based) media. Further, the majority of news outlets do not exhibit ideological biases in portraying parties as aggressors or targets in articles with conflict frames. Finally, comparing news articles reporting on political speeches to their original speech transcripts, we find that, on average, news outlets tend to consistently report on attacks on the opposition party in the speeches but under-report on more substantive electoral issues covered in the speeches, such as farmers’ issues and infrastructure.
@inproceedings{chebroluframing,title={Framing the Fray: Conflict Framing in Indian Election News Coverage},author={Chebrolu, Tejasvi and Modepalle, Rohan and Vardhan, Harsha and Rajadesingan, Ashwin and Kumaraguru, Ponnurangam},year={2025},booktitle={Proceedings of the 17th ACM Conference on Web Science},}
WebSci
Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions
Tejasvi Chebrolu, Ashwin Rajadesingan, and Ponnurangam Kumaraguru
In Proceedings of the 17th ACM Conference on Web Science, 2025
Engaging in political discussions is crucial in democratic societies, yet many individuals remain politically disinclined due to various factors such as perceived knowledge gaps, conflict avoidance, or a sense of disconnection from the political system. In this paper, we explore the potential of personal narratives—short, first-person accounts emphasizing personal experiences—as a means to empower these individuals to participate in online political discussions. Using a text classifier that identifies personal narratives, we conducted a large-scale computational analysis to evaluate the relationship between the use of personal narratives and participation in political discussions on Reddit. We find that politically disinclined individuals (PDIs) are more likely to use personal narratives than more politically active users. Personal narratives are more likely to attract and retain politically disinclined individuals in political discussions than other comments. Importantly, personal narratives posted by politically disinclined individuals are received more positively than their other comments in political communities. These results emphasize the value of personal narratives in promoting inclusive political discourse.
@inproceedings{chebrolunarrative,title={Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions},author={Chebrolu, Tejasvi and Rajadesingan, Ashwin and Kumaraguru, Ponnurangam},year={2025},booktitle={Proceedings of the 17th ACM Conference on Web Science},}
SIFM@ICLR
Great Models Think Alike and this Undermines AI Oversight
Shashwat Goel, Joschka Struber, Ilze Amanda Auzina, Karuna K Chandra, Ponnurangam Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, and Jonas Geiping
In ICLR Workshop on Self-Improving Foundation Models, 2025
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as "AI Oversight". We study how model similarity affects both aspects of AI oversight by proposing a probabilistic metric for LM similarity based on overlap in model mistakes. Using this metric, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find complementary knowledge between the weak supervisor and strong student model plays a crucial role in gains from "weak-to-strong generalization". As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend – model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.
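The paper defines its own probabilistic similarity metric; as a simplified stand-in that captures the "overlap in model mistakes" idea, the sketch below computes chance-adjusted agreement (Cohen's kappa) between two models' per-example correctness vectors.

```python
# Simplified stand-in for mistake-overlap similarity: Cohen's kappa on
# per-example correctness. The paper's metric is more refined.
import numpy as np

def error_agreement_kappa(correct_a: np.ndarray, correct_b: np.ndarray) -> float:
    observed = np.mean(correct_a == correct_b)
    p_a, p_b = correct_a.mean(), correct_b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)

a = np.array([1, 1, 0, 0, 1], dtype=bool)   # model A correct/incorrect
b = np.array([1, 0, 0, 0, 1], dtype=bool)   # model B correct/incorrect
print(error_agreement_kappa(a, b))
```

Correcting for chance matters here: two highly accurate models agree often simply because both are usually right, not because their mistakes are correlated.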
@inproceedings{goel2025greatmodelsthinkalike,title={Great Models Think Alike and this Undermines AI Oversight},author={Goel, Shashwat and Struber, Joschka and Auzina, Ilze Amanda and Chandra, Karuna K and Kumaraguru, Ponnurangam and Kiela, Douwe and Prabhu, Ameya and Bethge, Matthias and Geiping, Jonas},year={2025},booktitle={ICLR Workshop on Self-Improving Foundation Models},}
COLING
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting
Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, and Amit Sheth
In The 31st International Conference on Computational Linguistics (COLING 2025), 2025
Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as blank is to blank" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge.
@inproceedings{wijesiriwardene2024exploring,title={KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting},author={Wijesiriwardene, Thilini and Wickramarachchi, Ruwan and Vennam, Sreeram and Jain, Vinija and Chadha, Aman and Das, Amitava and Kumaraguru, Ponnurangam and Sheth, Amit},year={2025},booktitle={The 31st International Conference on Computational Linguistics (COLING 2025)},}
AAAI
Higher Order Structures For Graph Explanations
Akshit Sinha*, Sreeram Vennam*, Charu Sharma, and Ponnurangam Kumaraguru
In The 39th Annual AAAI Conference on Artificial Intelligence, 2025
Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations of graph-structured data, demonstrating remarkable performance across various tasks. Recognising their importance, there has been extensive research focused on explaining GNN predictions, aiming to enhance their interpretability and trustworthiness. However, GNNs and their explainers face a notable challenge: graphs are primarily designed to model pair-wise relationships between nodes, which can make it tough to capture higher-order, multi-node interactions. This characteristic can pose difficulties for existing explainers in fully representing multi-node relationships. To address this gap, we present Framework For Higher-Order Representations In Graph Explanations (FORGE), a framework that enables graph explainers to capture such interactions by incorporating higher-order structures, resulting in more accurate and faithful explanations. Extensive evaluation across various graph explainers shows that FORGE improves average explanation accuracy by 1.9x on real-world datasets from the GraphXAI benchmark and 2.25x on synthetic datasets. We perform ablation studies to confirm the importance of higher-order relations in improving explanations, while our scalability analysis demonstrates FORGE’s efficacy on large graphs.
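FORGE's actual construction is described in the paper; the sketch below only illustrates the ingredient it builds on, mining higher-order multi-node structures, using maximal cliques in networkx as a simple proxy that an explainer could incorporate.

```python
# Mining candidate higher-order structures (maximal cliques) as a
# simple proxy for multi-node interactions a graph explainer could use.
import networkx as nx

def higher_order_groups(g: nx.Graph, min_size: int = 3) -> list[list[int]]:
    """Maximal cliques with at least `min_size` nodes."""
    return [c for c in nx.find_cliques(g) if len(c) >= min_size]

g = nx.karate_club_graph()
for clique in higher_order_groups(g)[:5]:
    print(clique)
```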
@inproceedings{sinha2024higherorderstructuresgraph,title={Higher Order Structures For Graph Explanations},author={Sinha, Akshit and Vennam, Sreeram and Sharma, Charu and Kumaraguru, Ponnurangam},year={2025},booktitle={The 39th Annual AAAI Conference on Artificial Intelligence},}
2024
arXiv
A Cognac shot to forget bad memories: Corrective Unlearning in GNNs
Varshita Kolipaka, Akshit Sinha, Debangan Mishra, Sumit Kumar, Arvindh Arun, Shashwat Goel, and Ponnurangam Kumaraguru
arXiv preprint arXiv:2412.00789, 2024
Graph Neural Networks (GNNs) are increasingly being used for a variety of ML applications on graph data. Because graph data does not follow the independently and identically distributed (i.i.d.) assumption, adversarial manipulations or incorrect data can propagate to other data points through message passing, deteriorating the model’s performance. To allow model developers to remove the adverse effects of manipulated entities from a trained GNN, we study the recently formulated problem of Corrective Unlearning. We find that current graph unlearning methods fail to unlearn the effect of manipulations even when the whole manipulated set is known. We introduce a new graph unlearning method, Cognac, which can unlearn the effect of the manipulation set even when only 5% of it is identified.
@misc{kolipaka2024cognacshotforgetbad,title={A Cognac shot to forget bad memories: Corrective Unlearning in GNNs},author={Kolipaka, Varshita and Sinha, Akshit and Mishra, Debangan and Kumar, Sumit and Arun, Arvindh and Goel, Shashwat and Kumaraguru, Ponnurangam},year={2024},eprint={2412.00789},archiveprefix={arXiv},primaryclass={cs.LG},}
ACM
From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences
Prashant Kodali, Anmol Goel, Likhith Asapu, Vamshi Krishna Bonagiri, Anirudh Govil, Monojit Choudhury, Manish Shrivastava, and Ponnurangam Kumaraguru
Current computational approaches for analysing or generating code-mixed sentences do not explicitly model the "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgement for the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, consisting of samples sourced from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, Number of Switch Points, and Burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-Roberta and Bernice outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT’s zero- and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, providing scope for improvement in code-mixed tasks. Zero-shot transfer from English-Hindi to English-Telugu acceptability judgments using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and providing further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mix corpus, and code for data generation and model training.
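For reference, the CMI metric named above has a standard formulation (Das and Gambäck, 2014) over per-token language tags, sketched below; the tags in the example are illustrative.

```python
# Code-Mixing Index: 100 * (1 - max_lang_count / (n - u)), where u is the
# number of language-independent tokens (e.g., named entities, punctuation).
from collections import Counter

def cmi(tags: list[str], independent: str = "univ") -> float:
    n = len(tags)
    u = sum(1 for t in tags if t == independent)
    if n == u:
        return 0.0  # no language-tagged tokens at all
    counts = Counter(t for t in tags if t != independent)
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

print(cmi(["en", "hi", "hi", "en", "univ", "en"]))  # a mixed en-hi sentence -> 40.0
```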
@inproceedings{kodali2024humanjudgementspredictivemodels,title={From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences},author={Kodali, Prashant and Goel, Anmol and Asapu, Likhith and Bonagiri, Vamshi Krishna and Govil, Anirudh and Choudhury, Monojit and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2024},booktitle={},}
UniReps @ NeurIPS
Emergence of Text Semantics in CLIP Image Encoders
Sreeram Vennam, Shashwat Singh, Anirudh Govil, and Ponnurangam Kumaraguru
In UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models, 2024
Certain self-supervised approaches to train image encoders, like CLIP, align images with their text captions. However, these approaches do not have an a priori incentive to learn to associate text inside the image with the semantics of the text. Our work studies the semantics of text rendered in images. We show evidence suggesting that the image representations of CLIP have a subspace for textual semantics that abstracts away fonts. Furthermore, we show that the rendered text representations from the image encoder only slightly lag behind the text representations with respect to preserving semantic relationships.
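A sketch of the probing setup the abstract describes: render a word as an image, encode it with CLIP's image encoder, and compare against the corresponding text embeddings. The model checkpoint and rendering details below are illustrative choices, not the paper's exact configuration.

```python
# Compare CLIP embeddings of rendered text images against text embeddings.
import torch
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def render(word: str) -> Image.Image:
    """Render a word in black on a white 224x224 canvas."""
    img = Image.new("RGB", (224, 224), "white")
    ImageDraw.Draw(img).text((20, 100), word, fill="black")
    return img

words = ["dog", "cat", "car"]
inputs = processor(text=words, images=[render(w) for w in words],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print(img_emb @ txt_emb.T)  # do rendered words align with their captions?
```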
@inproceedings{vennam2024emergence,title={Emergence of Text Semantics in {CLIP} Image Encoders},author={Vennam, Sreeram and Singh, Shashwat and Govil, Anirudh and Kumaraguru, Ponnurangam},year={2024},booktitle={UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models},}
JURIX
InSaAF: Incorporating Safety Through Accuracy and Fairness - Are LLMs Ready for the Indian Legal Domain?
Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S. Krishnan, Anmol Goel, Shreya Goyal, Balaraman Ravindran, and Ponnurangam Kumaraguru
In Legal Knowledge and Information Systems - JURIX 2024: The Thirty-seventh Annual Conference, 2024
Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain, ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, β-weighted Legal Safety Score (LSSβ), which encapsulates both the fairness and accuracy aspects of the LLM. We assess an LLM’s safety by considering its performance in the Binary Statutory Reasoning task and the fairness it exhibits with respect to various axes of disparity in Indian society. Task performance and fairness scores of LLaMA and LLaMA–2 models indicate that the proposed LSSβ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA–2 models increase the LSSβ, improving their usability in the Indian legal domain. Our code is publicly released.
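The abstract does not spell out the LSSβ formula. Purely for illustration, the sketch below assumes an F-beta-style combination of an accuracy score and a fairness score, where β weights fairness against accuracy; the paper's actual definition should be consulted.

```python
# ASSUMED form of a beta-weighted safety score (F-beta-style combination
# of accuracy and fairness); this is an illustrative guess, not the
# paper's definition of LSS-beta.
def legal_safety_score(accuracy: float, fairness: float, beta: float = 1.0) -> float:
    if accuracy == 0.0 and fairness == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * accuracy * fairness / (b2 * accuracy + fairness)

print(legal_safety_score(0.82, 0.74, beta=1.0))
```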
@inproceedings{Tripathi2024,title={InSaAF: Incorporating Safety Through Accuracy and Fairness - Are LLMs Ready for the Indian Legal Domain?},author={Tripathi, Yogesh and Donakanti, Raghav and Girhepuje, Sahil and Kavathekar, Ishan and Vedula, Bhaskara Hanuma and Krishnan, Gokul S. and Goel, Anmol and Goyal, Shreya and Ravindran, Balaraman and Kumaraguru, Ponnurangam},year={2024},booktitle={Legal Knowledge and Information Systems - JURIX 2024: The Thirty-seventh Annual Conference, Brno, Czech Republic, 11-13 December 2024},}
TMLR
Corrective Machine Unlearning
Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, and Amartya Sanyal
In Transactions of Machine Learning Research (TMLR), 2024
Machine Learning models increasingly face data integrity challenges due to the use of large-scale training datasets drawn from the Internet. We study what model developers can do if they detect that some data was manipulated or incorrect. Such manipulated data can cause adverse effects including vulnerability to backdoored samples, systemic biases, and reduced accuracy on certain input domains. Realistically, all manipulated training samples cannot be identified, and only a small, representative subset of the affected data can be flagged. We formalize Corrective Machine Unlearning as the problem of mitigating the impact of data affected by unknown manipulations on a trained model, only having identified a subset of the corrupted data. We demonstrate that the problem of corrective unlearning has significantly different requirements from traditional privacy-oriented unlearning. We find most existing unlearning methods, including retraining-from-scratch without the deletion set, require most of the manipulated data to be identified for effective corrective unlearning. However, one approach, Selective Synaptic Dampening, achieves limited success, unlearning adverse effects with just a small portion of the manipulated samples in our setting, which shows encouraging signs for future progress. We hope our work spurs research towards developing better methods for corrective unlearning and offers practitioners a new strategy to handle data integrity challenges arising from web-scale training.
@inproceedings{goel2024corrective,title={Corrective Machine Unlearning},author={Goel, Shashwat and Prabhu, Ameya and Torr, Philip and Kumaraguru, Ponnurangam and Sanyal, Amartya},year={2024},booktitle={Transactions of Machine Learning Research (TMLR)},}
MLC @ NeurIPS
LLM Vocabulary Compression for Low-Compute Environments
Sreeram Vennam, Anish R Joishy, and Ponnurangam Kumaraguru
In Workshop on Machine Learning and Compression, NeurIPS 2024, 2024
We present a method to compress the final linear layer of language models, reducing memory usage by up to 3.4x without significant performance loss. By grouping tokens based on Byte Pair Encoding (BPE) merges, we prevent materialization of the memory-intensive logits tensor. Evaluations on the TinyStories dataset show that our method performs on par with GPT-Neo and GPT2 while significantly improving throughput by up to 3x, making it suitable for low-compute environments.
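A minimal sketch of the factorised prediction idea described above: predict a token group first, then a token within the group, so the full vocabulary-sized logits tensor is never materialised. The grouping here is a placeholder (the paper derives groups from BPE merges), and the shared within-group head is a simplification.

```python
# Two-stage LM head: log p(token) = log p(group) + log p(token | group).
import torch
import torch.nn as nn

class GroupedLMHead(nn.Module):
    def __init__(self, d_model: int, n_groups: int, group_size: int):
        super().__init__()
        self.group_head = nn.Linear(d_model, n_groups)     # which group?
        self.within_head = nn.Linear(d_model, group_size)  # which token in it?
        # NOTE: sharing one within-group head across groups is a simplification.

    def forward(self, h: torch.Tensor):
        return (self.group_head(h).log_softmax(-1),
                self.within_head(h).log_softmax(-1))

head = GroupedLMHead(d_model=256, n_groups=200, group_size=256)
group_logp, within_logp = head(torch.randn(4, 256))
print(group_logp.shape, within_logp.shape)  # (4, 200) and (4, 256), never (4, |V|)
```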
@inproceedings{vennam2024llm,title={{LLM} Vocabulary Compression for Low-Compute Environments},author={Vennam, Sreeram and Joishy, Anish R and Kumaraguru, Ponnurangam},year={2024},booktitle={Workshop on Machine Learning and Compression, NeurIPS 2024},}
NeurIPS
Random Representations Outperform Online Continually Learned Representations
Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip Torr, Ozan Sener, and Puneet K. Dokania
In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Continual learning has primarily focused on the issue of catastrophic forgetting and the associated stability-plasticity tradeoffs. However, little attention has been paid to the efficacy of continually learned representations, as representations are learned alongside classifiers throughout the learning process. Our primary contribution is empirically demonstrating that existing online continually trained deep networks produce inferior representations compared to simple pre-defined random transforms. Our approach projects raw pixels using a fixed random transform, approximating an RBF-Kernel initialized before any data is seen. We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting. This method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all standard online continual learning benchmarks. Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios. Extending our investigation to popular exemplar-free scenarios with pretrained models, we find that training only a linear classifier on top of pretrained representations surpasses most continual fine-tuning and prompt-tuning strategies. Overall, our investigation challenges the prevailing assumptions about effective representation learning in online continual learning. Our code is available at https://github.com/drimpossible/RanDumb.
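A compact sketch of the recipe described above, under the assumption that scikit-learn's RBF-kernel approximation and an SGD-based linear classifier are acceptable stand-ins: a fixed, data-independent random transform followed by an online linear classifier trained one sample at a time with no exemplar storage.

```python
# Fixed random RBF-kernel features + strictly online linear classifier.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))        # placeholder "raw pixel" inputs
y = (X[:, 0] > 0).astype(int)          # placeholder labels
classes = np.unique(y)

# RBFSampler's random projection depends only on input dimensionality,
# so "fitting" on one sample fixes the transform before any data is seen.
embed = RBFSampler(n_components=512, random_state=0).fit(X[:1])

clf = SGDClassifier(loss="log_loss")
for i, (xi, yi) in enumerate(zip(X, y)):           # one sample at a time
    zi = embed.transform(xi.reshape(1, -1))
    clf.partial_fit(zi, [yi], classes=classes if i == 0 else None)

print(clf.score(embed.transform(X), y))
```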
@inproceedings{prabhu2024random,title={Random Representations Outperform Online Continually Learned Representations},author={Prabhu, Ameya and Sinha, Shiven and Kumaraguru, Ponnurangam and Torr, Philip and Sener, Ozan and Dokania, Puneet K.},year={2024},booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},}
MathAI @ NeurIPS
Wu’s Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, and Matthias Bethge
In The 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24, 2024
Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu’s method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu’s method is surprisingly strong. Wu’s method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu’s method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems using just a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu’s method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu’s method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.
@inproceedings{sinha2024wus,title={Wu{\textquoteright}s Method Boosts Symbolic {AI} to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at {IMO} Geometry},author={Sinha, Shiven and Prabhu, Ameya and Kumaraguru, Ponnurangam and Bhat, Siddharth and Bethge, Matthias},year={2024},booktitle={The 4th Workshop on Mathematical Reasoning and AI at NeurIPS'24},}
ICML
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh, Shauli Ravfogel, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, and Ponnurangam Kumaraguru
In Forty-first International Conference on Machine Learning, 2024
Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text. In the case of neural language models, an encoding of the undesirable behavior is often present in the model’s representations. Thus, one natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model’s representations in a manner that reduces the probability of it generating undesirable text. This paper investigates the formal and empirical properties of steering functions, i.e., transformation of the neural language model’s representations that alter its behavior. First, we derive two optimal, in the least-squares sense, affine steering functions under different constraints. Our theory provides justification for existing approaches and offers a novel, improved steering approach. Second, we offer a series of experiments that demonstrate the empirical effectiveness of the methods in mitigating bias and reducing toxic generation.
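The simplest affine steering function consistent with the setup above is a mean-matching shift: move representations of the undesired class toward the mean of the desired class. The paper derives least-squares-optimal affine functions; this sketch only shows the general form h -> Wh + b, with W fixed to the identity.

```python
# Mean-shift affine steering: identity linear part, learned bias.
import numpy as np

def mean_shift_steering(H_source: np.ndarray, H_target: np.ndarray):
    """Return an affine map that shifts the source mean onto the target mean."""
    b = H_target.mean(axis=0) - H_source.mean(axis=0)
    return lambda h: h + b   # h -> I h + b

H_toxic = np.random.randn(100, 16) + 1.0   # placeholder "toxic" activations
H_clean = np.random.randn(100, 16)         # placeholder "clean" activations
steer = mean_shift_steering(H_toxic, H_clean)
print(steer(H_toxic).mean(axis=0).round(2))  # approximately the clean mean
```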
@inproceedings{singhrepresentation,title={Representation Surgery: Theory and Practice of Affine Steering},author={Singh, Shashwat and Ravfogel, Shauli and Herzig, Jonathan and Aharoni, Roee and Cotterell, Ryan and Kumaraguru, Ponnurangam},year={2024},booktitle={Forty-first International Conference on Machine Learning},}
ICML
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew Bo Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Ariel Herbert-Voss, Cort B Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam Alfred Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Ian Steneker, David Campbell, Brad Jokubaitis, Steven Basart, Stephen Fitz, Ponnurangam Kumaraguru, Kallol Krishna Karmakar, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, and Dan Hendrycks
In Forty-first International Conference on Machine Learning, 2024
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs.
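A sketch of the RMU objective as the paper describes it, "controlling model representations": on forget-set inputs, push an intermediate layer's activations toward a scaled random direction; on retain-set inputs, keep activations close to the frozen original model. The shapes, layer choice, and hyperparameter values below are illustrative.

```python
# RMU-style loss: steer forget activations to a random direction while
# anchoring retain activations to the frozen model's activations.
import torch
import torch.nn.functional as F

def rmu_loss(h_updated_forget: torch.Tensor,
             h_updated_retain: torch.Tensor,
             h_frozen_retain: torch.Tensor,
             random_dir: torch.Tensor,
             c: float = 6.5, alpha: float = 100.0) -> torch.Tensor:
    forget = F.mse_loss(h_updated_forget, c * random_dir.expand_as(h_updated_forget))
    retain = F.mse_loss(h_updated_retain, h_frozen_retain)
    return forget + alpha * retain

d = 64
u = torch.randn(d); u = u / u.norm()     # fixed random unit direction
loss = rmu_loss(torch.randn(8, d), torch.randn(8, d), torch.randn(8, d), u)
print(loss.item())
```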
@inproceedings{li2024the,title={The {WMDP} Benchmark: Measuring and Reducing Malicious Use with Unlearning},author={Li, Nathaniel and Pan, Alexander and Gopal, Anjali and Yue, Summer and Berrios, Daniel and Gatti, Alice and Li, Justin D. and Dombrowski, Ann-Kathrin and Goel, Shashwat and Mukobi, Gabriel and Helm-Burger, Nathan and Lababidi, Rassin and Justen, Lennart and Liu, Andrew Bo and Chen, Michael and Barrass, Isabelle and Zhang, Oliver and Zhu, Xiaoyuan and Tamirisa, Rishub and Bharathi, Bhrugu and Herbert-Voss, Ariel and Breuer, Cort B and Zou, Andy and Mazeika, Mantas and Wang, Zifan and Oswal, Palash and Lin, Weiran and Hunt, Adam Alfred and Tienken-Harder, Justin and Shih, Kevin Y. and Talley, Kemper and Guan, John and Steneker, Ian and Campbell, David and Jokubaitis, Brad and Basart, Steven and Fitz, Stephen and Kumaraguru, Ponnurangam and Karmakar, Kallol Krishna and Tupakula, Uday and Varadharajan, Vijay and Shoshitaishvili, Yan and Ba, Jimmy and Esvelt, Kevin M. and Wang, Alexandr and Hendrycks, Dan},year={2024},booktitle={Forty-first International Conference on Machine Learning},}
CoLLAs
Sanity Checks for Evaluating Graph Unlearning
Varshita Kolipaka, Akshit Sinha, Debangan Mishra, Sumit Kumar, Arvindh Arun, Shashwat Goel, and Ponnurangam Kumaraguru
In Third Conference on Lifelong Learning Agents - Workshop Track, 2024
Graph neural networks (GNNs) are increasingly being used on sensitive graph-structured data, necessitating techniques for handling unlearning requests on the trained models, particularly node unlearning. However, unlearning nodes on GNNs is challenging due to the interdependence between the nodes in a graph. We compare MEGU, a state-of-the-art graph unlearning method, and SCRUB, a general unlearning method for classification, to investigate the efficacy of graph unlearning methods over traditional unlearning methods. Surprisingly, we find that SCRUB performs comparably or better than MEGU on random node removal and on removing an adversarial node injection attack. Our results suggest that 1) graph unlearning studies should incorporate general unlearning methods like SCRUB as baselines, and 2) there is a need for more rigorous behavioral evaluations that reveal the differential advantages of proposed graph unlearning methods. Our work, therefore, motivates future research into more comprehensive evaluations for assessing the true utility of graph unlearning algorithms.
@inproceedings{anonymous2024sanity,title={Sanity Checks for Evaluating Graph Unlearning},author={Kolipaka, Varshita and Sinha, Akshit and Mishra, Debangan and Kumar, Sumit and Arun, Arvindh and Goel, Shashwat and Kumaraguru, Ponnurangam},year={2024},booktitle={Third Conference on Lifelong Learning Agents - Workshop Track},}
KIL @ KDD
Towards Infusing Auxiliary Knowledge for Distracted Driver Detection
Ishwar Balappanawar, Ashmit Chamoli, Ruwan Wickramarachchi, Aditya Mishra, and Ponnurangam Kumaraguru
In Fourth Workshop on Knowledge-infused Learning co-located with 30th ACM KDD Conference, Barcelona, Spain, 2024
Distracted driving is a leading cause of road accidents globally. Identification of distracted driving involves reliably detecting and classifying various forms of driver distraction (e.g., texting, eating, or using in-car devices) from in-vehicle camera feeds to enhance road safety. This task is challenging due to the need for robust models that can generalize to a diverse set of driver behaviors without requiring extensive annotated datasets. In this paper, we propose KiD3, a novel method for distracted driver detection (DDD) by infusing auxiliary knowledge about semantic relations between entities in a scene and the structural configuration of the driver’s pose. Specifically, we construct a unified framework that integrates scene graphs and driver pose information with the visual cues in video frames to create a holistic representation of the driver’s state. Our results indicate that KiD3 achieves a 13.64% accuracy improvement over the vision-only baseline by incorporating such auxiliary knowledge with visual information.
@inproceedings{balappanawar2024towards,title={Towards Infusing Auxiliary Knowledge for Distracted Driver Detection},author={Balappanawar, Ishwar and Chamoli, Ashmit and Wickramarachchi, Ruwan and Mishra, Aditya and Kumaraguru, Ponnurangam},year={2024},booktitle={Fourth Workshop on Knowledge-infused Learning co-located with 30th ACM KDD Conference, Barcelona, Spain},}
LREC-COLING
SaGE: Evaluating Moral Consistency in Large Language Models
Vamshi Krishna Bonagiri, Sreeram Vennam, Priyanshul Govil, Ponnurangam Kumaraguru, and Manas Gaur
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024
Despite recent advancements showcasing the impressive capabilities of Large Language Models (LLMs) in conversational systems, we show that even state-of-the-art LLMs are morally inconsistent in their generations, questioning their reliability (and trustworthiness in general). Prior works in LLM evaluation focus on developing ground-truth data to measure accuracy on specific tasks. However, for moral scenarios that often lack universally agreed-upon answers, consistency in model responses becomes crucial for their reliability. To address this issue, we propose an information-theoretic measure called Semantic Graph Entropy (SaGE), grounded in the concept of “Rules of Thumb” (RoTs) to measure a model’s moral consistency. RoTs are abstract principles learned by a model and can help explain their decision-making strategies effectively. To this extent, we construct the Moral Consistency Corpus (MCC), containing 50K moral questions, responses to them by LLMs, and the RoTs that these models followed. Furthermore, to illustrate the generalizability of SaGE, we use it to investigate LLM consistency on two popular datasets – TruthfulQA and HellaSwag. Our results reveal that task accuracy and consistency are independent problems, and there is a dire need to investigate these issues further.
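SaGE's exact construction over Rules of Thumb is defined in the paper; the sketch below is a simplified proxy for the same intuition: embed multiple sampled answers to one question, build a semantic similarity graph, and use the entropy of its connected components as an (in)consistency signal. The model name and threshold are illustrative.

```python
# Simplified consistency proxy: entropy over semantic clusters of answers.
import numpy as np
import networkx as nx
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def consistency_entropy(answers: list[str], threshold: float = 0.8) -> float:
    emb = model.encode(answers, normalize_embeddings=True)
    sims = emb @ emb.T
    g = nx.Graph()
    g.add_nodes_from(range(len(answers)))
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            if sims[i, j] >= threshold:   # semantically equivalent answers
                g.add_edge(i, j)
    sizes = np.array([len(c) for c in nx.connected_components(g)])
    p = sizes / sizes.sum()
    return float(-(p * np.log(p)).sum())  # 0 when all answers agree

print(consistency_entropy(["Lying is wrong.", "You should not lie.", "Lying is fine."]))
```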
@inproceedings{bonagiri-etal-2024-sage,title={{S}a{GE}: Evaluating Moral Consistency in Large Language Models},author={Bonagiri, Vamshi Krishna and Vennam, Sreeram and Govil, Priyanshul and Kumaraguru, Ponnurangam and Gaur, Manas},year={2024},booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},}
EMNLP Findings
Counter Turing Test (CT^2): Investigating AI-Generated Text Detection for Hindi - Ranking LLMs based on Hindi AI Detectability Index (ADI_hi)
Ishan Kavathekar, Anku Rani, Ashmit Chamoli, Ponnurangam Kumaraguru, Amit P. Sheth, and Amitava Das
In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
The widespread adoption of Large Language Models (LLMs) and awareness around multilingual LLMs have raised concerns regarding the potential risks and repercussions linked to the misapplication of AI-generated text, necessitating increased vigilance. While these models are primarily trained for English, their extensive training on vast datasets covering almost the entire web equips them with capabilities to perform well in numerous other languages. AI-Generated Text Detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. In this paper, we report our investigation on AGTD for an Indic language, Hindi. Our major contributions are four-fold: i) we examine 26 LLMs to evaluate their proficiency in generating Hindi text; ii) we introduce the AI-generated news article in Hindi (AGhi) dataset; iii) we evaluate the effectiveness of five recently proposed AGTD techniques (ConDA, J-Guard, RADAR, RAIDAR, and Intrinsic Dimension Estimation) for detecting AI-generated Hindi text; iv) we propose the Hindi AI Detectability Index (ADIhi), which provides a spectrum for understanding the evolving landscape of the eloquence of AI-generated text in Hindi.
@inproceedings{kavathekar-etal-2024-counter,title={Counter {T}uring Test ($CT^2$): Investigating {AI}-Generated Text Detection for {H}indi - Ranking {LLM}s based on {H}indi {AI} Detectability Index ($ADI\_{hi}$)},author={Kavathekar, Ishan and Rani, Anku and Chamoli, Ashmit and Kumaraguru, Ponnurangam and Sheth, Amit P. and Das, Amitava},year={2024},booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},}
ICWSM
Put Your Money Where Your Mouth Is: Dataset and Analysis of Real World Habit Building Attempts
Hitkul Jangra, Rajiv Shah, and Ponnurangam Kumaraguru
In Proceedings of the International AAAI Conference on Web and Social Media, 2024
The pursuit of habit building is challenging, and most people struggle with it. Research on successful habit formation is mainly based on small human trials focusing on the same habit for all the participants, as conducting long-term heterogeneous habit studies can be logistically expensive. With the advent of self-help, there has been an increase in online communities and applications that are centered around habit building and logging. Habit building applications can provide large-scale data on real-world habit building attempts and unveil the commonalities among successful ones. We collect public data on stickk.com, which allows users to track progress on habit building attempts called commitments. A commitment can have an external referee, regular check-ins about the progress, and a monetary stake in case of failure. Our data consists of 742,923 users and 397,456 commitments. In addition to the dataset, rooted in theories like the Fresh Start Effect, Accountability, and Loss Aversion, we ask questions about how commitment properties like start date, external accountability, monetary stake, and pursuing multiple habits together affect the odds of success. We found that people tend to start habits on temporal landmarks, but that does not affect the probability of their success. Practices like accountability and stakes are not often used but are strong determinants of success. Commitments of 6 to 8 weeks in length, weekly reporting with an external referee, and a monetary amount at stake tend to be most successful. Finally, around 40% of all commitments are attempted simultaneously with other goals. Simultaneous attempts of pursuing commitments may fail early, but if pursued through the initial phase, they are statistically more successful than building one habit at a time.
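The survival-analysis step behind results like these is typically a Cox proportional-hazards regression; below is a minimal sketch using the lifelines library. The column names and toy values are placeholders for the commitment properties studied, not the paper's actual data.

```python
# Cox proportional-hazards regression on commitment survival (sketch).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks_active":   [6, 8, 3, 12, 5, 9],   # duration observed
    "failed":         [1, 0, 1, 0, 1, 0],    # event indicator (1 = dropped out)
    "has_referee":    [0, 1, 1, 1, 0, 0],    # placeholder covariates
    "monetary_stake": [0, 1, 0, 0, 1, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks_active", event_col="failed")
cph.print_summary()   # hazard ratios for each commitment property
```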
@inproceedings{article,title={Put Your Money Where Your Mouth Is: Dataset and Analysis of Real World Habit Building Attempts},author={Jangra, Hitkul and Shah, Rajiv and Kumaraguru, Ponnurangam},year={2024},booktitle={Proceedings of the International AAAI Conference on Web and Social Media},}
LREC-COLING
Multilingual Coreference Resolution in Low-resource South Asian Languages
Ritwik Mishra, Pooja Desur, Rajiv Ratn Shah, and Ponnurangam Kumaraguru
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024
Coreference resolution involves the task of identifying text spans within a discourse that pertain to the same real-world entity. While this task has been extensively explored in the English language, there has been a notable scarcity of publicly accessible resources and models for coreference resolution in South Asian languages. We introduce a Translated dataset for Multilingual Coreference Resolution (TransMuCoRes) in 31 South Asian languages using off-the-shelf tools for translation and word-alignment. Nearly all of the predicted translations successfully pass a sanity check, and 75% of English references align with their predicted translations. Using multilingual encoders, two off-the-shelf coreference resolution models were trained on a concatenation of TransMuCoRes and a Hindi coreference resolution dataset with manual annotations. The best performing model achieved a score of 64 and 68 for LEA F1 and CoNLL F1, respectively, on our test-split of Hindi golden set. This study is the first to evaluate an end-to-end coreference resolution model on a Hindi golden set. Furthermore, this work underscores the limitations of current coreference evaluation metrics when applied to datasets with split antecedents, advocating for the development of more suitable evaluation metrics.
@inproceedings{mishra-etal-2024-multilingual,title={Multilingual Coreference Resolution in Low-resource {S}outh {A}sian Languages},author={Mishra, Ritwik and Desur, Pooja and Shah, Rajiv Ratn and Kumaraguru, Ponnurangam},year={2024},booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},}
ICWSM
Tight Sampling in Unbounded Networks
Kshitijaa Jaglan, Meher Pindiprolu, Triansh Sharma, Abhijeeth Singam, Nidhi Goyal, Ponnurangam Kumaraguru, and Ulrik Brandes
In Proceedings of the International AAAI Conference on Web and Social Media, 2024
The default approach to deal with the enormous size and limited accessibility of many Web and social media networks is to sample one or more subnetworks from a conceptually unbounded unknown network. Clearly, the extracted subnetworks will crucially depend on the sampling scheme. Motivated by studies of homophily and opinion formation, we propose a variant of snowball sampling designed to prioritize inclusion of entire cohesive communities rather than any kind of representativeness, breadth, or depth of coverage. The method is illustrated on a concrete example, and experiments on synthetic networks suggest that it behaves as desired.
@inproceedings{articlf,title={Tight Sampling in Unbounded Networks},author={Jaglan, Kshitijaa and Pindiprolu, Meher and Sharma, Triansh and Singam, Abhijeeth and Goyal, Nidhi and Kumaraguru, Ponnurangam and Brandes, Ulrik},year={2024},booktitle={Proceedings of the International AAAI Conference on Web and Social Media},}
WOAH @ NAACL
X-posing Free Speech: Examining the Impact of Moderation Relaxation on Online Social Networks
Arvindh Arun, Saurav Chhatani, Jisun An, and Ponnurangam Kumaraguru
In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), 2024
We investigate the impact of free speech and the relaxation of moderation on online social media platforms using Elon Musk’s takeover of Twitter as a case study. By curating a dataset of over 10 million tweets, our study employs a novel framework combining content and network analysis. Our findings reveal a significant increase in the distribution of certain forms of hate content, particularly targeting the LGBTQ+ community and liberals. Network analysis reveals the formation of cohesive hate communities facilitated by influential bridge users, with substantial growth in interactions hinting at increased hate production and diffusion. By tracking the temporal evolution of PageRank, we identify key influencers, primarily self-identified far-right supporters disseminating hate against liberals and woke culture. Ironically, embracing free speech principles appears to have enabled hate speech against the very concept of freedom of expression and free speech itself. Our findings underscore the delicate balance platforms must strike between open expression and robust moderation to curb the proliferation of hate online.
@inproceedings{arun-etal-2024-x,title={{X}-posing Free Speech: Examining the Impact of Moderation Relaxation on Online Social Networks},author={Arun, Arvindh and Chhatani, Saurav and An, Jisun and Kumaraguru, Ponnurangam},year={2024},booktitle={Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)},}
Thesis
Improving Content Quality for Online Professional Activities using Domain Specific Learning and Knowledge
N. Goyal
In Ph.D. Thesis, IIIT-Delhi, 2024
@inproceedings{improvingcontentqualityforonlineprofessionalactivitiesusingdomainspecificlearningandknowledge,title={Improving Content Quality for Online Professional Activities using Domain Specific Learning and Knowledge},author={Goyal, N.},year={2024},booktitle={Ph.D. Thesis, IIIT-Delhi},}
Thesis
Modeling Online User Interactions and their Offline Effects on Socio-Technical Platforms
Hitkul
In Ph.D. Thesis, IIIT-Delhi, 2024
@inproceedings{modelingonlineuserinteractionsandtheirofflineeffectsonsociotechnicalplatforms,title={Modeling Online User Interactions and their Offline Effects on Socio-Technical Platforms},author={Hitkul},year={2024},booktitle={Ph.D. Thesis, IIIT-Delhi},}
Thesis
New Frontiers in Machine Unlearning
S. Goel
In MS in Computer Science by Research, IIIT Hyderabad, 2024
@inproceedings{newfrontiersinmachineunlearning,title={New Frontiers in Machine Unlearning},author={Goel, S.},year={2024},booktitle={MS in Computer Science by Research, IIIT Hyderabad},}
Thesis
Towards Trustworthy Digital Ecosystem: From Fair Representation Learning to Fraud Detection
A. Arun
In MS in Computer Science by Research at IIIT Hyderabad, 2024
@inproceedings{towardstrustworthydigitalecosystemfromfairrepresentationlearningtofrauddetection,title={Towards Trustworthy Digital Ecosystem: From Fair Representation Learning to Fraud Detection},author={Arun, A.},year={2024},booktitle={MS in Computer Science by Research at IIIT Hyderabad},}
Thesis
Understanding Online Protests: Unveiling Strategies, Collective Narratives, and Harmful Behaviors
2023
RepL4NLP @ ACL
Probing Negation in Language Models
Shashwat Singh, Shashwat Goel, Saujas Vaduguru, and Ponnurangam Kumaraguru
In Workshop on Representation Learning for NLP, 2023
Prior work has shown that pretrained language models often make incorrect predictions for negated inputs. The reason for this behaviour has remained unclear. It has been argued that since language models (LMs) don’t change their predictions about factual propositions under negation, they might not detect negation. We show encoder LMs do detect negation: their representations across layers reliably distinguish negated inputs from non-negated inputs, and when negation leads to contradictions. However, probing experiments show that these LMs indeed don’t use negation when evaluating whether a factual statement is true, even when fine-tuned with the objective of changing outputs on negated sentences (Hosseini et al., 2021). We hypothesize about why pretrained LMs are inconsistent under negation: when a statement could refer to multiple ground entities with conflicting properties, negation may not entail a change in output. This means negation minimal pairs in different training samples can have the same completion in pretraining corpora. We argue pretraining may not provide enough signal to learn the distribution of ground referents a token could have, confusing the LM on how to handle negation.
@inproceedings{singhprobing,title={Probing Negation in Language Models},author={Singh, Shashwat and Goel, Shashwat and Vaduguru, Saujas and Kumaraguru, Ponnurangam},year={2023},booktitle={Workshop on Representation Learning for NLP},}
ASONAM
Together Apart: Decoding Support Dynamics in Online COVID-19 Communities
Hitkul Jangid, Tanisha Pandey, Sonali Singhal, Pranjal Kandhari, Aryamann Tomar, and Ponnurangam Kumaraguru
In Proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2023
The COVID-19 pandemic that broke out globally in December 2019 put us all in an unprecedented situation. Social media became a vital source of support and information during the pandemic, as physical interactions were limited by people staying at home. This paper investigates support dynamics and user commitment in an online COVID-19 community on Reddit. We define various support classes and observe them along with user behavior and temporal phases for a coherent understanding of the community. We perform survival analysis using Cox Regression to identify factors influencing a user’s commitment to the community. People seeking more emotional and informational support while they are COVID-positive stay longer in the community. Surprisingly, people who give more support in their early phases are less likely to stay. Additionally, contrary to common belief, our findings show that receiving emotional and informational support has little effect on users’ longevity in the community. Our results lead to a better understanding of user dynamics related to community support and can directly impact moderators and platform owners in designing community guidelines and incentive structures.
@inproceedings{10.1145/3625007.3627297,title={Together Apart: Decoding Support Dynamics in Online COVID-19 Communities},author={Jangid, Hitkul and Pandey, Tanisha and Singhal, Sonali and Kandhari, Pranjal and Tomar, Aryamann and Kumaraguru, Ponnurangam},year={2023},booktitle={Proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining},}
BDA
Explaining Finetuned Transformers on Hate Speech Predictions Using Layerwise Relevance Propagation
Ritwik Mishra, Ajeet Yadav, Rajiv Shah, and Ponnurangam Kumaraguru
In Proceedings of the 11th International Conference on Big Data and Artificial Intelligence, 2023
Explainability of model predictions has become imperative for architectures that involve fine-tuning of a pretrained transformer encoder for a downstream task such as hate speech detection. In this work, we compare the explainability capabilities of three post-hoc methods on the HateXplain benchmark with different encoders. Our research is the first work to evaluate the effectiveness of Layerwise Relevance Propagation (LRP) as a post-hoc method for fine-tuned transformer architectures used in hate speech detection. The analysis revealed that LRP tends to perform less effectively than the other two methods across various explainability metrics. A random rationale generator was found to be providing a better interpretation than the LRP method. Upon further investigation, it was discovered that the LRP method assigns higher relevance scores to the initial tokens of the input text because fine-tuned encoders tend to concentrate the text information in the embeddings corresponding to early tokens of the text. Therefore, our findings demonstrate that LRP relevance values at the input of fine-tuning layers are not a good representative of the rationales behind the predicted score.
@inproceedings{inbook,title={Explaining Finetuned Transformers on Hate Speech Predictions Using Layerwise Relevance Propagation},author={Mishra, Ritwik and Yadav, Ajeet and Shah, Rajiv and Kumaraguru, Ponnurangam},year={2023},booktitle={Proceedings of the 11th International Conference on Big Data and Artificial Intelligence},}
ECAI
CAFIN: Centrality Aware Fairness Inducing IN-Processing for Unsupervised Representation Learning on Graphs
Arvindh Arun, Aakash Aanegola, Amul Agrawal, Ramasuri Narayanam, and Ponnurangam Kumaraguru
In Proceedings of the 26th European Conference on Artificial Intelligence, 2023
Unsupervised Representation Learning on graphs is gaining traction due to the increasing abundance of unlabelled network data and the compactness, richness, and usefulness of the representations generated. In this context, the need to consider fairness and bias constraints while generating the representations has been well-motivated and studied to some extent in prior works. One major limitation of most of the prior works in this setting is that they do not aim to address the bias generated due to connectivity patterns in the graphs, such as varied node centrality, which leads to a disproportionate performance across nodes. In our work, we aim to address this issue of mitigating bias due to inherent graph structure in an unsupervised setting. To this end, we propose CAFIN, a centrality-aware fairness-inducing framework that leverages the structural information of graphs to tune the representations generated by existing frameworks. We deploy it on GraphSAGE (a popular framework in this domain) and showcase its efficacy on two downstream tasks - Node Classification and Link Prediction. Empirically, CAFIN consistently reduces the performance disparity across popular datasets (varying from 18 to 80% reduction in performance disparity) from various domains while incurring only a minimal cost of fairness.
@inproceedings{cafin,title={{CAFIN}: {C}entrality {A}ware {F}airness Inducing {IN}-Processing for Unsupervised Representation Learning on Graphs},author={Arun, Arvindh and Aanegola, Aakash and Agrawal, Amul and Narayanam, Ramasuri and Kumaraguru, Ponnurangam},year={2023},booktitle={Proceedings of the 26th European Conference on Artificial Intelligence},}
ACL
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gael de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, and Monica Lam
In Findings of the Association for Computational Linguistics: ACL 2023, 2023
Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.
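One of the validation checks mentioned above can be approximated by verifying that every source-side entity has its expected translation present in the target; this sketch (names and data are illustrative) shows only the dictionary-based flagging step, not the hybrid neural alignment:

```python
def missing_entity_translations(src, tgt, entity_dict):
    """Flag entities whose dictionary translation is absent from the target.

    `entity_dict` maps a source-language entity to its expected
    target-language form; a non-empty result marks the sentence pair
    for human post-editing.
    """
    return [e for e, t in entity_dict.items() if e in src and t not in tgt]

src = "book a table at Golden Dragon for 7 pm"
tgt = "reservez une table pour sept heures"  # hypothetical, venue name lost
print(missing_entity_translations(src, tgt, {"Golden Dragon": "Golden Dragon"}))
# ['Golden Dragon'] -> flag for post-editing
```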
@inproceedings{moradshahi-etal-2023-x,title={{X}-{R}i{SAWOZ}: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents},author={Moradshahi, Mehrad and Shen, Tianhao and Bali, Kalika and Choudhury, Monojit and de Chalendar, Gael and Goel, Anmol and Kim, Sungkyun and Kodali, Prashant and Kumaraguru, Ponnurangam and Semmar, Nasredine and Semnani, Sina and Seo, Jiwon and Seshadri, Vivek and Shrivastava, Manish and Sun, Michael and Yadavalli, Aditya and You, Chaobin and Xiong, Deyi and Lam, Monica},year={2023},booktitle={Findings of the Association for Computational Linguistics: ACL 2023},}
WASSA @ ACL
PrecogIIITH@WASSA2023: Emotion Detection for Urdu-English Code-mixed Text
Bhaskara Hanuma Vedula, Prashant Kodali, Manish Shrivastava, and Ponnurangam Kumaraguru
In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 2023
Code-mixing refers to the phenomenon of using two or more languages interchangeably within a speech or discourse context. This practice is particularly prevalent on social media platforms, and determining the embedded affects in a code-mixed sentence remains a challenging problem. In this submission, we describe our system for the WASSA 2023 Shared Task on Emotion Detection in English-Urdu code-mixed text. In our system, we implement a multiclass emotion detection model with a label space of 11 emotions. Samples are code-mixed English-Urdu text, where Urdu is written in romanised form. Our submission is limited to one of the subtasks - Multi Class classification - and we leverage transformer-based Multilingual Large Language Models (MLLMs), XLM-RoBERTa and Indic-BERT. We fine-tune MLLMs on the released data splits, with and without pre-processing steps (translation to English), for classifying texts into the appropriate emotion category. Our methods did not surpass the baseline, and our submission is ranked sixth overall.
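A minimal sketch of the fine-tuning setup described above, using the Hugging Face transformers API with an 11-way classification head (the head is untrained here, and the romanised-Urdu example sentence is ours):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=11)  # 11 emotion classes, as in the task

batch = tok(["mujhe bahut khushi hui aaj"], return_tensors="pt",
            truncation=True, padding=True)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))  # predicted emotion id (random until fine-tuned)
```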
@inproceedings{vedula-etal-2023-precogiiith,title={{P}recog{IIITH}@{WASSA}2023: Emotion Detection for {U}rdu-{E}nglish Code-mixed Text},author={Vedula, Bhaskara Hanuma and Kodali, Prashant and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2023},booktitle={Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, {\&} Social Media Analysis},}
Thesis
Identify, Inspect and Intervene Multimodal Fake News
Thesis
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
@inproceedings{beyondthesurfaceacomputationalexplorationoflinguisticambiguity,title={Beyond the Surface: A Computational Exploration of Linguistic Ambiguity},author={Goel, A.},year={2023},booktitle={MS in Computer Science by Research at IIIT Hyderabad},}
Thesis
Modeling Online User Interactions and their Offline effects on Socio-Technical Platforms
@inproceedings{modelingonlineuserinteractionsandtheirofflineeffectsonsociotechnicalplatformt,title={Modeling Online User Interactions and their Offline effects on Socio-Technical Platforms},author={Hitkul},year={2023},booktitle={Ph.D. Comprehensive Report},}
2022
ACL
SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing
Code mixing is the linguistic phenomenon where bilingual speakers tend to switch between two or more languages in conversations. Recent work on code-mixing in computational settings has leveraged social media code-mixed texts to train NLP models. For capturing the variety of code mixing in and across corpora, Language ID (LID) tag based measures (CMI) have been proposed. The syntactic variety/patterns of code-mixing and their relationship to a computational model's performance remain underexplored. In this work, we investigate a collection of English(en)-Hindi(hi) code-mixed datasets from a syntactic lens to propose SyMCoM, an indicator of syntactic variety in code-mixed text with intuitive theoretical bounds. We train a SoTA en-hi PoS tagger (93.4% accuracy) to reliably compute PoS tags on a corpus, demonstrate the utility of SyMCoM by applying it to various syntactic categories on a collection of datasets, and compare datasets using the measure.
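Based on the description above, a per-category SyMCoM score can be sketched as a normalised difference of language counts; the data layout and aggregation here are our reading of the abstract, so consult the paper for the exact formulation:

```python
from collections import Counter

def symcom(tokens, category):
    """SyMCoM score for one syntactic category, bounded in [-1, 1].

    `tokens` is a list of (word, lang, pos) triples with lang in
    {"en", "hi"}; +1 means the category is realised only in English,
    -1 only in Hindi, 0 an even split.
    """
    counts = Counter(lang for _, lang, pos in tokens if pos == category)
    en, hi = counts.get("en", 0), counts.get("hi", 0)
    return (en - hi) / (en + hi) if (en + hi) else 0.0

sent = [("main", "hi", "PRON"), ("office", "en", "NOUN"),
        ("ja", "hi", "VERB"), ("raha", "hi", "AUX"), ("hoon", "hi", "AUX")]
print(symcom(sent, "NOUN"))  # 1.0: nouns drawn entirely from English here
```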
@inproceedings{kodali-etal-2022-symcom,title={{S}y{MC}o{M} - Syntactic Measure of Code Mixing A Study Of {E}nglish-{H}indi Code-Mixing},author={Kodali, Prashant and Goel, Anmol and Choudhury, Monojit and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2022},booktitle={Findings of the Association for Computational Linguistics: ACL 2022},}
INLG
PreCogIIITH at HinglishEval : Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality
Prashant Kodali, Tanmay Sachan, Akshay Goindani, Anmol Goel, Naman Ahuja, Manish Shrivastava, and Ponnurangam Kumaraguru
In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, 2022
Code-Mixing is a phenomenon of mixing two or more languages in a speech event and is prevalent in multilingual societies. Given the low-resource nature of Code-Mixing, machine generation of code-mixed text is a prevalent approach for data augmentation. However, evaluating the quality of such machine generated code-mixed text is an open problem. In our submission to HinglishEval, a shared task collocated with INLG 2022, we attempt to model the factors that impact the quality of synthetically generated code-mixed text by predicting ratings for code-mix quality. The HinglishEval Shared Task consists of two sub-tasks: a) Quality rating prediction; b) Disagreement prediction. We leverage popular code-mixed metrics and embeddings of multilingual large language models (MLLMs) as features, and train task-specific MLP regression models. Our approach could not beat the baseline results. However, for Subtask-A our team ranked a close second on the F-1 and Cohen's Kappa Score measures and first on the Mean Squared Error measure. For Subtask-B our approach ranked third on F1 score, and first on the Mean Squared Error measure. Code of our submission can be accessed here.
@inproceedings{kodali-etal-2022-precogiiith,title={{P}re{C}og{IIITH} at {H}inglish{E}val : Leveraging Code-Mixing Metrics {\&} Language Model Embeddings To Estimate Code-Mix Quality},author={Kodali, Prashant and Sachan, Tanmay and Goindani, Akshay and Goel, Anmol and Ahuja, Naman and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2022},booktitle={Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges},}
LREC
HashSet - A Dataset For Hashtag Segmentation
Prashant Kodali, Akshala Bhatnagar, Naman Ahuja, Manish Shrivastava, and Ponnurangam Kumaraguru
In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Hashtag segmentation is the task of breaking a hashtag into its constituent tokens. Hashtags often encode the essence of user-generated posts, along with information like topic and sentiment, which are useful in downstream tasks. Hashtags prioritize brevity and are written in unique ways - transliterating and mixing languages, spelling variations, creative named entities. Benchmark datasets used for the hashtag segmentation task - STAN, BOUN - are small and extracted from a single set of tweets. However, datasets should reflect the variations in writing styles of hashtags and account for domain and language specificity, failing which the results will misrepresent model performance. We argue that model performance should be assessed on a wider variety of hashtags, and datasets should be carefully curated. To this end, we propose HashSet, a dataset comprising: a) a 1.9k manually annotated set; b) a 3.3M loosely supervised set. The HashSet dataset is sampled from a different set of tweets than existing datasets and provides an alternate distribution of hashtags to build and validate hashtag segmentation models. We analyze the performance of SOTA models for hashtag segmentation, and show that the proposed dataset provides an alternate set of hashtags to train and assess models.
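To make the task concrete, here is a standard unigram dynamic-programming segmenter; this is not one of the models benchmarked in the paper, and the frequencies and unknown-word penalty are toy choices:

```python
import math

# Toy unigram counts; real systems use large corpus frequencies.
FREQ = {"bring": 50, "back": 60, "our": 80, "girls": 40, "ring": 30, "a": 500}
TOTAL = sum(FREQ.values())

def prob(w):
    # Unknown words are penalised exponentially in their length.
    return FREQ[w] / TOTAL if w in FREQ else 10.0 / (TOTAL * 10 ** len(w))

def segment(s):
    """Most likely segmentation of `s` under the unigram model."""
    best = [(0.0, [])]  # best[i]: (log-prob, best segmentation of s[:i])
    for i in range(1, len(s) + 1):
        cands = [(best[j][0] + math.log(prob(s[j:i])), best[j][1] + [s[j:i]])
                 for j in range(max(0, i - 20), i)]
        best.append(max(cands))
    return best[-1][1]

print(segment("bringbackourgirls"))  # ['bring', 'back', 'our', 'girls']
```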
@inproceedings{kodali-etal-2022-hashset,title={{H}ash{S}et - A Dataset For Hashtag Segmentation},author={Kodali, Prashant and Bhatnagar, Akshala and Ahuja, Naman and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2022},booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},}
ACM HT
Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play
Ashwin Singh, Arvindh Arun, Pulak Malhotra, Pooja Desur, Ayushi Jain, Duen Horng Chau, and Ponnurangam Kumaraguru
In Proceedings of the 33rd ACM Conference on Hypertext and Social Media, 2022
Google Play’s policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps. However, there still exist apps that incentivize installs for other apps on the platform. To understand how install-incentivizing apps affect users, we examine their ecosystem through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions. Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs. We perform qualitative analysis of reviews to reveal various types of dark patterns that developers incorporate in install-incentivizing apps, highlighting their normative concerns at both user and platform levels. Permissions requested by these apps validate our discovery of dark patterns, with over 92% of apps accessing sensitive user information. We find evidence of fraudulent reviews on install-incentivizing apps, following which we model them as an edge stream in a dynamic bipartite graph of apps and reviewers. Our proposed reconfiguration of a state-of-the-art microcluster anomaly detection algorithm yields promising preliminary results in detecting this fraud. We discover highly significant lockstep behaviors exhibited by reviews that aim to boost the overall rating of an install-incentivizing app. Upon evaluating the 50 most suspicious clusters of boosting reviews detected by the algorithm, we find (i) near-identical pairs of reviews across 94% of them (47 clusters), and (ii) over 35% of reviews (1,687 of 4,717) forming near-identical pairs within their cluster. Finally, we conclude with a discussion on how fraud is intertwined with labor and poses a threat to the trust and transparency of Google Play.
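The near-identical-pair finding above can be illustrated with a plain TF-IDF similarity check; this stands in for, and is much weaker than, the paper's microcluster detector over the dynamic review graph:

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def near_identical_pairs(reviews, threshold=0.9):
    """Index pairs of reviews whose TF-IDF cosine similarity >= threshold."""
    X = TfidfVectorizer().fit_transform(reviews)
    S = cosine_similarity(X)
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if S[i, j] >= threshold]

reviews = ["best app ever gives free rewards",
           "best app ever gives free rewards!!",
           "could not withdraw my earnings"]
print(near_identical_pairs(reviews))  # [(0, 1)]
```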
@inproceedings{acmht-22,title={{E}rasing {L}abor with {L}abor: {D}ark {P}atterns and {L}ockstep {B}ehaviors on {G}oogle {P}lay},author={Singh, Ashwin and Arun, Arvindh and Malhotra, Pulak and Desur, Pooja and Jain, Ayushi and Chau, Duen Horng and Kumaraguru, Ponnurangam},year={2022},booktitle={Proceedings of the 33rd ACM Conference on Hypertext and Social Media},}
Towards adversarial evaluations for inexact machine unlearning
Shashwat Goel, Ameya Prabhu, Amartya Sanyal, Ser-Nam Lim, Philip Torr, and Ponnurangam Kumaraguru
Machine Learning models face increased concerns regarding the storage of personal user data and adverse impacts of corrupted data like backdoors or systematic bias. Machine Unlearning can address these by allowing post-hoc deletion of affected training data from a learned model. Achieving this task exactly is computationally expensive; consequently, recent works have proposed inexact unlearning algorithms to solve this approximately as well as evaluation methods to test the effectiveness of these algorithms. In this work, we first outline some necessary criteria for evaluation methods and show no existing evaluation satisfies them all. Then, we design a stronger black-box evaluation method called the Interclass Confusion (IC) test which adversarially manipulates data during training to detect the insufficiency of unlearning procedures. We also propose two analytically motivated baseline methods (EU-k and CF-k) which outperform several popular inexact unlearning methods. Overall, we demonstrate how adversarial evaluation strategies can help in analyzing various unlearning phenomena which can guide the development of stronger unlearning algorithms.
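A stripped-down sketch of the Interclass Confusion idea, with "exact unlearning" simulated by retraining from scratch; the dataset, model, and class choices are ours, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Swap labels between two classes on a "forget set", then compare a model
# trained with that set against one retrained without it (exact unlearning).
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

A, B = 3, 8
forget = np.where((ytr == A) | (ytr == B))[0][:50]
y_conf = ytr.copy()
y_conf[forget] = np.where(ytr[forget] == A, B, A)  # adversarial label swap

tainted = MLPClassifier(max_iter=500, random_state=0).fit(Xtr, y_conf)
retrained = MLPClassifier(max_iter=500, random_state=0).fit(
    np.delete(Xtr, forget, 0), np.delete(y_conf, forget))

def ic_rate(model):
    """Fraction of test A/B samples predicted as the *other* class."""
    m = (yte == A) | (yte == B)
    pred = model.predict(Xte[m])
    return np.mean(pred == np.where(yte[m] == A, B, A))

# An unlearning method whose IC rate stays close to `tainted` has not
# really removed the manipulated data's influence.
print(ic_rate(tainted), ic_rate(retrained))
```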
@inproceedings{goel2022towards,title={Towards adversarial evaluations for inexact machine unlearning},author={Goel, Shashwat and Prabhu, Ameya and Sanyal, Amartya and Lim, Ser-Nam and Torr, Philip and Kumaraguru, Ponnurangam},year={2022},booktitle={},}
EMNLP
An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy
Anmol Goel, Charu Sharma, and Ponnurangam Kumaraguru
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Polysemy is the phenomenon where a single word form possesses two or more related senses. It is an extremely ubiquitous part of natural language and analyzing it has sparked rich discussions in the linguistics, psychology and philosophy communities alike. With scarce attention paid to polysemy in computational linguistics, and even scarcer attention toward quantifying polysemy, in this paper, we propose a novel, unsupervised framework to compute and estimate polysemy scores for words in multiple languages. We infuse our proposed quantification with syntactic knowledge in the form of dependency structures. This informs the final polysemy scores of the lexicon motivated by recent linguistic findings that suggest there is an implicit relation between syntax and ambiguity/polysemy. We adopt a graph based approach by computing the discrete Ollivier Ricci curvature on a graph of the contextual nearest neighbors. We test our framework on curated datasets controlling for different sense distributions of words in 3 typologically diverse languages - English, French and Spanish. The effectiveness of our framework is demonstrated by significant correlations of our quantification with expert human annotated language resources like WordNet. We observe a 0.3 point increase in the correlation coefficient as compared to previous quantification studies in English. Our research leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.
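Computing the curvature signal described above might look as follows with the third-party GraphRicciCurvature package; the package, its API, the toy graph, and the sign convention are assumptions on our part rather than the paper's code:

```python
# pip install GraphRicciCurvature networkx
import networkx as nx
from GraphRicciCurvature.OllivierRicci import OllivierRicci

# Toy stand-in for a word's contextual nearest-neighbour graph: nodes are
# occurrences of the word, edges connect embedding-space neighbours.
G = nx.random_regular_graph(d=4, n=20, seed=0)

orc = OllivierRicci(G, alpha=0.5, verbose="ERROR")
orc.compute_ricci_curvature()
curvatures = [orc.G.edges[u, v]["ricciCurvature"] for u, v in orc.G.edges]

# Heuristic proxy: more negative average curvature ~ more spread-out,
# ambiguous neighbourhoods.
print(-sum(curvatures) / len(curvatures))
```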
@inproceedings{goel-etal-2022-unsupervised,title={An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy},author={Goel, Anmol and Sharma, Charu and Kumaraguru, Ponnurangam},year={2022},booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},}
Thesis
Development of Stress Induction and Detection System to Study its Effect on Brain
@inproceedings{developmentofstressinductionanddetectionsystemtostudyitseffectonbrain,title={Development of Stress Induction and Detection System to Study its Effect on Brain},author={Phutela, N.},year={2022},booktitle={Ph.D. Thesis, BML Munjal University},}
Thesis
Leveraging AI to Understand Protests & Foster Secure Societies During Protest
@inproceedings{leveragingaitounderstandprotestsfostersecuresocietiesduringprotest,title={Leveraging AI to Understand Protests & Foster Secure Societies During Protest},author={Neha, K.},year={2022},booktitle={Ph.D. Comprehensive Report},}
Thesis
A Framework For Automatic Question Answering in Indian Languages
@inproceedings{aframeworkforautomaticquestionansweringinindianlanguages,title={A Framework For Automatic Question Answering in Indian Languages},author={Mishra, R.},year={2022},booktitle={Ph.D. Comprehensive Report},}
Thesis
De-anonymizing, Preserving and Democratizing Data Privacy and Ownership
@inproceedings{deanonymizingpreservinganddemocratizingdataprivacyandownership,title={De-anonymizing, Preserving and Democratizing Data Privacy and Ownership},author={Gupta, S.},year={2022},booktitle={Ph.D. Comprehensive Report},}
2021
CALCS @ ACL
CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences
Code-mixed languages are very popular in multilingual societies around the world, yet the resources lag behind to enable robust systems on such languages. A major contributing factor is the informal nature of these languages which makes it difficult to collect code-mixed data. In this paper, we propose our system for Task 1 of CALCS 2021 to generate a machine translation system for English to Hinglish in a supervised setting. Translating in the given direction can help expand the set of resources for several tasks by translating valuable datasets from high resource languages. We propose to use mBART, a pre-trained multilingual sequence-to-sequence model, and fully utilize the pre-training of the model by transliterating the roman Hindi words in the code-mixed sentences to Devanagari script. We evaluate how expanding the input by concatenating Hindi translations of the English sentences improves mBART's performance. Our system gives a BLEU score of 12.22 on the test set. Further, we perform a detailed error analysis of our proposed systems and explore the limitations of the provided dataset and metrics.
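For orientation, plain English-to-Hindi generation with an off-the-shelf mBART checkpoint looks like this; we use the mBART-50 MT checkpoint for convenience, whereas the paper fine-tunes mBART for English-to-Hinglish and adds a Devanagari transliteration step that we omit:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tok = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

tok.src_lang = "en_XX"  # source language: English
inputs = tok("I will reach the station by noon.", return_tensors="pt")
out = model.generate(**inputs,
                     forced_bos_token_id=tok.lang_code_to_id["hi_IN"])
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```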
@inproceedings{gautam-etal-2021-comet,title={{C}o{M}e{T}: Towards Code-Mixed Translation Using Parallel Monolingual Sentences},author={Gautam, Devansh and Kodali, Prashant and Gupta, Kshitij and Goel, Anmol and Shrivastava, Manish and Kumaraguru, Ponnurangam},year={2021},booktitle={Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching},}
GermEval
Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering
T. H. Arjun, Arvindh A., and Ponnurangam Kumaraguru
In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, 2021
We describe our participation in all the subtasks of the GermEval 2021 shared task on the identification of Toxic, Engaging, and Fact-Claiming Comments. Our system is an ensemble of state-of-the-art pre-trained models finetuned with carefully engineered features. We show that feature engineering and data augmentation can be helpful when the training data is sparse. We achieve F1 scores of 66.87, 68.93, and 73.91 on the Toxic, Engaging, and Fact-Claiming comment identification subtasks, respectively.
@inproceedings{germeval-21,title={Precog-{LTRC}-{IIITH} at {G}erm{E}val 2021: Ensembling Pre-Trained Language Models with Feature Engineering},author={Arjun, T. H. and A., Arvindh and Kumaraguru, Ponnurangam},year={2021},booktitle={Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments},}
2020
Thesis
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application
@inproceedings{useridentitylinkagedatacollectiondatasetbiasesmethodcontrolandapplication,title={User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application},author={Kaushal, R.},year={2020},booktitle={Ph.D. Thesis, IIIT-Delhi},}
Thesis
Characterizing and Detecting livestreaming Chatbots
S. Jain
In MS by Research in Computer Science and Engineering IIIT-Hyderabad, 2020
@inproceedings{characterizinganddetectinglivestreamingchatbots,title={Characterizing and Detecting livestreaming Chatbots},author={Jain, S.},year={2020},booktitle={MS by Research in Computer Science and Engineering IIIT-Hyderabad},}
2012
LBSN
We Know Where You Live: Privacy Characterization of Foursquare Behavior
In the last few years, the increasing interest in location-based services (LBS) has favored the introduction of geo-referenced information in various Web 2.0 applications, as well as the rise of location-based social networks (LBSN). Foursquare, one of the most popular LBSNs, gives incentives to users who visit (check in) specific places (venues) by means of, for instance, mayorships to frequent visitors. Moreover, users may leave tips at specific venues as well as mark previous tips as done as a sign of agreement. Unlike check ins, which are shared only with friends, the lists of mayorships, tips and dones of a user are publicly available to everyone, thus raising concerns about disclosure of the user’s movement patterns and interests. We analyze how users explore these publicly available features, and their potential as sources of information leakage. Specifically, we characterize the use of mayorships, tips and dones in Foursquare based on a dataset with around 13 million users. We also analyze whether it is possible to easily infer the home city (state and country) of a user from this publicly available information. Our results indicate that one can easily infer the home city of around 78% of the analyzed users within 50 kilometers.
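The home-city inference above admits a very simple baseline: a plurality vote over the cities of a user's public venues. A sketch with made-up data:

```python
from collections import Counter

def infer_home_city(public_venues):
    """Plurality vote over the cities of a user's public mayorships/tips/dones.

    A deliberately simple baseline in the spirit of the paper's finding
    that home cities are easy to infer from public Foursquare features.
    """
    cities = Counter(v["city"] for v in public_venues)
    city, votes = cities.most_common(1)[0]
    return city, votes / sum(cities.values())

venues = [{"city": "Belo Horizonte"}] * 7 + [{"city": "Lisbon"}] * 2
print(infer_home_city(venues))  # ('Belo Horizonte', 0.78)
```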
@inproceedings{tatiana:we-know-where-you-live:2012:yuqfj,title={We Know Where You Live: Privacy Characterization of Foursquare Behavior},author={Pontes, Tatiana and Vasconcelos, Marisa and Almeida, Jussara and Kumaraguru, Ponnurangam and Almeida, Virgilio},year={2012},booktitle={Proceedings of the 2012 ACM Conference on Ubiquitous Computing},}
Tring! Tring! - An Exploration and Analysis of Interactive Voice Response Systems
In developing regions like India, voice based telecommunication services are among the most appropriate media for information dissemination, as they overcome the prevalent low literacy rate. However, voice based Interactive Voice Response (IVR) systems are still not exploited to their full potential and are commonly considered frustrating to use. We conducted a real world experiment to investigate the usability issues of a voice based system. In this paper, we report an analysis of our experimental IVR and the interface difficulties experienced by users. We also highlight user behavior towards accessing critical and non-critical information over multiple information media vis-a-vis IVR, web and talking to a human on the phone. The findings suggest that an IVR which can adapt its behavior will prove to be more efficient and provide a better user experience. We believe that our results can be used for efficient development of next-generation adaptable IVR systems.
@inproceedings{asthana:tring-tring---an-explorat:2012:nrtys,title={Tring! Tring! - An Exploration and Analysis of Interactive Voice Response Systems},author={Asthana, Siddharth and Singh, Pushpendra and Kumaraguru, Ponnurangam and Singh, Amarjeet and Naik, Vinayak},year={2012},booktitle={},}
PSOSM
Credibility Ranking of Tweets during High Impact Events
Aditi Gupta and Ponnurangam Kumaraguru
In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, 2012
Twitter has evolved from being a conversation or opinion sharing medium among friends into a platform to share and disseminate information about current events. Events in the real world create a corresponding spur of posts (tweets) on Twitter. Not all content posted on Twitter is trustworthy or useful in providing information about the event. In this paper, we analyzed the credibility of information in tweets corresponding to fourteen high impact news events of 2011 around the globe. From the data we analyzed, on average 30% of total tweets posted about an event contained situational information about the event while 14% was spam. Only 17% of the total tweets posted about the event contained situational awareness information that was credible. Using regression analysis, we identified the important content and source based features which can predict the credibility of information in a tweet. Prominent content based features were the number of unique characters, swear words, pronouns, and emoticons in a tweet, and user based features like the number of followers and length of username. We adopted a supervised machine learning and relevance feedback approach using the above features to rank tweets according to their credibility score. The performance of our ranking algorithm improved significantly when we applied a re-ranking strategy. Results show that extraction of credible information from Twitter can be automated with high confidence.
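A toy version of the feature extraction and supervised scoring described above; the lexicon, emoticon regex, and two-tweet "training set" are purely illustrative:

```python
import re
from sklearn.linear_model import LogisticRegression

SWEAR = {"damn", "wtf"}  # stand-in lexicon

def tweet_features(text, followers, username):
    """A few of the content/source features the paper finds predictive."""
    toks = text.lower().split()
    return [len(set(text)),                         # unique characters
            sum(t in SWEAR for t in toks),          # swear words
            len(re.findall(r"[:;]-?[)(D]", text)),  # emoticons
            followers,                              # source: follower count
            len(username)]                          # source: username length

X = [tweet_features("OMG wtf is happening!!!", 15, "xx_news_fan_xx"),
     tweet_features("Official update: evacuation routes posted",
                    120000, "redcross")]
y = [0, 1]  # 0 = not credible, 1 = credible (toy labels)

ranker = LogisticRegression().fit(X, y)
print(ranker.predict_proba(X)[:, 1])  # rank tweets by this credibility score
```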
@inproceedings{rengamani:the-unique-identification-num:2010,title={Credibility Ranking of Tweets during High Impact Events},author={Gupta, Aditi and Kumaraguru, Ponnurangam},year={2012},booktitle={Proceedings of the 1st Workshop on Privacy and Security in Online Social Media},}
2011
HotMobile
User Controllable Security and Privacy for Mobile Mashups
A new paradigm in the domain of mobile applications is ’mobile mashups’, where Web content rendered on a mobile browser is amalgamated with data and features available on the device, such as user location, calendar information and camera. Although a number of frameworks exist that enable creation and execution of mobile mashups, they fail to address a very important issue of handling security and privacy considerations of a mobile user. In this paper, we characterize the nature of access control required for utilizing device features in a mashup setting; design a security and privacy middleware based on the well known XACML policy language; and describe how the middleware enables a user to easily control usage of device features. Implementation-wise, we realize our middleware on Android platform (but easily generalizable to other platforms), integrate it with an existing mashup framework, and demonstrate its utility through an e-commerce mobile mashup.
@inproceedings{adappa:user-controllable-securit:2011:kxyqv,title={{User Controllable Security and Privacy for Mobile Mashups}},author={Adappa, Shruthi and Agarwal, Vikas and Goyal, Sunil and Kumaraguru, Ponnurangam and Mittal, Sumit},year={2011},booktitle={Proceedings of the 12th Workshop on Mobile Computing Systems and Applications, Hotmobile 2011},}
IEEE SOLI
Enhancing the Rural Self Help Group – Bank Linkage Program
Empowerment of Self Help Groups (SHGs) is a dominating aspect as the micro-finance industry ushers into an era of maturity. Today SHGs are widely recognized as the hubs for information dissemination within villages and entry points for financial institutions as well as consumer goods organizations, though less has been done to deal with this highly illiterate population in terms of upgrading their skill sets or making them competent enough to soak the deluge of knowledge intensive programs aligned for them. In this paper, we observe that mobile penetration, the ease with which rural population uses the voice interface, and acceptability of mobile related technologies, all bring us to the confluence of mobility and innovative interaction technologies that can help in designing a system for the next billion population. We propose a system that uses voice as a medium to percolate knowledge through the thick layers of illiteracy, thereby serving as an effective mechanism to bring about a paradigm shift in the way SHGs are formed, operate and interact with the Micro-finance Institution (MFI). This system enables low cost financial services to be comprehended and adopted by the SHGs while empowering them to raise concerns and undertake active participation. This kind of empowerment of SHGs is unseen till date and can lead to, especially in case of women, better representation in elections of local panchayats, dowry upliftment and other social advancements, not understating the success of MFIs. Our system is designed and realized using IBM’s Spoken Web technology that employs an easy-to-use voice interface to create dynamic content in local vernacular language, based on the concept of ’Voice Sites’, interconnected by ’Voice Links’.
@inproceedings{agarwal:enhancing-the-rural-self-:2011:yuqfj,title={Enhancing the Rural Self Help Group -- Bank Linkage Program},author={Agarwal, Vikas and Desai, Vikram and Kapoor, Shalini and Kumaraguru, Ponnurangam and Mittal, Sumit},year={2011},booktitle={Published in 2011 Annual SRII Global Conference},}
CEAS
Phi.sh/$oCiaL: The Phishing Landscape through Short URLs
Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru
In The 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, CEAS 2011, 2011
The size, accessibility, and rate of growth of Online Social Media (OSM) have attracted cyber crime. One form of cyber crime that has been increasing steadily is phishing, where the goal (for the phishers) is to steal personal information from users which can be used for fraudulent purposes. Although the research community and industry have been developing techniques to identify phishing attacks through emails and instant messaging (IM), there is very little research that provides a deeper understanding of phishing in online social media. Due to constraints of limited text space in social systems like Twitter, phishers have begun to use URL shortener services. In this study, we provide an overview of phishing attacks for this new scenario. One of our main conclusions is that phishers are using URL shorteners not only for reducing space but also to hide their identity. We observe that social media websites like Facebook, Habbo, Orkut are competing with e-commerce services like PayPal, eBay in terms of traffic and focus of phishers. Orkut, Habbo, and Facebook are amongst the top 5 brands targeted by phishers. We study the referrals from Twitter to understand the evolving phishing strategy. A staggering 89% of references from Twitter (users) are inorganic accounts which are sparsely connected amongst themselves, but have large numbers of followers and followees. We observe that most of the phishing tweets spread through extensive use of attractive words and multiple hashtags. To the best of our knowledge, this is the first study to connect the phishing landscape using blacklisted phishing URLs from PhishTank, URL statistics from bit.ly and cues from Twitter to track the impact of phishing in online social media.
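Resolving a shortened URL before matching it against a blacklist, as in the study's pipeline, can be sketched with requests; the blacklist here is a placeholder set rather than a live PhishTank feed:

```python
import requests

PHISHTANK_BLACKLIST = {"http://evil.example.com/login"}  # stand-in for a feed

def expand_and_check(short_url):
    """Follow a shortener's redirects and check the landing URL.

    Error handling is kept minimal; a real crawler would also cap
    redirect chains and record intermediate hops.
    """
    resp = requests.head(short_url, allow_redirects=True, timeout=10)
    return resp.url, resp.url in PHISHTANK_BLACKLIST

# expand_and_check("https://bit.ly/xxxx")  # -> (final_url, is_blacklisted)
```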
@inproceedings{chhabra:phi.sh/ocial:-the-phishin:2011:yuqfj,title={{Phi.sh/\$oCiaL: The Phishing Landscape through Short URLs}},author={Chhabra, Sidharth and Aggarwal, Anupama and Benevenuto, Fabricio and Kumaraguru, Ponnurangam},year={2011},booktitle={The 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, CEAS 2011},}
IIWeb
Integrating Linked Open Data with Unstructured Text for Intelligence Gathering Tasks
We present techniques for uncovering links between terror incidents, organizations, and people involved with these incidents. Our methods involve performing shallow NLP tasks to extract entities of interest from documents and using linguistic pattern matching and filtering techniques to assign specific relations to the entities discovered. We also gather more information about these entities from the Linked Open Data Cloud, and further allow human analysts to add intelligent inference rules appropriate to the domain. All this information is integrated in a knowledge base in the form of a graph that maintains the semantics between different types of nodes involved in the graph. This knowledge base can then be queried by the analysts to create actionable intelligence.
@inproceedings{gupta:twitter-credibility-ranki:2011:yuqfj,title={Integrating Linked Open Data with Unstructured Text for Intelligence Gathering Tasks},author={Gupta, Archit and Viswanathan, Krishnamurthy Koduvayur and Joshi, Anupam and Finin, Timothy and Kumaraguru, Ponnurangam},year={2011},booktitle={Proceedings of the 8th International Workshop on Information Integration on the Web},}
PSOSM
@Twitter Credibility Ranking of Tweets on Events #breakingnews
Aditi Gupta and Ponnurangam Kumaraguru
In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, 2011
@inproceedings{gupta:twitter-explodes-with-act:2011:yuqfj,title={{\@Twitter Credibility Ranking of Tweets on Events \#breakingnews}},author={Gupta, Aditi and Kumaraguru, Ponnurangam},year={2011},booktitle={Proceedings of the 1st Workshop on Privacy and Security in Online Social Media},}
Twitter Explodes with Activity in Mumbai Blasts! A Lifeline or an Unmonitored Daemon in the Lurking?
Online social media has become an integral part of every Internet user's life. It has given common people a platform and forum to share information, post their opinions and promote campaigns. The threat of exploitation of social media like Facebook, Twitter, etc. by malicious entities becomes crucial during a crisis situation, like bomb blasts or natural calamities such as earthquakes and floods. In this report, we attempt to characterize and extract patterns of activity of general users on Twitter during a crisis situation. This is the first attempt to study an India-centric crisis event, the triple bomb blasts in Mumbai (India), using online social media. In this research, we perform content and activity analysis of content posted on Twitter after the bomb blasts. Through our analysis, we conclude that the number of URLs and @-mentions in tweets increases during the time of the crisis in comparison to what researchers have exhibited for normal circumstances. In addition, we empirically show that the number of tweets or updates by authority users (those with a large number of followers) is very small, i.e. the majority of content generated on Twitter during the crisis comes from non-authority users. In the end, we discuss certain case scenarios during the Mumbai blasts where rumors were spread through the network of Twitter.
@inproceedings{ion:home-is-safer-than-the-cl:2011:nrtys,title={{Twitter Explodes with Activity in Mumbai Blasts! A Lifeline or an Unmonitored Daemon in the Lurking?}},author={Gupta, Aditi and Kumaraguru, Ponnurangam},year={2011},booktitle={},}
SOUPS ’11
Home is Safer than the Cloud! Privacy Concerns for Consumer Cloud Storage
Iulia Ion, Niharika Sachdeva, Ponnurangam Kumaraguru, and Srdjan Capkun
In Symposium on Usable Privacy and Security (SOUPS), 2011
Several studies ranked security and privacy to be major areas of concern and impediments to cloud adoption for companies, but none have looked into end-users’ attitudes and practices. Not much is known about consumers’ privacy beliefs and expectations for cloud storage, such as web-mail, document and photo sharing platforms, or about users’ awareness of contractual terms and conditions. We conducted 36 in-depth interviews in Switzerland and India (two countries with different privacy perceptions and expectations); and followed up with an online survey with 402 participants in both countries. We study users’ privacy attitudes and beliefs regarding their use of cloud storage systems. Our results show that privacy requirements for consumer cloud storage differ from those of companies. Users are less concerned about some issues, such as guaranteed deletion of data, country of storage and storage outsourcing, but are uncertain about using cloud storage. Our results further show that end-users consider the Internet intrinsically insecure and prefer local storage for sensitive data over cloud storage. However, users desire better security and are ready to pay for services that provide strong privacy guarantees. Participants had misconceptions about the rights and guarantees their cloud storage providers offer. For example, users believed that their provider is liable in case of data loss, does not have the right to view and modify user data, and cannot disable user accounts. Finally, our results show that cultural differences greatly influence user attitudes and beliefs, such as their willingness to store sensitive data in the cloud and their acceptance that law enforcement agencies monitor user accounts. We believe that these observations can help in improving users’ privacy in cloud storage systems.
@inproceedings{jain:cross-pollination-of-info:2011:nrtys,title={Home is Safer than the Cloud! Privacy Concerns for Consumer Cloud Storage},author={Ion, Iulia and Sachdeva, Niharika and Kumaraguru, Ponnurangam and Capkun, Srdjan},year={2011},booktitle={Symposium on Usable Privacy and Security (SOUPS)},}
PASSAT
Cross-Pollination of Information in Online Social Media: A Case Study on Popular Social Networks
Paridhi Jain, Tiago Rodrigues, Gabriel Magno, Ponnurangam Kumaraguru, and Virgilio Almeida
In SocialCom/PASSAT 2011 (six-page short paper), 2011
Owing to the popularity of Online Social Media (OSM), Internet users share a lot of information (including personal) on and across OSM services every day. For example, it is common to find a YouTube video embedded in a blog post with an option to share the link on Facebook. Users recommend, comment, and forward information they receive from friends, contributing in spreading the information in and across OSM services. We term this information diffusion process from one OSM service to another as Cross-Pollination, and the network formed by users who participate in Cross-Pollination and content produced in the network as Cross-Pollinated network. Research has been done about information diffusion within one OSM service, but little is known about Cross-Pollination. In this paper, we aim at filling this gap by studying how information (video, photo, location) from three popular OSM services (YouTube, Flickr and Foursquare) diffuses on Twitter, the most popular microblogging service. Our results show that Cross-Pollinated networks follow temporal and topological characteristics of the diffusion OSM (Twitter in our study). Furthermore, popularity of information on source OSM (YouTube, Flickr and Foursquare) does not imply its popularity on Twitter. Our results also show that Cross-Pollination helps Twitter in terms of traffic generation and user involvement, but only a small fraction of videos and photos gain a significant number of views from Twitter. We believe this is the first research work which explicitly characterizes the diffusion of information across different OSM services.
@inproceedings{khot:marasim:-a-novel-jigsaw-b:2011:nrtys,title={Cross-Pollination of Information in Online Social Media: A Case Study on Popular Social Networks},author={Jain, Paridhi and Rodrigues, Tiago and Magno, Gabriel and Kumaraguru, Ponnurangam and Almeida, Virgilio},year={2011},booktitle={published in SocialCom PASSAT 2011 as a six page short paper},}
CHI ’11
Marasim: A Novel Jigsaw Based Authentication Scheme using Tagging
Rohit Khot, Srinathan Kannan, and Ponnurangam Kumaraguru
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2011
In this paper we propose and evaluate Marasim, a novel Jigsaw based graphical authentication mechanism using tagging. Marasim is aimed at achieving the security of random images with the memorability of personal images. Our scheme relies on the human ability to remember a personal image and later recognize the alternate visual representations (images) of the concepts occurred in the image. These concepts are retrieved from the tags assigned to the image. We illustrate how a Jigsaw based approach helps to create a portfolio of system-chosen random images to be used for authentication. The paper describes the complete design of Marasim along with the empirical studies of Marasim that provide evidence of increased memorability. Results show that 93% of all participants succeeded in the authentication tests using Marasim after three months while 71% succeeded in authentication tests using Marasim after nine months. Our findings indicate that Marasim has potential applications, especially where text input is hard (e.g., PDAs or ATMs), or in situations where passwords are infrequently used (e.g., web site passwords).
@inproceedings{kuldeep-yadav:smsassassin-:-crowdsourci:2011:yuqfj,title={Marasim: A Novel Jigsaw Based Authentication Scheme using Tagging},author={Khot, Rohit and Kannan, Srinathan and Kumaraguru, Ponnurangam},year={2011},booktitle={Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},}
HotMobile ’11
SMSAssassin : Crowdsourcing Driven Mobile-based System for SMS Spam Filtering
Due to the increase in use of Short Message Service (SMS) over mobile phones in developing countries, there has been a burst of spam SMSes. Content-based machine learning approaches were effective in filtering email spam. Researchers have used topical and stylistic features of the SMS to classify spam and ham. SMS spam filtering can be largely influenced by the presence of regional words, abbreviations and idioms. We have tested the feasibility of applying Bayesian learning and Support Vector Machine (SVM) based machine learning techniques, which were reported to be most effective in email spam filtering, on an India-centric dataset. In our ongoing research, as an exploratory step, we have developed a mobile-based system SMSAssassin that can filter SMS spam messages based on Bayesian learning and a sender blacklisting mechanism. Since spam SMS keywords and patterns keep on changing, SMSAssassin uses crowdsourcing to keep itself updated. Using a dataset that we are collecting from users in the real world, we evaluated our approaches and found some interesting results.
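A minimal Bayesian SMS filter in the spirit of the system described above; the four-message corpus is a stand-in for the real-world dataset and the crowdsourced keyword updates:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; regional words and abbreviations are exactly what
# crowdsourced updates would keep refreshing over time.
sms = ["WIN free recharge now!!", "call this number for lottery paisa",
       "aaj shaam ko milte hain", "meeting moved to 5pm"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(sms, labels)
print(clf.predict(["free free free lottery recharge"]))  # [1]
```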
@inproceedings{kumaraguru:a-survey-of-privacy-polic:2007:lrfkq,title={SMSAssassin : Crowdsourcing Driven Mobile-based System for SMS Spam Filtering},author={Yadav, Kuldeep and Kumaraguru, Ponnurangam and Goyal, Atul and Gupta, Ashish and Naik, Vinayak},year={2011},booktitle={Proceedings of the 12th Workshop on Mobile Computing Systems and Applications},}
2010
SafeConfig
Cue : A Framework for Generating Meaningful Feedback in XACML
Sunil Kumar Ghai, Prateek Nigam, and Ponnurangam Kumaraguru
In Proceedings of the 3rd ACM workshop on Assurable and usable security configuration, 2010
With a number of access rules at play along with contexts in which they may or may not apply, it is not always obvious to the legitimate user what caused an authorization server to deny a request, nor is it possible for the administrator to specify a complete fail-proof policy. It then becomes the responsibility of the system to act in a user friendly manner by providing feedback suggesting possible alternatives to the requester. The system should also cover any unhandled request that it may encounter due to an incomplete system policy. At the same time, it is essential for feedback to not reveal the entire policy to any user. In this paper we propose Cue, a framework for generating feedback in XACML using logic programming in Prolog. Feedback content is protected by the use of a meta policy, which itself is specified in XACML. We first translate XACML policies into logic based functors. Second, we execute a query using parameters in the denied XACML request, to identify conditions that failed. Third, the failed condition is notified as feedback if a meta policy allows the system to reveal it. Cue is capable of generating appropriate feedback while ensuring that a desired degree of confidentiality is maintained.
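The feedback idea can be sketched without XACML or Prolog: evaluate conditions in order and reveal the first failure only if a meta-policy permits it. The rule and meta-policy structures here are hypothetical stand-ins for the translated functors:

```python
# Hypothetical rule set: (name, predicate over the request).
RULES = [("role", lambda req: req["role"] == "manager"),
         ("office_hours", lambda req: 9 <= req["hour"] < 17)]

# Meta-policy: which failed conditions may be disclosed to the requester.
META_POLICY = {"role": False, "office_hours": True}

def feedback(request):
    """Return the first failed condition the meta-policy lets us disclose."""
    for name, check in RULES:
        if not check(request):
            return (f"Denied: condition '{name}' failed"
                    if META_POLICY[name] else "Denied")
    return "Permitted"

print(feedback({"role": "manager", "hour": 20}))
# Denied: condition 'office_hours' failed
```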
@inproceedings{ghai:cue-:-a-framework-for-gen:2010:kxyqv,title={{Cue : A Framework for Generating Meaningful Feedback in XACML}},author={Ghai, Sunil Kumar and Nigam, Prateek and Kumaraguru, Ponnurangam},year={2010},booktitle={Proceedings of the 3rd ACM workshop on Assurable and usable security configuration},}
ICEB
The Unique Identification Number Project: Challenges and Recommendations
Haricharan Rengamani, Ponnurangam Kumaraguru, Rajarishi Chakraborty, and H. Raghav Rao
In Third International Conference on Ethics and Policy of Biometrics, 2010
This paper elucidates the social, ethical, cultural, technical, and legal implications and challenges around the implementation of a biometric based unique identification (UID) number project. The Indian government has undertaken a huge effort to issue UID numbers to its residents. Apart from possible challenges that are expected in the implementation of UID, the paper also draws parallels from the Social Security Number system in the US. We discuss the setbacks of using the Social Security Number as a unique identifier and how to avoid them with the system being proposed in India. We discuss the various biometric techniques used and a few recommendations associated with the use of biometrics.
@inproceedings{gupta:integrating-linked-open-d:2011:yuqfj,title={The Unique Identification Number Project: Challenges and Recommendations},author={Rengamani, Haricharan and Kumaraguru, Ponnurangam and Chakraborty, Rajarishi and Rao, H. Raghav},year={2010},booktitle={Third International Conference on Ethics and Policy of Biometrics},}
AIRS ’10
Mining YouTube to Discover Extremist Videos, Users and Hidden Communities
Ponnurangam Kumaraguru and Ashish Sureka
In Proceedings of the Asia Information Retrieval Societies Conference (AIRS), 2010
We describe a semi-automated system to assist law enforcement and intelligence agencies dealing with cyber-crime related to promotion of hate and radicalization on the Internet. The focus of this work is on mining YouTube to discover hate videos, users and virtual hidden communities. Finding precise information on YouTube is a challenging task because of the huge size of the YouTube repository and a large subscriber base. We present a solution based on data mining and social network analysis (using a variety of relationships such as friends, subscriptions, favorites and related videos) to aid an analyst in discovering insightful and actionable information. Furthermore, we performed a systematic study of the features and properties of the data and hidden social networks which has implications in understanding extremism on Internet. We take a case study based approach and perform empirical validation of the proposed hypothesis. Our approach succeeded in finding hate videos which were validated manually.
@inproceedings{kumaraguru:anti-phishing-landing-page:2009:yuqfj,title={Mining YouTube to Discover Extremist Videos, Users and Hidden Communities},author={Kumaraguru, Ponnurangam and Sureka, Ashish},year={2010},booktitle={Proceedings of Asia Information Retrieval Societies Conference, 2010},}
2009
IBM
Policy framework for security and privacy management
J. Karat, C.-M. Karat, E. Bertino, N. Li, Q. Ni, C. Brodie, J. Lobo, S. B. Calo, L. F. Cranor, P. Kumaraguru, and R. W. Reeder
In IBM Journal of Research and Development, 2009
Policies that address security and privacy are pervasive parts of both technical and social systems, and technology that enables both organizations and individuals to create and manage such policies is a critical need in information technology (IT). This paper describes the notion of end-to-end policy management and advances a framework that can be useful in understanding the commonality in IT security and privacy policy management.
@inproceedings{Karat:2009:PFS:1850636.1850640,title={Policy framework for security and privacy management},author={Karat, J. and Karat, C.-M. and Bertino, E. and Li, N. and Ni, Q. and Brodie, C. and Lobo, J. and Calo, S. B. and Cranor, L. F. and Kumaraguru, P. and Reeder, R. W.},year={2009},booktitle={Published in IBM Journal of Research and Development},}
Anti-phishing landing page: Turning a 404 into a teachable moment for end users
Ponnurangam Kumaraguru, Lorrie Faith Cranor, and Laura Mather
This paper describes the design and implementation of the Anti-Phishing Working Group (APWG) anti-phishing landing page, a web page with a succinct anti-phishing training message designed to be displayed in place of a phishing website that has been taken down. The landing page is currently being used by financial institutions, phish site take-down vendors, government organizations and online merchants. When would-be phishing victims try to visit a phishing web site that has been taken down, they are redirected to the landing page, hosted on the APWG website. In this paper, we discuss the iterative user-centered design process we used to develop the landing page content. We present the data we collected from the landing page log files from October 1, 2008 through March 31, 2009, during the first six months of the landing page program. Our analysis suggests that approximately 70,000 Internet users have been educated by the landing page during this period. We identified 3,917 unique phishing URLs that had been redirected to the landing page. We found 81 URLs that appeared in our log files in email messages archived in the APWG phishing email repository. We present our analysis of the features of these emails.
@inproceedings{kumaraguru:getting-users-to-pay-atte:2007:yuqfj,title={Anti-phishing landing page: Turning a 404 into a teachable moment for end users},author={Kumaraguru, Ponnurangam and Cranor, Lorrie Faith and Mather, Laura},year={2009},booktitle={},}
SOUPS ’09
School of phish: a real-world evaluation of anti-phishing training
Ponnurangam Kumaraguru, Justin Cranshaw, Alessandro Acquisti, Lorrie Cranor, Jason Hong, Mary Ann Blair, and Theodore Pham
In Proceedings of the 5th Symposium on Usable Privacy and Security, 2009
PhishGuru is an embedded training system that teaches users to avoid falling for phishing attacks by delivering a training message when the user clicks on the URL in a simulated phishing email. In previous lab and real-world experiments, we validated the effectiveness of this approach. Here, we extend our previous work with a 515-participant, real-world study in which we focus on long-term retention and the effect of two training messages. We also investigate demographic factors that influence training and general phishing susceptibility. Results of this study show that (1) users trained with PhishGuru retain knowledge even after 28 days; (2) adding a second training message to reinforce the original training decreases the likelihood of people giving information to phishing websites; and (3) training does not decrease users’ willingness to click on links in legitimate messages. We found no significant difference between males and females in the tendency to fall for phishing emails both before and after the training. We found that participants in the 18–25 age group were consistently more vulnerable to phishing attacks on all days of the study than older participants. Finally, our exit survey results indicate that most participants enjoyed receiving training during their normal use of email.
@inproceedings{kumaraguru:lessons-from-a-real-world:2008:lrfkq,title={School of phish: a real-world evaluation of anti-phishing training},author={Kumaraguru, Ponnurangam and Cranshaw, Justin and Acquisti, Alessandro and Cranor, Lorrie and Hong, Jason and Blair, Mary Ann and Pham, Theodore},year={2009},booktitle={Proceedings of the 5th Symposium on Usable Privacy and Security},}
Thesis
PhishGuru: A System for Educating Users about Semantic Attacks
The goal of this thesis is to show that computer users trained with an embedded training system - one grounded in the principles of learning science - are able to make more accurate online trust decisions than users who read traditional security training materials, which are distributed via email or posted online. To achieve this goal, we focus on "phishing," a type of semantic attack. We have developed a system called "PhishGuru" based on embedded training methodology and learning science principles. Embedded training is a methodology in which training materials are integrated into the primary tasks users perform in their day-to-day lives. In contrast to existing training methodologies, the PhishGuru shows training materials to users through emails at the moment ("teachable moment") users actually fall for phishing attacks.
@inproceedings{kumaraguru:privacy-in-india:-attitud:2030:lrfkq,title={PhishGuru: A System for Educating Users about Semantic Attacks},author={Kumaraguru, Ponnurangam},year={2009},booktitle={Research Thesis},}
2008
IDMAN
A Contextual Method for Evaluating Privacy Preferences
Caroline Sheedy and Ponnurangam Kumaraguru
In Policies and Research in Identity Management (IDMAN), 2008
Identity management is a relevant issue at a national and international level. Any approach to identity management is incomplete unless privacy is also a consideration. Existing research on evaluating an individual’s privacy preferences has shown discrepancies between the stated standards required by users and the corresponding observed behaviour. We take a contextual approach to surveying privacy, using the framework proposed by contextual integrity, with the aim of further understanding users’ self-reported views on privacy at a national level.
@inproceedings{gupta:credibility-ranking-of-tw:2012:yuqfj,title={A Contextual Method for Evaluating Privacy Preferences},author={Sheedy, Caroline and Kumaraguru, Ponnurangam},year={2008},booktitle={Policies and Research in Identity Management (IDMAN)},}
IEEE
Lessons From a Real World Evaluation of Anti-Phishing Training
Ponnurangam Kumaraguru, Steve Sheng, Alessandro Acquisti, Lorrie Faith Cranor, and Jason Hong
In 2008 eCrime Researchers Summit, 2008
Prior laboratory studies have shown that PhishGuru, an embedded training system, is an effective way to teach users to identify phishing scams. PhishGuru users are sent simulated phishing attacks and trained after they fall for the attacks. In this current study, we extend the PhishGuru methodology to train users about spear phishing and test it in a real world setting with employees of a Portuguese company. Our results demonstrate that the findings of PhishGuru laboratory studies do indeed hold up in a real world deployment. Specifically, the results from the field study showed that a large percentage of people who clicked on links in simulated emails proceeded to give some form of personal information to fake phishing websites, and that participants who received PhishGuru training were significantly less likely to fall for subsequent simulated phishing attacks one week later. This paper also presents some additional new findings. First, people trained with spear phishing training material did not make better decisions in identifying spear phishing emails compared to people trained with generic training material. Second, we observed that PhishGuru training could be effective in training other people in the organization who did not receive training messages directly from the system. Third, we also observed that employees in technical jobs were not different from employees with non-technical jobs in identifying phishing emails before and after the training. We conclude with some lessons that we learned in conducting the real world study.
@inproceedings{kumaraguru:phishguru:-a-system-for-e:2009:rcrwd,title={Lessons From a Real World Evaluation of Anti-Phishing Training},author={Kumaraguru, Ponnurangam and Sheng, Steve and Acquisti, Alessandro and Cranor, Lorrie Faith and Hong, Jason},year={2008},booktitle={2008 eCrime Researchers Summit},}
2007
SOUPS
A Survey of Privacy Policy Languages
Ponnurangam
Kumaraguru, Lorrie
Cranor, Jorge
Lobo, and Seraphin
Calo
In Symposium on Usable Privacy and Security (SOUPS), 2007
Most consumers are sensitive to privacy issues when conducting business online. Protecting information by enforcing security and privacy practices internally is a way for organizations to increase business by building trust with such consumers. Organizations can express their privacy practices as policies in a human-readable format to help consumers make informed decisions. Many privacy languages are available for representing policies, but they tend to use formats convenient to their implementations, and there is no single framework or metric for analyzing and evaluating the effectiveness of these languages. In this research, we are interested in succinctly summarizing the literature available on privacy policy languages; providing an account of the features, characteristics, and requirements of the languages; and describing a comprehensive framework for analysis. We expect our results to aid implementers in choosing an existing language and to provide guidelines for building languages in the future. We expect this research to be a starting point towards developing frameworks and metrics for analyzing privacy policy languages.
@inproceedings{kumaraguru:anti---terrorism-in-india:2010:lrfkq,title={A Survey of Privacy Policy Languages},author={Kumaraguru, Ponnurangam and Cranor, Lorrie and Lobo, Jorge and Calo, Seraphin},year={2007},booktitle={Symposium on Usable Privacy and Security (SOUPS)},}
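To make the survey's subject concrete: privacy policy languages such as P3P and EPAL all express some combination of what data is collected, for what purpose, for whom, and for how long. The toy schema below is purely illustrative - it is not the syntax of any real policy language, and the field names are invented for this sketch - but it captures the common building blocks such languages encode.

```python
# Illustrative only: a toy schema for the building blocks that privacy policy
# languages (e.g., P3P, EPAL) tend to express. Not any real language's format.

from dataclasses import dataclass
from typing import List


@dataclass
class PolicyStatement:
    data_items: List[str]   # what is collected, e.g., ["email address"]
    purposes: List[str]     # why it is collected, e.g., ["order fulfillment"]
    recipients: List[str]   # who may see it, e.g., ["ours", "delivery partners"]
    retention: str          # how long it is kept, e.g., "30 days"
    consent_required: bool = True  # whether an explicit opt-in is needed


statement = PolicyStatement(
    data_items=["email address"],
    purposes=["order confirmation"],
    recipients=["ours"],
    retention="30 days",
)
print(statement)
```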
eCrime
Getting Users to Pay Attention to Anti-Phishing Education: Evaluation of Retention and Transfer
Ponnurangam
Kumaraguru, Yong
Rhee, Steve
Sheng, Sharique
Hasan, Alessandro
Acquisti, Lorrie Faith
Cranor, and Jason
Hong
In Proceedings of the anti-phishing working group’s 2nd annual eCrime researchers summit, 2007
Educational materials designed to teach users not to fall for phishing attacks are widely available but are often ignored by users. In this paper, we extend an embedded training methodology using learning science principles in which phishing education is made part of a primary task for users. The goal is to motivate users to pay attention to the training materials. In embedded training, users are sent simulated phishing attacks and trained after they fall for the attacks. Prior studies tested users immediately after training and demonstrated that embedded training improved users’ ability to identify phishing emails and websites. In the present study, we tested users to determine how well they retained knowledge gained through embedded training and how well they transferred this knowledge to identify other types of phishing emails. We also compared the effectiveness of the same training materials delivered via embedded training and delivered as regular email messages. In our experiments, we found that: (a) users learn more effectively when the training materials are presented after users fall for the attack (embedded) than when the same training materials are sent by email (non-embedded); (b) users retain and transfer more knowledge after embedded training than after non-embedded training; and (c) users with higher Cognitive Reflection Test (CRT) scores are more likely than users with lower CRT scores to click on the links in the phishing emails from companies with which they have no account.
@inproceedings{Kumaraguru:2009:SPR:1572532.1572536,title={Getting Users to Pay Attention to Anti-Phishing Education: Evaluation of Retention and Transfer},author={Kumaraguru, Ponnurangam and Rhee, Yong and Sheng, Steve and Hasan, Sharique and Acquisti, Alessandro and Cranor, Lorrie Faith and Hong, Jason},year={2007},booktitle={Proceedings of the anti-phishing working group's 2nd annual eCrime researchers summit},}
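The study design above separates three measurements: accuracy immediately after training, accuracy on the same phish type a week later (retention), and accuracy on a phish type not covered in training (transfer). The sketch below illustrates that scoring structure only; the function names, phase labels, and the sample decisions are all made up for this example and are not the study's data or code.

```python
# Hypothetical scoring sketch for a retention/transfer evaluation.
# All identifiers and numbers are invented for illustration.


def accuracy(decisions):
    """Fraction of test emails the user classified correctly (1 = correct)."""
    return sum(decisions) / len(decisions) if decisions else 0.0


def evaluate(user_log):
    """Map each test phase to the user's identification accuracy."""
    return {
        "immediate": accuracy(user_log["immediate"]),        # right after training
        "retention": accuracy(user_log["after_one_week"]),   # same phish type, a week later
        "transfer": accuracy(user_log["novel_phish_type"]),  # phish type not seen in training
    }


# Example with made-up decisions:
print(evaluate({
    "immediate": [1, 1, 0, 1],
    "after_one_week": [1, 0, 1, 1],
    "novel_phish_type": [1, 0, 0, 1],
}))
```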
2005
PET ’05
Privacy in India: Attitudes and Awareness
Ponnurangam
Kumaraguru, and Lorrie
Cranor
In Proceedings of the 2005 Workshop on Privacy Enhancing Technologies (PET2005), 2005
In recent years, numerous surveys have been conducted to assess attitudes about privacy in the United States, Australia, Canada, and the European Union. Very little information has been published about privacy attitudes in India. As India is becoming a leader in business process outsourcing, increasing amounts of personal information from other countries are flowing into India. Questions have been raised about the ability of Indian companies to adequately protect this information. We conducted an exploratory study to gain an initial understanding of attitudes about privacy among the Indian high-tech workforce. We carried out a written survey and one-on-one interviews to assess the level of awareness about privacy-related issues and concern about privacy among a sample of educated people in India. Our results demonstrate an overall lack of awareness of privacy issues and less concern about privacy in India than has been found in similar studies conducted in the United States.
@inproceedings{kumaraguru:protecting-people-from-ph:2007:lrfkq,title={{Privacy in India: Attitudes and Awareness}},author={Kumaraguru, Ponnurangam and Cranor, Lorrie},year={2005},booktitle={Proceedings of the 2005 Workshop on Privacy Enhancing Technologies (PET2005)},}