We address the development of explainable Question Answering (QA) systems for Indic languages, focusing on the unique challenges posed by resource scarcity and the complexities of multilingual processing. The research begins by categorizing QA systems based on context, domain, conversational requirements, and answer types, emphasizing the importance of text-based QA for cognitive development. A comprehensive literature review highlights advances in factoid and non-factoid QA, the rise of Transformer-based models, and the critical role of retrieval mechanisms for handling extended contexts. Our work also identifies significant gaps in resources for Indic languages, particularly for non-factoid QA, and underscores the need for efficient, explainable, and retrieval-augmented models.

To address the lack of structured knowledge extraction tools for low-resource languages, the thesis introduces IndIE, an Open Information Extraction (OIE) system designed for Hindi. IndIE employs a multilingual pretrained transformer, fine-tuned on chunk-annotated data from English and five Indic languages, to generate triples from unstructured sentences. For sequence labeling tasks such as chunking, averaging the embeddings of a word's subword tokens proved more effective than alternative pooling strategies. The system combines chunk tagging with Merged-Phrase Dependency Trees, achieves a 0.51 F1-score on a benchmark of 112 Hindi sentences, and produces more granular triples than existing multilingual approaches. The underlying methodology shows potential for extension to Urdu, Tamil, and Telugu, given the generalizability of the chunker and the language-agnostic nature of the triple extraction rules.

Recognizing the challenge of resolving references to the same entity across a text, the thesis presents TransMuCoRes, a multilingual coreference resolution dataset spanning 31 South Asian languages. Built using automated translation and word alignment, TransMuCoRes fills a critical resource gap for coreference tasks in these languages. Two coreference models, trained on a combination of TransMuCoRes and manually annotated Hindi data, achieve LEA F1 and CoNLL F1 scores of 64 and 68, respectively, on a Hindi test set. The work also critiques current evaluation metrics, advocating for improved measures that handle split antecedents.

Building on these foundational tools, the thesis introduces MuNfQuAD, a multilingual non-factoid QA dataset comprising over 578K question-answer pairs across 38 languages, including numerous low-resource languages. Questions are derived from interrogative sub-headings in BBC news articles, with the corresponding paragraphs serving as silver-standard answers. Manual annotation of 790 pairs reveals that 98% of the questions are answerable from the provided context. An Answer Paragraph Selection (APS) model, fine-tuned on this dataset, achieves 80% accuracy and 72% macro F1 on the test set, and 72% accuracy and 66% macro F1 on the golden set, outperforming baseline methods and demonstrating effective context reduction.

The thesis further investigates explainability in QA and related tasks. Through experiments on the HateXplain benchmark, it compares three post-hoc interpretability methods for transformer-based encoders in hate speech detection. Notably, Layer-wise Relevance Propagation (LRP) underperforms, at times proving less informative than randomly generated rationales, owing to its tendency to concentrate relevance on initial tokens. This finding highlights the limitations of LRP for explaining the predictions of fine-tuned transformers.
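To make the APS step described above concrete, the following is a minimal, hypothetical sketch of paragraph-level answer selection with a fine-tuned multilingual encoder; the checkpoint path and the label convention (label 1 meaning "this paragraph answers the question") are illustrative assumptions rather than the released APS model.

```python
# Hypothetical sketch of Answer Paragraph Selection (APS): score each paragraph
# of an article for whether it answers the question, then keep the best ones.
# The checkpoint path and label convention are placeholders for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "path/to/fine-tuned-aps-model"  # placeholder, not a released model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()

def select_paragraphs(question: str, paragraphs: list[str], top_k: int = 1):
    """Return the top_k paragraphs most likely to contain the answer."""
    scores = []
    for paragraph in paragraphs:
        # Encode the (question, paragraph) pair as a single sequence.
        inputs = tokenizer(question, paragraph, truncation=True,
                           max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Probability that this paragraph answers the question (label 1).
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())
    ranked = sorted(zip(scores, paragraphs), reverse=True)
    return [paragraph for _, paragraph in ranked[:top_k]]
```

Paragraphs selected in this way can serve as the shortened context used in the experiments described next.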
To enhance QA performance over long contexts, especially in Indic languages, the thesis explores context-shortening strategies based on OIE, coreference resolution, and APS. Experiments with three popular Large Language Models (LLMs) on Hindi, Tamil, Telugu, and Urdu show that these techniques improve semantic scores by an average of 4% and token-level scores by 47% without fine-tuning, and by 2% with fine-tuning, while also reducing computational demands. Explainability analyses using LIME and SHAP indicate that APS-selected paragraphs concentrate model attention on relevant tokens. The study nonetheless notes persistent challenges for LLMs on non-factoid questions that require reasoning, and finds that verbalizing OIE triples does not further improve performance.

As a retrospective epilogue to the thesis, we also present a Hindi chatbot for maternal and child health queries. Using a curated FAQ database and an ensemble of rule-based, embedding-based, and paraphrasing classifiers, the system covers 80% of user queries and retrieves at least one relevant answer among its top three suggestions for 70% of cases.

Collectively, this work advances the state of explainable QA for Indic languages by developing novel resources, tools, and evaluation frameworks, and by demonstrating the effectiveness of context-shortening and interpretability techniques in low-resource, multilingual settings. Future work includes expanding benchmarks such as HindiBenchIE to other low-resource languages for standardized evaluation of triple extraction methods, thereby advancing multilingual OIE. The released TransMuCoRes checkpoints offer a baseline for multilingual coreference resolution research, and using APS models as reward models for LLM alignment may improve answer accuracy for complex queries. Additional directions involve deploying chatbots in real-world settings, refining the OIE and coreference models, expanding multilingual QA datasets, and enhancing explainability. Evaluating systems on longer contexts and integrating advanced alignment strategies will foster robust, transparent QA frameworks for Indic languages.
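As a closing illustration, the snippet below sketches how the embedding-based component of such an FAQ chatbot might retrieve its top three suggestions; the sentence-encoder name and FAQ entries are placeholders, not the curated database or classifier ensemble used in the thesis.

```python
# Hypothetical sketch of embedding-based FAQ retrieval for a Hindi health chatbot:
# embed the curated FAQ questions, embed the user query, and return the three
# most similar entries. Encoder name and FAQ entries are placeholders only.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

faq_questions = [
    "नवजात शिशु को कितनी बार दूध पिलाना चाहिए?",   # newborn feeding frequency
    "गर्भावस्था में कौन से टीके लगवाने चाहिए?",      # vaccines during pregnancy
    "बच्चे को बुखार होने पर क्या करें?",              # fever in a child
]
faq_embeddings = encoder.encode(faq_questions, convert_to_tensor=True)

def top_suggestions(query: str, k: int = 3):
    """Return the k FAQ entries most similar to the user query."""
    query_embedding = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, faq_embeddings)[0]
    ranked = scores.argsort(descending=True)[:k].tolist()
    return [(faq_questions[i], float(scores[i])) for i in ranked]

print(top_suggestions("शिशु को दिन में कितनी बार स्तनपान कराएं?"))
```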