Swapnil Hingmire: WikiLDA: Enhancing Interpretability of LDA Topics Using Wikipedia

October 20, 2023 @ 10:30 am - 12:00 pm

Please RSVP using this form. The Zoom link is provided upon RSVP.

Video now available.

Topic models such as Latent Dirichlet Allocation (LDA) are commonly used to explore, summarize, and visualize unstructured document corpora. However, recent work has shown that some of these goals are best met when a user can interpret and label LDA topics, thereby aligning the topics more closely with her expectations.

We propose WikiLDA, an enhancement to LDA using Wikipedia-based Explicit Semantic Analysis (ESA). In WikiLDA, each document in a corpus is “sprinkled” with the Wikipedia concepts most relevant to it, as determined by ESA. We then use a generalized Pólya urn (GPU) model to incorporate word-word, word-concept, and concept-concept semantic relatedness into the generative process of LDA. While inferring topics on the sprinkled corpus, we give more weight to Wikipedia concepts to increase their likelihood of appearing among the most probable terms of a topic. Because the most probable concepts in an inferred topic can be looked up on Wikipedia, the topics are likely to become more interpretable and hence more useful for acquiring domain knowledge from humans for various text mining tasks (e.g., eliciting topic labels for text classification). Empirical results show that projecting documents with WikiLDA into a semantically enriched and more coherent topic space improves performance on text-classification-like tasks, especially in domains where the classes are hard to separate. We also discuss ongoing work on evaluating the interpretability and quality of LDA topics using user studies and Large Language Models.
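The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, not the speaker's WikiLDA code. It assumes a hypothetical three-article stand-in for Wikipedia (`WIKI`), a simplified ESA that scores concepts by summing the TF-IDF concept vectors of a document's words, and a simple GPU scheme inside collapsed Gibbs sampling: placing word `w` in topic `k` also adds pseudo-counts for words related to `w`, and sprinkled concept tokens (prefixed `C:`) carry an extra weight. The function names `esa_index`, `esa_concepts`, and `gpu_lda` and the relatedness values are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy stand-in for Wikipedia: concept title -> article text.
WIKI = {
    "Astronomy": "star planet galaxy telescope orbit star",
    "Baseball": "pitcher bat inning ball pitcher",
    "Cooking": "oven recipe flour bake oven",
}

def esa_index(wiki):
    """Simplified ESA inverted index: word -> {concept: tf-idf weight}."""
    n = len(wiki)
    tfs = {c: Counter(text.split()) for c, text in wiki.items()}
    df = Counter(w for tf in tfs.values() for w in tf)
    index = defaultdict(dict)
    for c, tf in tfs.items():
        for w, count in tf.items():
            index[w][c] = count * math.log((n + 1) / df[w])
    return index

def esa_concepts(tokens, index, top_k=1):
    """Rank concepts by summing the ESA vectors of the document's words."""
    scores = Counter()
    for w, count in Counter(tokens).items():
        for c, weight in index.get(w, {}).items():
            scores[c] += count * weight
    return [c for c, _ in scores.most_common(top_k)]

def gpu_lda(docs, num_topics, relatedness, concept_weight=2.0,
            alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a generalized Polya urn update."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    n_dk = [[0.0] * num_topics for _ in docs]             # doc-topic counts (unweighted)
    n_kw = [defaultdict(float) for _ in range(num_topics)]  # topic-word pseudo-counts
    n_k = [0.0] * num_topics                               # total mass per topic

    def weight(w):
        # Assumed scheme: sprinkled concept tokens count more than words.
        return concept_weight if w.startswith("C:") else 1.0

    def update(d, w, k, sign):
        n_dk[d][k] += sign
        # GPU: drawing w for topic k also returns related words to the urn.
        for v, a in [(w, 1.0)] + list(relatedness.get(w, {}).items()):
            amount = sign * weight(w) * a
            n_kw[k][v] += amount
            n_k[k] += amount

    z = [[rng.randrange(num_topics) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, w, z[d][i], +1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                probs = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                         for k in range(num_topics)]
                z[d][i] = rng.choices(range(num_topics), weights=probs)[0]
                update(d, w, z[d][i], +1)
    return z, n_kw

# Usage: sprinkle each toy document with its top ESA concept, then run GPU-LDA.
index = esa_index(WIKI)
raw = [["star", "orbit", "telescope"], ["pitcher", "inning", "bat"], ["flour", "oven"]]
docs = [d + ["C:" + c for c in esa_concepts(d, index)] for d in raw]
relatedness = {"star": {"planet": 0.5}}  # hypothetical word-word relatedness
z, n_kw = gpu_lda(docs, num_topics=3, relatedness=relatedness)
for k, counts in enumerate(n_kw):
    print(k, sorted(counts, key=counts.get, reverse=True)[:3])
```

Because concept tokens add `concept_weight` to the topic-word counts, they tend to rise to the top of a topic's word list, which mirrors the abstract's goal of making topic labels traceable to Wikipedia articles. A real system would use a full Wikipedia ESA index and relatedness scores between words and concepts rather than these toy values.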

Swapnil completed a Ph.D. in Computer Science and Engineering at the Indian Institute of Technology Madras. His doctoral research focused on reducing knowledge acquisition overhead in text classification. He is interested in building computational models of uncertainty and of the interactions among the different knowledge sources involved in human understanding of a text. He was a Scientist at TCS Research, India, and visited Yale University as a TATA Visiting Scholar. He was also a faculty member in the Department of Data Science at the Indian Institute of Technology Palakkad.




Organizer: Neil Ernst


Venue: ECS 660