Information and Knowledge Extraction

Theme leads

Philippe Langlais & Reihaneh Rabbany

Researchers involved

Yacine Benahmed, Jackie C.K. Cheung, Richard Khoury, Jian-Yun Nie, Siva Reddy, Amal Zouaq




  • Knowledge extraction from texts

  • Knowledge extraction from social media

  • Fake news detection and trustworthiness

  • Discourse analysis


Language is a common medium for expressing knowledge. Much of the knowledge found in a text is of general interest and can be reused across many NLP tasks. For example, knowing that “Montreal is a city in Quebec” can help answer questions about cities in Quebec. Knowledge extraction aims to detect new pieces of knowledge in texts and to express them in a form usable by different applications. This can be done either within a specific domain or in an open-domain setting. The extracted knowledge will complement existing knowledge graphs such as Freebase, YAGO and ConceptNet, or UMLS in medicine. We plan to extract knowledge from various sources: general texts (e.g. web pages), specialized documents (e.g. in medicine) and user-generated content on social media.

During knowledge extraction, it is important to assess the validity of each piece of information, so that false information is not propagated through later inferences. Special attention will therefore be paid to detecting misinformation and to measuring the trustworthiness of the extracted knowledge.