|Admission – Graduation||2008 – 2011|
|Supervisor; Co-supervisor||Stéphane Gagnon; Alain Charbonneau|
Impact of Integrating a Standard XBRL Ontology to Automated Text Classification: An Application to Financial News
Many methods developed in the field of automatic text categorization have achieved significant levels of precision on texts with a simple structure (e.g., emails, summaries). Nevertheless, much remains to be done for complex documents such as financial news and similar knowledge-based analyses. This complexity makes it more difficult to formalize and update a representative knowledge base, which directly influences text mining, both in identifying the issues common to a text and its components (through the analysis of similarities and hierarchies) and in monitoring them through time (e.g., Topic Detection and Tracking).
In this research, we propose to adopt a standardized ontology as the model for the formal representation of knowledge, an approach that has recently been shown to improve classification results. Research conducted in this area includes the Wikipedia ontology, which in 2007 contained two million entries by itself; multilingual classification based on ontologies; and the integration of ontologies into information retrieval tasks (especially text clustering and classification). To validate our approach, experiments will be conducted using a commercial classifier, the IBM Classification Module (ICM, a module of IBM OmniFind). Our classification tests are performed on a specific subset of financial-sector news from the Reuters Corpus Version 1 (RCV1), which, with its 810,000 news stories, is considered the largest collection of news available.