Adm. – Grad. 2013
Dir.; Codir.: Véronique Nabelsi; Stéphane Gagnon

Development and implementation of an ontology for the automated semantic analysis of IT project risks 

Kwan, Franck-Olivier

Problem: This thesis demonstrates the feasibility of using semantic analysis as a risk management tool in IT projects. It assumes that if all artifacts of a project can be annotated to indicate events related to various risks, a semantic analysis tool should in theory be able to establish potential links between those events and risks, and thus identify the precursors that lead to their occurrence. This kind of tool could therefore serve as a platform for preventive risk management in IT projects.

Theory: Our main contribution is to design, implement, and validate a risk management process that exploits the artifacts of IT projects. The artifacts are processed with text analytics and semantic structure detection, and the outputs are integrated into an ontology, or knowledge base, of the risks of such projects. The process then uses knowledge-based query and inference tools to recognize and classify events according to whether or not they are associated with the occurrence of potential risks. Finally, we compare these results with a real risk audit to determine the quality of the classification.

Data: Our experimental methodology is based on a real large-scale IT project carried out over a 7-year period in a large department. We analyze artifacts drawn from the project documentation, Gantt schedules, system architecture documents and models, risk analysis reports, and project audit reports. These artifacts are all integrated into database servers of different types (i.e., SQL, NoSQL, RDF), allowing flexible representation while facilitating indexing and querying through semantic web standards.

Software: We use three groups of text analytics and semantic structure tools: (1) a risk ontology that we develop from subsets of the UMBEL and WordNet ontologies to describe the risks and the main project components concerned; (2) the ARDAKE software (Adaptive Rules-Driven Architecture for Knowledge Extraction), based on the Unstructured Information Management Architecture (UIMA) platform, to formulate sets of rules for annotating artifacts and inserting tags linked to the concepts of our ontology; (3) the Protégé software from Stanford University (USA) and ONTOP from Bozen-Bolzano University (Italy), to create SPARQL queries over all of our annotated artifacts, which are stored in different databases.
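
The idea of rule-driven annotation in item (2) can be illustrated with a minimal sketch. It does not use ARDAKE or UIMA: plain regular expressions stand in for the annotation rules, and the concept labels and patterns (`ScheduleDelay`, `BudgetOverrun`) are hypothetical examples, not concepts taken from the thesis ontology.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    concept: str  # hypothetical ontology concept label
    start: int    # character offset where the matched span begins
    end: int      # character offset just past the matched span
    text: str     # the matched text itself


# Hypothetical rules: each concept label maps to a pattern that signals it.
RULES = {
    "ScheduleDelay": re.compile(
        r"\b(?:delay(?:ed|s)?|behind schedule|postponed)\b", re.IGNORECASE
    ),
    "BudgetOverrun": re.compile(
        r"\b(?:over budget|cost overrun|additional funding)\b", re.IGNORECASE
    ),
}


def annotate(text):
    """Apply every rule to the text and return annotations in reading order."""
    annotations = []
    for concept, pattern in RULES.items():
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(concept, match.start(), match.end(), match.group(0))
            )
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    for annotation in annotate("The migration was delayed and is now over budget."):
        print(annotation.concept, repr(annotation.text))
```

Each annotation keeps its character offsets so that a tag can be traced back to the exact sentence or section of the artifact it came from.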

Methodology: Our validation process starts with the identification of ten high-risk events, as formally defined in a public audit of the project concerned. For each risk, we develop sets of annotation rules in ARDAKE to link the risk to the events associated with it directly or indirectly, especially events close in time and precursors to the occurrence of the risk. These rules are executed to annotate all the artifacts in our databases, connecting each relevant sentence, section, and element to the events related to the risk. Since all artifacts are dated and carry metadata about their author and approval, we reconstruct the chain of events from the artifacts and correlate it with the hierarchy of the project team, which was largely in operation throughout the 7 years of implementation. Once the annotations and reconstructions are complete, we use the Protégé and ONTOP software to formulate SPARQL queries against our annotated databases. Each query expresses a set of relevant precursor conditions for the occurrence of a risk. Once executed, the queries must return the relevant risks initially identified, maximizing both the precision of the results and the recall of all concepts related to the risk.
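
The notion of a precursor condition can be sketched as a filter over dated, annotated events. The example below is an assumption-laden stand-in written in plain Python rather than SPARQL: the `Event` fields, the `find_precursors` helper, and the fixed look-back window are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    artifact_id: str  # hypothetical identifier of the source artifact
    concept: str      # hypothetical ontology concept attached by annotation
    dated: date       # date carried by the artifact metadata


def find_precursors(events, risk_date, precursor_concepts, window_days):
    """Return events tagged with a precursor concept that fall inside
    the look-back window ending just before the risk occurred."""
    window_start = risk_date - timedelta(days=window_days)
    return [
        event
        for event in events
        if event.concept in precursor_concepts
        and window_start <= event.dated < risk_date
    ]


if __name__ == "__main__":
    events = [
        Event("minutes-01", "ScheduleDelay", date(2010, 1, 10)),
        Event("report-07", "BudgetOverrun", date(2010, 3, 1)),
        Event("minutes-00", "ScheduleDelay", date(2009, 1, 1)),
    ]
    for event in find_precursors(events, date(2010, 3, 15), {"ScheduleDelay"}, 90):
        print(event.artifact_id, event.dated.isoformat())
```

In a triple store, the same condition would typically be expressed as a SPARQL query with a filter on the concept and on the date range; the list comprehension here only mirrors that logic.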

Outcome: The results of our SPARQL queries are assessed with machine learning classification measures and with semantic distance measurements between the semantic structure of the identified events and the semantic structure surrounding the risk covered by our validation. The aim is to classify the events present in the artifacts as related or unrelated to the risks identified in the public audit report. Performance measures above 0.80 (hypothetical) would demonstrate that the tool can identify the precursor events to the occurrence of risks.
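
As a rough illustration of the two kinds of assessment mentioned above, the sketch below computes standard classification measures (precision, recall, F1) against a reference set, and a simple set-based distance between two groups of concepts. The abstract does not specify which semantic distance is used; the Jaccard distance here is an assumed stand-in, and all identifiers and sample values are hypothetical.

```python
def precision_recall_f1(predicted, audited):
    """Compare the events flagged by the queries with those named in the audit."""
    predicted, audited = set(predicted), set(audited)
    true_positives = len(predicted & audited)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(audited) if audited else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def jaccard_distance(concepts_a, concepts_b):
    """Distance in [0, 1] between two concept sets: 0 means identical sets,
    1 means no concept in common. An assumed stand-in for semantic distance."""
    a, b = set(concepts_a), set(concepts_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    flagged = {"e1", "e2", "e3"}          # hypothetical events returned by a query
    reference = {"e1", "e2", "e4", "e5"}  # hypothetical events named in the audit
    print(precision_recall_f1(flagged, reference))
    print(jaccard_distance({"ScheduleDelay", "Staffing"}, {"Staffing", "Scope"}))
```

Under this reading, a threshold such as 0.80 would be applied to the precision, recall, or F1 value returned by the first function.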