This week, I continued the evaluation task from last week. Because the ontology matching algorithm was extremely slow, I looked into the AgreementMakerLight (AML) tool to identify the main cause. I found that the code loads the source and target ontologies in several places, and since our target ontology is large (more than 8000 classes), these repeated loads likely slowed processing considerably. In addition, because we match many different source ontologies against a single target ontology (obtained by merging 10 ontologies), there is no need to reload the target ontology each time a new source ontology is loaded. After making these changes to AML, throughput improved from 35 ontologies in 24 hours to 35 ontologies in 3 hours, an 8x speedup. However, with 830 source ontologies to process, a full run would still take roughly 71 hours at this rate, so the algorithm needs to be sped up further.
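The load-once idea can be sketched in a few lines of Python. This is only an illustration of the caching pattern, not the actual change made inside AML (which is a Java tool): `load_target`, `match_all`, and the dictionary standing in for a parsed ontology are all hypothetical names.

```python
from functools import lru_cache

# Records every real (uncached) load so the effect of caching is visible.
load_calls = []


@lru_cache(maxsize=1)
def load_target(path):
    """Hypothetical stand-in for an expensive parse of the merged target ontology."""
    load_calls.append(path)
    return {"path": path, "classes": ["Class%d" % i for i in range(8000)]}


def match_all(source_paths, target_path):
    """Process each source ontology against the same cached target."""
    results = {}
    for src in source_paths:
        target = load_target(target_path)  # parsed only on the first call
        results[src] = len(target["classes"])  # placeholder for real matching
    return results
```

With three source paths and one target path, `load_target` does its expensive work once and the two later calls return the cached object.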
In parallel, I wrote code that reads the package_id, the matched classes, and the similarity score from both the manual annotations and the CSV file produced by the ontology matching algorithm, and then calculates precision, recall, and F-score. Initially, I ran the algorithm with only two matchers (the basic string matcher and the cosine similarity matcher) and computed the precision and recall. There was not a single overlap between the manual and automated results. Suspecting that the poor performance of these two basic matchers was the cause, I reran the matching algorithm with all of the existing matchers. After evaluating the performance on 15 datasets, I found that the precision and recall values were still zero.
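As a rough sketch of the metric computation, assume each match is represented as a (package_id, source class, target class) tuple and that the manual annotations are the gold standard. The function name `evaluate` and the tuple layout are assumptions for illustration, not the actual evaluation script.

```python
def evaluate(manual, automated):
    """Return (precision, recall, f_score) for two collections of matches.

    Each match is assumed to be a hashable tuple such as
    (package_id, source_class, target_class).
    """
    manual, automated = set(manual), set(automated)
    true_positives = len(manual & automated)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(automated) if automated else 0.0
    recall = true_positives / len(manual) if manual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

When the two sets share no tuple at all, as in the runs described above, all three values come out as zero.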
Upon further analysis, I observed a discrepancy in the “class name” between the source and target ontologies: the source ontologies use manually assigned identifiers as the “class name”, while the target ontology uses single-word or multiword terms. Since the matching algorithm compares the “class name” values of the source and target ontologies, this difference explains the zero precision and recall. I am currently working on matching the “labels” from the source ontologies against the “class name” of the target ontology, which I expect to improve the precision and recall scores. Next week, I will work on further improving the evaluation scores.
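The effect of comparing labels instead of identifiers can be illustrated with a small sketch. The sample identifiers, labels, and target names below are invented for illustration, and the normalized exact match is a simplification: AML's matchers compute similarity scores rather than exact matches.

```python
import re


def normalize(text):
    """Lowercase a name and split camelCase, underscores, and hyphens into words."""
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    text = re.sub(r"[_\-]+", " ", text)
    return " ".join(text.lower().split())


def match_by(key, source_classes, target_names):
    """Match source classes to target class names using the given field."""
    index = {normalize(name): name for name in target_names}
    matches = []
    for cls in source_classes:
        hit = index.get(normalize(cls[key]))
        if hit is not None:
            matches.append((cls["id"], hit))
    return matches


# Invented example data: opaque identifiers versus human-readable labels.
source = [
    {"id": "C_0012", "label": "Air quality"},
    {"id": "C_0047", "label": "Water Temperature"},
]
target = ["AirQuality", "water_temperature", "Soil moisture"]

by_id = match_by("id", source, target)
by_label = match_by("label", source, target)
```

Here `by_id` is empty because an identifier such as `C_0012` shares nothing with the target terms, while `by_label` pairs both source classes with a target class name.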