Information Extraction for Ontology Population Tasks. An Application to the Italian Archaeological Domain

Maria Pia di Buono


In recent years, many approaches to the Information Extraction (IE) task have been developed. Some are concept-based systems, which use a reduced number of characteristics to represent semantic content; others are based on term representation. More recent techniques involve ontology-based approaches, since “ontologies reflect the structure of the domain and constrain the potential interpretations of terms” [1]. In this paper we present ongoing research, based on the Lexicon-Grammar (LG) framework, which aims at improving Term Extraction (TE) in the Archaeological domain. We intend to demonstrate how our language formalization technique can be applied to the processing of unstructured texts for both entity recognition and domain ontology population tasks. Starting from the assumption that a coherent and consistent formal description of language is crucial to achieving a correct semantic representation of any knowledge domain, this study focuses on a different approach to content analysis and IE.




SANCHEZ-CISNEROS, Daniel; GALI, Fernando Aparicio. UEM-UC3M: An Ontology-based named entity recognition system for biomedical texts. Proceedings of SemEval, 2013, 622-627.

EKBAL, Asif, et al. Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation. JLCL, 2011, 26.2: 39-51.

BACHIMONT, Bruno. Engagement sémantique et engagement ontologique: conception et réalisation d’ontologies en ingénierie des connaissances. Ingénierie des connaissances: évolutions récentes et nouveaux défis, 2000.

WIMALASURIYA, Daya C.; DOU, Dejing. Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science, 2010.

KIM, Sanghee, et al. Artequakt: Generating tailored biographies from automatically annotated fragments from the web. 2002.

DROZDZYNSKI, Witold, et al. Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications. KI, 2004, 18.1: 17-.

ETZIONI, Oren, et al. Web-scale information extraction in KnowItAll (preliminary results). In: Proceedings of the 13th international conference on World Wide Web. ACM, 2004. p. 100-110.

NAVIGLI, Roberto; VELARDI, Paola. Enriching a formal ontology with a thesaurus: an application in the cultural heritage domain. In: Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge–OLP. 2006. p. 1-9.

SUCHANEK, Fabian M.; IFRIM, Georgiana; WEIKUM, Gerhard. LEILA: Learning to extract information by linguistic analysis. In: Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge. 2006. p. 18-25.

SOWA, John F. Knowledge Representation: Logical, Philosophical and Computational Foundations. Pacific Grove, CA: Brooks Cole Publishing Co., 2000.

DI BUONO, Maria Pia; MONTELEONE, Mario; ELIA, Annibale. Terminology and Knowledge Representation. Italian Linguistic Resources for the Archaeological Domain. In: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014) - Workshop on Lexical and Grammatical Resources for Language Processing (LG-LP 2014), 2014.

SILBERZTEIN, Max. Dictionnaires électroniques et analyse automatique de textes. Paris: Masson, 1993.

VIETRI, Simona; MONTELEONE, Mario. The English NooJ dictionary. In: Koeva S., Mesfar S., Silberztein M. (eds.) Formalising Natural Language with NooJ International NooJ 2013. Cambridge Scholars Publishing, 2014.

GROSS, Maurice. Grammaire transformationnelle du français. Cantilène, 1968.

GROSS, Maurice. Méthodes en syntaxe. Hermann, 1975.

ELIA, Annibale; MARTINELLI, Maurizio; D'AGOSTINO, Emilio. Lessico e strutture sintattiche: introduzione alla sintassi del verbo italiano. Liguori, 1981.

HARRIS, Zellig Sabbettai. Notes du cours de syntaxe, traduction française par Maurice Gross. Paris: Le Seuil, 1976.

MARANO, Federica. Exploring Formal Models of Linguistic Data Structuring. Enhanced Solutions for Knowledge Management Systems Based on NLP Applications. PhD Dissertation, University of Salerno, Italy, 2012.

SILBERZTEIN, Max. NooJ manual. Available for download at: www.nooj4nlp.net, 2003.

DI BUONO, Maria Pia; MONTELEONE, Mario. Knowledge Management and Extraction for Cultural Heritage Repositories. In: Monti J., Silberztein M., Monteleone M., di Buono M.P. (eds.) Formalising Natural Language with NooJ International NooJ 2014. Cambridge Scholars Publishing (in press).

VIETRI, Simona. The Italian Module for NooJ. In: Proceedings of the First Italian Conference on Computational Linguistics, CLiC-it 2014. University Press, 2014.

HARRIS, Zellig Sabbettai. A grammar of English on mathematical principles. New York: Wiley, 1982.

TESNIERE, Lucien. Éléments de Syntaxe Structurale. Librairie C. Klincksieck, Paris, 1959.

ELIA, Annibale; VIETRI, Simona; POSTIGLIONE, Alberto; MONTELEONE, Mario; MARANO, Federica. Data Mining Modular Software System. In: Arabnia H.R., Marsh A., Solo A.M.G (eds.) Proceedings of The 2010 International Conference on Semantic Web & Web Services, WorldComp 2010 Conference. USA: CSREA Press, 2010.

HANKS, Patrick. Lexical Analysis. Norms and Exploitations. Cambridge: MIT Press, 2013.

DOERR, Martin. The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata. AI magazine, 2003, 24.3: 75.

CHIARCOS, Christian; MCCRAE, John; CIMIANO, Philipp; FELLBAUM, Christiane. Towards open data for linguistics: Linguistic linked data. In: New Trends of Research in Ontologies and Lexical Resources. Springer, 2012.



This work is licensed under a Creative Commons Attribution 3.0 License.