Relation Extraction using Different Features in Portuguese

Authors

Keywords:

Open Relation Extraction, Feature Selection

Abstract

Relation Extraction (RE) is an Information Extraction (IE) task responsible for discovering semantic relationships between concepts in unstructured text. When extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is reducing the proportion of invalid extractions among the relationships identified. Current methods based on a set of specific machine learning features eliminate many of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty of finding the most representative set of features for the Open RE problem, given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification in open relation extraction for Portuguese, aiming to provide a basis for new solutions that reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese while retaining satisfactory classification performance. Among the classification algorithms evaluated, J48 showed the best results, with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%), and Naive Bayes (79.9%).
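The abstract describes classifying candidate extractions as valid or invalid from a set of features and scoring the classifiers by F-measure. The following is a minimal, illustrative sketch of that general idea, not the authors' method or feature set: the `Candidate` type, the simplified tags (`V`, `PREP`, `N`), the three features, and the toy Portuguese examples are all hypothetical, and the classifier is a textbook perceptron (one of the four algorithm families named above) rather than the paper's configuration. F-measure is computed as the harmonic mean of precision and recall, assuming "valid extraction" is the positive class.

```python
from dataclasses import dataclass


# Hypothetical candidate extraction (arg1, relation, arg2); the relation phrase
# is assumed to be already POS-tagged as (token, tag) pairs with a toy tagset.
@dataclass
class Candidate:
    arg1: str
    relation: list[tuple[str, str]]
    arg2: str


def features(c: Candidate) -> list[float]:
    """Illustrative features only; these are assumptions, not the paper's set."""
    tags = [t for _, t in c.relation]
    return [
        1.0,                                            # bias term
        float(len(tags)),                               # relation length in tokens
        1.0 if "V" in tags else 0.0,                    # relation contains a verb
        1.0 if tags and tags[-1] == "PREP" else 0.0,    # relation ends in a preposition
    ]


def predict(w: list[float], x: list[float]) -> int:
    """Return 1 (valid) if the weighted sum is positive, else 0 (invalid)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train_perceptron(xs, ys, epochs=10, lr=1.0) -> list[float]:
    """Classic perceptron: update the weights only on misclassified examples."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = y - predict(w, x)          # -1, 0, or +1
            if error:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w


def f_measure(gold: list[int], pred: list[int]) -> float:
    """Harmonic mean of precision and recall for the positive (valid) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, hand-labelled candidates (hypothetical data, not from the paper).
candidates = [
    Candidate("Maria", [("nasceu", "V"), ("em", "PREP")], "Salvador"),
    Candidate("A UFBA", [("é", "V")], "uma universidade"),
    Candidate("casa", [("de", "PREP")], "Maria"),
    Candidate("a", [("casa", "N"), ("de", "PREP")], "praia"),
]
labels = [1, 1, 0, 0]

xs = [features(c) for c in candidates]
w = train_perceptron(xs, labels)
preds = [predict(w, x) for x in xs]
print(preds, f_measure(labels, preds))
```

On this linearly separable toy set the perceptron converges and reproduces the labels; on real Open RE data, the choice of features (and how well they transfer from English to Portuguese) is what drives the F-measure, which is the question the abstract raises.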

Author Biography

  • Daniela Barreiro Claro, Federal University of Bahia
    Daniela is an Adjunct Professor at the Federal University of Bahia. She received her Master's degree in Computer Science from the Federal University of Santa Catarina (2000) and her PhD in Computer Science from the Université d'Angers, France (2006). In 2009, she founded the FORMAS research group (Formalismos e Aplicações Semânticas, i.e. Formalisms and Semantic Applications), registered with CNPq, and has led it ever since, promoting research in Semantic Similarity and Information Extraction. Her main areas of interest are Semantic Similarity, Semantic Web Services, Information Extraction, Data Mining, and Information Retrieval.

Published

2014-12-26

Section

Research Articles

How to Cite

Relation Extraction using Different Features in Portuguese. (2014). Linguamática, 6(2), 57-65. https://www.linguamatica.com/index.php/linguamatica/article/view/v6n2-4