Taxonomic corpus-based concept summary generation for document annotation

Nkisi-Orji, Ikechukwu ORCID: https://orcid.org/0000-0001-9734-9978; Wiratunga, Nirmalie ORCID: https://orcid.org/0000-0003-4040-2496; Hui, Kit-ying ORCID: https://orcid.org/0000-0001-8383-7954; Heaven, Rachel ORCID: https://orcid.org/0000-0002-6172-4809; Massie, Stewart ORCID: https://orcid.org/0000-0002-5278-4009. 2017 Taxonomic corpus-based concept summary generation for document annotation. In: Kamps, Jaap; Tsakonas, Giannis; Manolopoulos, Yannis, (eds.) Research and advanced technology for digital libraries. Springer, 49-60. (Lecture Notes in Computer Science, 10450).

Corpus_based_concept_summaries_for_document_annotation.pdf - Accepted Version

Abstract/Summary

Semantic annotation is an enabling technology which links documents to concepts that unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, the annotation process is a challenging task as annotators often have to select from thousands of potentially relevant concepts from controlled vocabularies. The best approaches to assist in this task rely on reusing the annotations of an annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer due to insufficient descriptive texts for concepts in most vocabularies. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content we generate effectively represents the concepts. Also, our approach outperforms those which rely on information from a thesaurus alone and is comparable with supervised approaches.

Item Type: Publication - Book Section
Digital Object Identifier (DOI): https://doi.org/10.1007/978-3-319-67008-9_5
ISBN: 9783319670089
Additional Keywords: Taxonomy, Text annotation, Information discovery
NORA Subject Terms: Computer Science
Date made live: 31 Mar 2023 08:52 +0 (UTC)
URI: https://nora.nerc.ac.uk/id/eprint/534281