International Scientific and Practical Conference

"Electronics and Information Technologies"

Issue 10, Pages: A-73-A-80
DOI: https://doi.org/10.30970/elit2018.A22
Influence of Unique Words on the Performance of Corpus-Based Keyword Detection Methods
O. Kushnir, V. Yaremkiv, I. Dovhan, A. Kashuba
We study the performance of corpus-based keyword detection methods, including TF-IDF, in the particular case where the text under investigation contains unique words, which are absent or rare in the other texts of the corpus. Our attention focuses on two points: the quality of the keyword list together with the appropriateness of the corresponding keyness scores, and the criticality of the methods to small perturbations of the corpus. We conclude that a number of heuristically introduced TF-IDF-like measures compete quite successfully with TF-IDF in performance. On the other hand, they cannot cope with the criticality of the scores that is inherent to unique words.
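
The sensitivity described above can be illustrated with a minimal sketch. It assumes the textbook form of TF-IDF (relative term frequency multiplied by log(N/df)); the abstract does not state which variant the authors use, and the toy corpus, the function name `tf_idf`, and the perturbation text are all hypothetical. The sketch scores a word that occurs only in the target text, then adds a single text containing that word to the corpus and scores it again.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF (an assumption, not necessarily the paper's variant):
    relative frequency of the term in doc times log(N / document frequency)."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)  # number of texts containing the term
    return tf * math.log(len(corpus) / df)


# Hypothetical toy corpus of tokenized texts.
target = "the quark model describes the quark content of hadrons".split()
others = [
    "the model describes the data".split(),
    "the content of the report".split(),
    "the theory of the field".split(),
]
corpus = [target] + others

# "quark" is unique to the target text; "model" also occurs elsewhere.
score_unique = tf_idf("quark", target, corpus)
score_common = tf_idf("model", target, corpus)

# Small perturbation: one extra text that happens to contain the unique word.
perturbed = corpus + ["a quark is a particle".split()]
score_unique_perturbed = tf_idf("quark", target, perturbed)

print(f"unique word, original corpus:  {score_unique:.3f}")
print(f"common word, original corpus:  {score_common:.3f}")
print(f"unique word, perturbed corpus: {score_unique_perturbed:.3f}")
```

In this toy setting the document frequency of the unique word goes from 1 to 2 after adding one text, so its score changes noticeably. This only illustrates the kind of effect the abstract refers to and does not reproduce the authors' experiments.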

© Ivan Franko National University of Lviv, 2018

Developed and supported - Laboratory of high performance computing systems