International Scientific and Practical Conference
"Electronics and Information Technologies"
Issue 10, Pages: A-73-A-80
DOI: https://doi.org/10.30970/elit2018.A22
Influence of Unique Words on the Performance of Corpus-Based Keyword Detection Methods
O. Kushnir, V. Yaremkiv, I. Dovhan, A. Kashuba
We study the performance of corpus-based keyword detection methods, including TF-IDF, in the particular case when the text under investigation contains unique words, i.e. words that are absent from or rare in the other texts of the corpus. Our attention is focused on two points: the quality of the keyword list together with the appropriateness of the corresponding keyness scores, and the criticality of the methods with respect to small perturbations of the corpus. We conclude that a number of heuristically introduced TF-IDF-like measures compete quite successfully with TF-IDF in performance but, on the other hand, cannot cope with the criticality of the scores that is inherent to unique words.
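The effect described above can be illustrated with a minimal sketch. It assumes the textbook form of the measure, relative term frequency multiplied by log(N/df), which is not necessarily the exact variant used in the paper. It also reads "criticality" as sensitivity of a score to a small change of the corpus, which is an interpretation. The toy texts and the helper name `tf_idf` are hypothetical and serve illustration only.

```python
import math

def tf_idf(term, doc, corpus):
    """Textbook TF-IDF (an assumed variant): relative term frequency
    in `doc` times log(N / df), where df counts documents with `term`."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0

# Hypothetical toy corpus: "quark" occurs only in the target text.
target = "the quark model explains the quark spectrum".split()
others = [
    "the model explains the data".split(),
    "the spectrum of the signal".split(),
    "the data and the signal".split(),
]
corpus = [target] + others

# Score of the unique word in the original corpus (df = 1).
unique_score = tf_idf("quark", target, corpus)

# Small perturbation: one short extra text that also mentions the word (df = 2).
perturbed = corpus + ["the quark".split()]
perturbed_score = tf_idf("quark", target, perturbed)

# A word present in every text keeps a zero score before and after.
common_before = tf_idf("the", target, corpus)
common_after = tf_idf("the", target, perturbed)

print(f"unique word:  {unique_score:.3f} -> {perturbed_score:.3f}")
print(f"common word:  {common_before:.3f} -> {common_after:.3f}")
```

In this sketch a single added text moves the document frequency of the unique word from 1 to 2, which changes the argument of the logarithm from N to (N+1)/2, so the score drops noticeably. The score of a word that occurs in every text does not move at all.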
© Ivan Franko National University of Lviv, 2018