Please use this identifier to cite or link to this item: http://eadnurt.diit.edu.ua/jspui/handle/123456789/14720
Title: Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task
Authors: Demidovich, Inna
Shynkarenko, Viktor I.
Kuropiatnyk, Olena
Kirichenko, Oleksandr
Keywords: natural language texts
authorship attribution
Porter stemmer
genetic algorithm
recurrent analysis
statistical analysis
text classification
dictionary
pattern recognition
КІТ
Publication date: 2021
Publisher: IEEE
Citation: Demidovich I., Shynkarenko V., Kuropiatnyk O., Kirichenko O. Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task. International Scientific and Technical Conference on Computer Sciences and Information Technologies. Vol. 2 : 16th IEEE International Conference on Computer Science and Information Technologies (CSIT 2021), Lviv, 22–25 September 2021. P. 48–51. DOI: 10.1109/CSIT52700.2021.9648829.
Abstract: The previously developed method establishes the authorship of natural language texts using frequency analysis, supplemented by text complexity indicators and recurrent analysis. The authorship attribution problem is reduced to a classical pattern recognition problem. Because the individual indicators differ in information content, each is assigned a weight. The weights are chosen by a genetic algorithm so as to maximize the number of training-sample texts whose authorship is established correctly. This method is used to study the effectiveness of representations of an author's style based on different types of word processing: two types of word stems and 4-grams. The stems are obtained with an adapted Porter stemmer and with an original method for building a dictionary of Ukrainian word stems, respectively. With the calculated indicator weights, the reliability of establishing text authorship on the control sample reached 85-91%.
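The idea of combining several weighted indicators can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "4-grams" means overlapping character 4-grams, uses mean word length as a stand-in for a text complexity indicator, and takes the weights as given rather than finding them with a genetic algorithm. The names char_4grams, profile_distance, mean_word_length, and attribute are hypothetical.

```python
from collections import Counter


def char_4grams(text):
    """Relative frequencies of overlapping character 4-grams (assumed reading of '4-grams')."""
    s = " ".join(text.lower().split())  # lowercase and normalize whitespace
    grams = [s[i:i + 4] for i in range(len(s) - 3)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def profile_distance(p, q):
    """L1 distance between two frequency profiles (0 when identical)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mean_word_length(text):
    """Illustrative stand-in for a text complexity indicator (an assumption)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def attribute(text, author_texts, weights):
    """Pick the author whose reference text has the smallest weighted indicator distance."""
    w_freq, w_len = weights
    scores = {}
    for author, ref in author_texts.items():
        d_freq = profile_distance(char_4grams(text), char_4grams(ref))
        d_len = abs(mean_word_length(text) - mean_word_length(ref))
        scores[author] = w_freq * d_freq + w_len * d_len
    return min(scores, key=scores.get)


# Toy usage with made-up reference texts and hand-picked weights.
authors = {
    "A": "the cat sat on the mat and the cat sat",
    "B": "quantum chromodynamics describes interactions",
}
print(attribute("the cat sat on the hat", authors, (1.0, 0.1)))
```

In the method described in the abstract, the weights would instead be tuned on a training sample, with the number of correctly attributed texts serving as the fitness function.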
Description: V. Shynkarenko: ORCID 0000-0001-8738-7225, I. Demidovich: ORCID 0000-0002-3644-184X, O. Kuropiatnyk: ORCID 0000-0003-2286-884X
URI (Uniform Resource Identifier): https://ieeexplore.ieee.org/document/9648829/references#references
http://eadnurt.diit.edu.ua/jspui/handle/123456789/14720
Other identifiers: DOI: 10.1109/CSIT52700.2021.9648829
Appears in collections: Статті КІТ

Files in this item:
File: Demidovich.pdf
Size: 109.13 kB
Format: Adobe PDF


All items in the electronic resources archive are protected by copyright, with all rights reserved.