Automation of Template Formation to Identify the Structure of Natural Language Documents

Kuropiatnyk, Olena S.; Shynkarenko, Viktor I.

Files

Kuropiatnyk 17.pdf (1.3 MB)

Date

2021

Authors

Kuropiatnyk, Olena S.

Shynkarenko, Viktor I.

Publisher

CEUR-WS Team, Aachen, Germany

Abstract

In detecting text borrowing and plagiarism, it is important to take the structure of the document into account. Doing so yields a more accurate assessment of the text and reduces the volume of material to compare. A template makes it possible to identify the document's structure. The paper presents a constructive-synthesizing model for automating the construction of a structural document template. Possible C# implementations of some of the algorithms are considered and compared. A possible modification of the template is presented that increases the weight of keywords and simplifies the XML tree that constitutes the template.

Description

O. Kuropiatnyk: ORCID 0000-0003-2286-884X; V. Shynkarenko: ORCID 0000-0001-8738-7225

Keywords

natural language, document comparison, plagiarism detection, document structure, document template, constructive-synthesizing modeling, constructor, КІТ

Citation

Kuropiatnyk, O., Shynkarenko, V. Automation of template formation to identify the structure of natural language documents. CEUR Workshop Proceedings. Vol. 2870: 5th International Conference on Computational Linguistics and Intelligent Systems. Vol. I: Main Conference (COLINS 2021), Lviv, Ukraine, 22–23 April 2021. Lviv, 2021. P. 179–190.

URI

http://eadnurt.diit.edu.ua/jspui/handle/123456789/13850
http://ceur-ws.org/Vol-2870/
http://ceur-ws.org/Vol-2870/paper17.pdf

Collections

КІТ Articles