Effective algorithm for parsing sentences using semantically attributed weighted affix context free

Davydov, M. V.; Lozynska, O. V.; Pasichnyk, V. V.

Files

S_124 Davydov.pdf (489.59 KB)

Date

2017

Publisher

National University "Zaporizhzhia Polytechnic" (Національний університет "Запорізька політехніка")

Abstract

Context. The problem of increasing the efficiency of affix grammars over a finite lattice (AGFL) is considered. AGFL is a context-free grammar formalism with a flexible and compact form of productions for parsing natural-language texts.

Objective. The goal of the work is to increase the efficiency of parsing sentences with AGFL by means of a modification that adds semantic attributes to the productions and introduces a new form of production called the “template production”. This modification decreases the number of productions required to describe a language and reduces the computational complexity of the parsing algorithm.

Method. A mathematical model of the template production is developed, and a theorem is proved stating that a normal form of the template production exists and that the normalization procedure produces an equivalent grammar. The normal form is used to increase the efficiency of parsing Ukrainian sentences. The template production helps to represent ontology-based rules in a short and computationally inexpensive way. The normal form of the template production is studied, and an effective algorithm for parsing sentences is proposed. The worst-case complexity of the proposed algorithm is expressed in terms of three quantities: the length of the input string of terminals, the maximum number of combinations of symbol and attributes that can produce the same string of terminals, and the maximum number of productions that have the same starting non-terminal symbol on the right-hand side. When sentences from the test database of Ukrainian fiction were parsed, parsing time grew almost linearly with the number of words in a sentence.

Results. The developed method has been implemented in the UkrParser software, which is available as open source on GitHub.

Conclusions. The developed algorithm was tested on the database of Ukrainian sentences and parsed ten times faster than the Stanford parser. Future research can focus on developing grammatically attributed ontologies for a wider set of topics, which should improve the results of semantic sentence parsing.
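
To make the idea of an affix grammar more concrete, the sketch below shows a toy grammar in which each symbol carries one affix (grammatical number) whose value is a subset of a finite domain, and a production applies only when the affix sets of its parts intersect. This is an illustration only: it is a textbook CYK-style chart recognizer, not the algorithm proposed in the paper, and it does not model template productions, semantic attributes, or weights. The lexicon, the rules, and the names NUMBER, LEXICON, RULES, and parse are hypothetical.

```python
# Toy affix grammar: every symbol carries a NUMBER affix, stored as a
# set of admissible values. Agreement is checked by set intersection.
# All names and data here are illustrative assumptions.

NUMBER = frozenset({"sing", "plur"})

# word -> list of (symbol, admissible NUMBER values)
LEXICON = {
    "the": [("DET", NUMBER)],
    "a": [("DET", frozenset({"sing"}))],
    "dog": [("NOUN", frozenset({"sing"}))],
    "dogs": [("NOUN", frozenset({"plur"}))],
    "barks": [("VERB", frozenset({"sing"}))],
    "bark": [("VERB", frozenset({"plur"}))],
}

# Binary productions lhs(n) -> left(n) right(n); all parts share affix n.
RULES = [
    ("NP", "DET", "NOUN"),
    ("S", "NP", "VERB"),
]


def parse(words, start="S"):
    """Return the NUMBER values with which `start` derives `words`.

    An empty frozenset means the sentence is rejected.
    """
    n = len(words)
    chart = {}  # (i, j) -> {symbol: admissible NUMBER values}

    # Spans of length 1 come straight from the lexicon.
    for i, word in enumerate(words):
        cell = chart.setdefault((i, i + 1), {})
        for symbol, values in LEXICON.get(word, []):
            cell[symbol] = cell.get(symbol, frozenset()) | values

    # Longer spans are built from two adjacent shorter spans.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), {})
            for k in range(i + 1, j):
                left = chart.get((i, k), {})
                right = chart.get((k, j), {})
                for lhs, a, b in RULES:
                    if a in left and b in right:
                        shared = left[a] & right[b]  # affix agreement
                        if shared:
                            cell[lhs] = cell.get(lhs, frozenset()) | shared

    return chart.get((0, n), {}).get(start, frozenset())


if __name__ == "__main__":
    for sentence in ["the dog barks", "the dogs bark", "the dogs barks"]:
        print(sentence, "->", sorted(parse(sentence.split())))
```

In this sketch the two productions cover both singular and plural sentences, whereas a plain context-free grammar would need a separate copy of each production per number value. This is the kind of compactness the abstract attributes to affix grammars.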

Description

Davydov M. V. Effective algorithm for parsing sentences using semantically attributed weighted affix context free / M. V. Davydov, O. V. Lozynska, V. V. Pasichnyk // Radio Electronics, Computer Science, Control (Радіоелектроніка, інформатика, управління). – 2017. – No. 4 (43). – P. 124-130.