Approach to the automatic creation of an annotated dataset for the detection, localization and classification of blood cells in an image

Kovalenko, S. M.; Kutsenko, O. S.; Kovalenko, S. V.; Kovalenko, A. S.

Date

2024

Publisher

National University “Zaporizhzhia Polytechnic”

Abstract

Context. The paper considers the problem of automating the creation of an annotated dataset for subsequent use in a system that detects, localizes and classifies blood cells in an image using deep learning. The object of the research is digital image processing for object detection and localization.

Objective. The aim of the study is to develop a pipeline of digital image processing methods that automatically generates an annotated set of blood smear images. The set is then used to train and validate deep learning models, which should substantially reduce the time required of machine learning specialists.

Method. The proposed approach to object detection and localization is based on digital image processing methods: filtering, thresholding, binarization, and contour detection and filling. The detection and localization pipeline consists of the following steps: (1) noise reduction; (2) conversion to the HSV color model; (3) defining a mask for white blood cells and platelets; (4) detecting the contours of white blood cells and platelets; (5) determining the coordinates of the upper-left and lower-right corners of their bounding boxes; (6) calculating the area of the region inside each bounding box; (7) saving the obtained data; (8) determining the most common color in the image; (9) filling the contours of white blood cells and platelets with that color; (10) defining a mask for red blood cells; (11) detecting the contours of red blood cells; (12) determining the coordinates of the upper-left and lower-right corners of their bounding boxes; (13) calculating the area of the region inside each bounding box; (14) entering the data on the detected objects into a dataframe; (15) saving the dataframe to a .csv file for further use. Given the unlabeled image dataset and the generated .csv file, any researcher should be able to recreate the labeled dataset with standard image processing libraries.

Results. The developed approach was implemented in software for creating an annotated dataset of blood smear images.

Conclusions. The study proposes and justifies an approach to the automatic creation of an annotated dataset. The pipeline was tested on a set of unlabeled data, producing a labeled dataset that consists of cell images and a .csv file with the attributes “file name”, “type”, “xmin”, “ymin”, “xmax”, “ymax” and “area”, where “xmin”, “ymin”, “xmax” and “ymax” are the coordinates of the bounding box of each object and “area” is the area of the region inside that box. The numbers of correctly detected, incorrectly detected and unrecognized objects were counted manually, and metrics were calculated to assess the accuracy and quality of object detection and localization.
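The pipeline summarized in the abstract can be illustrated with a small sketch in Python that uses only the standard library. It is not the authors' implementation. The HSV thresholds, the area rule that separates platelets from leukocytes (`PLATELET_MAX_AREA`), the reading of "area" as the bounding-box area, the precision/recall metrics and all function names are hypothetical. 4-connected component labeling stands in for contour detection, and the noise-reduction step is omitted. A practical version would more likely rely on OpenCV (for example `cv2.cvtColor`, `cv2.inRange`, `cv2.findContours` and `cv2.boundingRect`) and on pandas for the dataframe.

```python
# Dependency-free sketch; names, HSV thresholds and the platelet area rule are hypothetical.
import colorsys
import csv
import io
from collections import Counter, deque

PLATELET_MAX_AREA = 4  # hypothetical: smaller stained boxes are labeled platelets


def is_stain(h, s, v):
    # Purple stain of leukocytes and platelets (illustrative hue range).
    return 0.65 <= h <= 0.85 and s >= 0.3


def is_rbc(h, s, v):
    # Pink/red erythrocytes: hue near 0 (it wraps around at 1), saturated.
    return (h >= 0.9 or h <= 0.05) and s >= 0.3


def hsv_mask(image, predicate):
    # Convert every RGB pixel (0-255) to HSV in [0, 1] and threshold it.
    return [[predicate(*colorsys.rgb_to_hsv(r / 255, g / 255, b / 255))
             for (r, g, b) in row] for row in image]


def regions(mask):
    # 4-connected components, standing in for contour detection.
    # Yields (xmin, ymin, xmax, ymax, pixels); xmax and ymax are exclusive.
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs, ys = [p[0] for p in pixels], [p[1] for p in pixels]
            yield min(xs), min(ys), max(xs) + 1, max(ys) + 1, pixels


def box_row(filename, kind, xmin, ymin, xmax, ymax):
    # One annotation; "area" is assumed to be the bounding-box area.
    return dict(filename=filename, type=kind, xmin=xmin, ymin=ymin,
                xmax=xmax, ymax=ymax, area=(xmax - xmin) * (ymax - ymin))


def annotate(image, filename):
    # Leukocytes and platelets: mask, regions, bounding boxes, areas.
    stained = list(regions(hsv_mask(image, is_stain)))
    rows = []
    for xmin, ymin, xmax, ymax, _ in stained:
        small = (xmax - xmin) * (ymax - ymin) < PLATELET_MAX_AREA
        rows.append(box_row(filename, "Platelets" if small else "WBC", xmin, ymin, xmax, ymax))
    # Paint those regions with the most common color (normally the background).
    background = Counter(p for row in image for p in row).most_common(1)[0][0]
    filled = [list(row) for row in image]
    for *_, pixels in stained:
        for x, y in pixels:
            filled[y][x] = background
    # Erythrocytes: the same procedure on the filled image.
    for xmin, ymin, xmax, ymax, _ in regions(hsv_mask(filled, is_rbc)):
        rows.append(box_row(filename, "RBC", xmin, ymin, xmax, ymax))
    return rows


def to_csv(rows):
    # Tabulate and serialize; a pandas DataFrame could be used instead.
    buf = io.StringIO()
    fields = ["filename", "type", "xmin", "ymin", "xmax", "ymax", "area"]
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def precision_recall(correct, incorrect, missed):
    # Assumed metrics from manual counts; the abstract does not name them.
    return correct / (correct + incorrect), correct / (correct + missed)


def demo_image():
    # 8x6 toy smear: 3x3 leukocyte, 1-pixel platelet, 2x2 erythrocyte.
    bg, rbc, stain = (230, 220, 225), (220, 100, 110), (120, 60, 160)
    img = [[bg] * 8 for _ in range(6)]
    for y, x in [(y, x) for y in range(1, 4) for x in range(1, 4)] + [(5, 7)]:
        img[y][x] = stain
    for y, x in [(1, 5), (1, 6), (2, 5), (2, 6)]:
        img[y][x] = rbc
    return img
```

For example, `to_csv(annotate(demo_image(), "demo.jpg"))` yields a header row followed by three annotations (WBC, Platelets, RBC). The first annotation is `demo.jpg,WBC,1,1,4,4,9`. Filling the stained regions before building the erythrocyte mask mirrors the ordering in the abstract: it keeps leukocytes and platelets from being picked up a second time as red cells.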

Description

Kovalenko S. M. Approach to the automatic creation of an annotated dataset for the detection, localization and classification of blood cells in an image / S. M. Kovalenko, O. S. Kutsenko, S. V. Kovalenko, A. S. Kovalenko // Radio Electronics, Computer Science, Control. – 2024. – № 1 (68). – P. 128–139.