A semi-supervised approach for cassava disease classification
Abstract
In this era of Big Data, images play a crucial role in conveying complex ideas, and humans can process them with remarkable speed and accuracy. Concurrently, machine learning has become a powerful tool for acquiring knowledge and automating tasks. This research applies semi-supervised learning methods to cassava disease diagnosis, a significant challenge faced by smallholder farmers. Cassava, a vital cash and food crop, suffers from diseases such as Cassava Mosaic Disease (CMD) and Cassava Brown Streak Disease (CBSD), which lead to substantial yield losses. The current surveillance approach in Uganda relies on annual national surveys, whereas real-time monitoring would allow crop health problems to be detected and managed as they arise. However, manual data labeling for classification systems is labor-intensive, time-consuming, and prone to errors, which motivates the use of semi-supervised learning techniques. The main objective of this study is to develop a classification model based on the semi-supervised learning approach for cassava disease diagnosis in fieldwork surveys. Specific objectives include collecting and preprocessing healthy and diseased cassava leaf data, building a semi-supervised machine learning model, and evaluating its performance. This research employs supervised and unsupervised learning algorithms, such as Nearest Neighbor, Naive Bayes, Decision Trees, Support Vector Machines (SVM), and Neural Networks, to classify cassava leaf data into healthy or diseased categories. By leveraging a limited amount of labeled data alongside a vast pool of unlabeled data, the proposed semi-supervised model aims to provide a more accurate and efficient automated diagnostic tool for real-time crop health monitoring. The significance of this study lies in its potential to enhance cassava disease management and improve food security in the region. Traditional supervised classifiers require extensive labeled data, which can be challenging and expensive to obtain.
By incorporating semi-supervised learning, this research offers an innovative approach that maximizes the utility of both labeled and unlabeled data, leading to more accurate and cost-effective classification models.
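To make the idea of combining a few labeled samples with a large unlabeled pool concrete, the following is a minimal sketch of self-training with a nearest-neighbor classifier. It is an illustration only and is not claimed to be the procedure used in this study: the two-dimensional feature vectors, the distance threshold, and the function names (`nearest_label`, `self_train`, `predict`) are all hypothetical, and real leaf images would first need to be converted into feature vectors.

```python
import math

def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x."""
    point, label = min(labeled, key=lambda p: math.dist(x, p[0]))
    return label, math.dist(x, point)

def self_train(labeled, unlabeled, threshold=1.0, max_rounds=10):
    """Iteratively pseudo-label unlabeled points that lie close to labeled ones.

    `threshold` is an assumed confidence cut-off: a point is pseudo-labeled
    only if its nearest labeled neighbor is within this distance.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, dist = nearest_label(x, labeled)
            if dist <= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new was learned; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool

def predict(x, labeled):
    """Classify x by the label of its nearest (pseudo-)labeled point."""
    return nearest_label(x, labeled)[0]

# Hypothetical data: one labeled example per class ...
seed = [((0.0, 0.0), "healthy"), ((5.0, 5.0), "diseased")]

# ... and a larger unlabeled pool laid out on a grid around each class center.
offsets = [0.3 * i for i in range(-3, 4)]
pool = [(dx, dy) for dx in offsets for dy in offsets]
pool += [(5.0 + dx, 5.0 + dy) for dx in offsets for dy in offsets]

model, remaining = self_train(seed, pool)
print(len(model), "labeled points after self-training;", len(remaining), "left unlabeled")
print(predict((0.2, -0.1), model), predict((5.1, 4.8), model))
```

In this toy setting the points farthest from each seed are not labeled in the first round; they become reachable only after their neighbors have been pseudo-labeled, which is the mechanism by which unlabeled data extends the coverage of a small labeled set. With real data, a threshold that is too loose would propagate labeling errors, so it would need to be validated against held-out labeled samples.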