Accuracy evaluation of Naive Bayes and Logistic Regression for classification with binary attributes and classes

López-Pezoa, Edgar; Cáceres-Estigarribia, Antoliano; Grillo, Sebastián Alberto; Herrera, Edher

doi:10.18004/rcfacen.2022.13.1.73

Reportes científicos de la FACEN

Print version ISSN 2222-145X

Abstract

LOPEZ-PEZOA, Edgar; CACERES-ESTIGARRIBIA, Antoliano; GRILLO, Sebastián Alberto and HERRERA, Edher. Accuracy evaluation of Naive Bayes and Logistic Regression for classification with binary attributes and classes. Rep. cient. FACEN [online]. 2022, vol.13, n.1, pp.73-84. ISSN 2222-145X. https://doi.org/10.18004/rcfacen.2022.13.1.73.

In data science, most classification models fall into one of two categories: discriminative models or generative models. Discriminative models capture only the relationship between the attributes of an instance and its class, whereas generative models seek to represent the entire data distribution. Although most classification models are discriminative, there is no guarantee that this type of model is better than a generative one. With this in mind, we compare Naive Bayes and Logistic Regression as highly representative examples of generative and discriminative classifiers, respectively. In this work, the accuracy of Naive Bayes and Logistic Regression models is evaluated as a function of the number of attributes and instances of an artificial dataset in which both the attributes and the class are binary. Unlike other methodologies that use the datasets to approximate the classification error, this work uses the datasets only to train the models, while the classification error is computed exactly from the data distribution. Experiments show that binary classification accuracy tends to be slightly better for Logistic Regression with 50 to 500 training instances, when the results are averaged over randomly generated distributions with 1 to 6 binary attributes.
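The key methodological point above is that the sampled datasets are used only for training, while accuracy is computed exactly from the known distribution: for a classifier h over d binary attributes, accuracy(h) = Σ_x P(x, h(x)), summed over all 2^d attribute vectors x. The Python sketch below illustrates that protocol under stated assumptions; it is not the authors' code. The distribution generator (normalized uniform random weights), Bernoulli Naive Bayes with Laplace smoothing, logistic regression fitted by plain batch gradient descent with a small L2 penalty, and every function name and hyperparameter are hypothetical choices made for illustration.

```python
import itertools
import math
import random

# Hypothetical helpers sketching "train on samples, score on the exact distribution".
# Generator, smoothing and optimizer settings are illustrative assumptions.

def random_distribution(d, rng):
    """Random joint P(x, y) over d binary attributes and a binary class.
    Assumption: uniform random weights normalized to sum to 1."""
    support = [(x, y) for x in itertools.product((0, 1), repeat=d) for y in (0, 1)]
    weights = [rng.random() for _ in support]
    total = sum(weights)
    return {xy: w / total for xy, w in zip(support, weights)}

def sample(dist, n, rng):
    """Draw n i.i.d. (x, y) training pairs from dist."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=n)

def fit_naive_bayes(data, d, alpha=1.0):
    """Bernoulli Naive Bayes (generative) with Laplace smoothing alpha."""
    counts, ones = [0, 0], [[0] * d, [0] * d]
    for x, y in data:
        counts[y] += 1
        for j, xj in enumerate(x):
            ones[y][j] += xj
    n = len(data)
    prior = [(counts[c] + alpha) / (n + 2 * alpha) for c in (0, 1)]
    theta = [[(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(d)]
             for c in (0, 1)]

    def log_joint(x, c):
        # log P(y=c) + sum_j log P(x_j | y=c), i.e. a model of the joint P(x, y)
        s = math.log(prior[c])
        for j, xj in enumerate(x):
            s += math.log(theta[c][j] if xj else 1.0 - theta[c][j])
        return s

    return lambda x: 1 if log_joint(x, 1) > log_joint(x, 0) else 0

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(data, d, lr=0.5, epochs=500, l2=1e-3):
    """Logistic regression (discriminative): batch gradient descent on the mean
    log-loss plus a small L2 penalty on the weights (bias unpenalized)."""
    w, b, n = [0.0] * d, 0.0, len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in data:
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            gb += err
            for j in range(d):
                gw[j] += err * x[j]
        b -= lr * gb / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
    return lambda x: 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def exact_accuracy(predict, dist):
    """Exact accuracy: total probability of the (x, y) pairs predicted correctly."""
    return sum(p for (x, y), p in dist.items() if predict(x) == y)

def bayes_accuracy(dist):
    """Best achievable accuracy: sum over x of max_y P(x, y)."""
    xs = {x for x, _ in dist}
    return sum(max(dist[(x, 0)], dist[(x, 1)]) for x in xs)

if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 3, 200
    dist = random_distribution(d, rng)
    train = sample(dist, n, rng)
    print("Bayes-optimal:", bayes_accuracy(dist))
    print("Naive Bayes:", exact_accuracy(fit_naive_bayes(train, d), dist))
    print("Logistic Regression:", exact_accuracy(fit_logistic_regression(train, d), dist))
```

Because both models are scored against the same exact distribution, no test-set sampling noise enters the comparison, and bayes_accuracy gives an upper bound that neither model can exceed. Averaging the two exact accuracies over many random seeds, attribute counts d and training sizes n would roughly mirror the kind of experiment the abstract describes.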

Keywords: Naive Bayes; logistic regression; classification; supervised learning.

Full text in Spanish (PDF); an abstract in Spanish is also available.