Application of supervised learning algorithms to identify sociodemographic factors associated with depressive symptoms in Peruvian adults: CHAID model, 2022

Guevara Tirado, Alberto

doi:10.18004/rvspmi/2312-3893/2024.e11122412

Revista Virtual de la Sociedad Paraguaya de Medicina Interna

Online version ISSN 2312-3893

Abstract

GUEVARA TIRADO, Alberto. Application of supervised learning algorithms to identify sociodemographic factors associated with depressive symptoms in Peruvian adults: CHAID model, 2022. Rev. virtual Soc. Parag. Med. Int. [online]. 2024, vol.11, n.1, e11122412. Epub May 1, 2024. ISSN 2312-3893. https://doi.org/10.18004/rvspmi/2312-3893/2024.e11122412.

Introduction:

Depressive symptoms are highly prevalent in the Peruvian population. A decision tree algorithm could help identify groups that are especially vulnerable to depressive symptoms.

Objective:

To identify the groups especially vulnerable to depressive symptoms according to sociodemographic factors, using a machine learning decision tree algorithm.

Material and methods:

An observational, descriptive, retrospective, cross-sectional design was applied. Data came from the National Demographic and Health Survey. The study population comprised 32,062 adults. The dependent variable was the presence of depressive symptoms; the explanatory variables were age group, mother tongue, ethnic group, educational level, age of onset of alcohol consumption, alcohol consumption, marital status, and sex. A decision tree algorithm based on chi-square automatic interaction detection (CHAID) was used.
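The core step of a CHAID-style tree, choosing the categorical predictor most strongly associated with the outcome by a chi-square test of independence, can be sketched as follows. This is a minimal illustration and not the author's implementation: the data are synthetic (not survey records), the column names and the helper `best_chi_square_split` are hypothetical, and the category-merging and Bonferroni-adjustment steps of full CHAID are omitted.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def best_chi_square_split(df, target, predictors, alpha=0.05):
    """Return (predictor, p_value) for the predictor with the smallest
    chi-square p-value against the target, or None if none is below alpha.

    Simplified: full CHAID also merges similar categories and applies a
    Bonferroni adjustment before comparing predictors.
    """
    best = None
    for col in predictors:
        table = pd.crosstab(df[col], df[target])
        # A chi-square test needs at least a 2x2 contingency table.
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue
        _, p_value, _, _ = chi2_contingency(table)
        if best is None or p_value < best[1]:
            best = (col, p_value)
    if best is not None and best[1] < alpha:
        return best
    return None


# Synthetic data (assumption): symptoms depend on sex only.
rng = np.random.default_rng(0)
n = 2000
data = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=n),
    "age_group": rng.choice(["18-38", "39+"], size=n),
    "marital_status": rng.choice(["single", "partnered"], size=n),
})
prob = np.where(data["sex"] == "female", 0.40, 0.15)
data["depressive_symptoms"] = rng.random(n) < prob

split = best_chi_square_split(
    data, "depressive_symptoms", ["sex", "age_group", "marital_status"]
)
print(split)
```

In a full tree, the same selection would be repeated within each resulting subgroup until no predictor reaches significance, which is how nodes for nested subgroups arise.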

Results:

The significant variables in the algorithm were sex, type of mother tongue, marital status, age group, and educational level achieved; the model correctly classified 75.80% of the cases of depressive symptoms. Overall, the nodes mainly associated with the presence of depressive symptoms were node 2 (female sex), node 6 (adults aged 39 years and older), and node 13 (education up to secondary school). In the analysis by sex, the nodes mainly associated with depressive symptoms in women were node 2 (adults aged 39 years and older), node 5 (education up to secondary school), and node 13 (native mother tongue). In men, they were node 2 (native mother tongue), node 6 (adults aged 39 years and older), and node 11 (education up to secondary school).

Conclusions:

The main sociodemographic group associated with depressive symptoms is women aged 39 years and older whose education reached at most secondary school. Machine learning algorithms are useful for building screening tools aimed at populations vulnerable to depressive symptoms.

Keywords: machine learning; depression; epidemiology; mental health; Peru.
