Introduction
Mental disorders are one of the leading causes of the global health burden1. In 2017, it was estimated that approximately 264 million and 322 million people worldwide were living with anxiety disorders and depression, respectively2. In 2019, depression and anxiety were considered the two most disabling mental disorders and were among the top 25 causes of health-related burdens worldwide3. With the onset of the COVID-19 pandemic, it was estimated that there was a 25% increase in the number of cases of anxiety disorder and depression globally, with more than one billion people suffering from a mental disorder prior to the pandemic4. This situation was expected because the prevalence of mental health problems tends to increase during disease outbreaks5.
In Peru, during the pandemic, 19.7% and 14.5% of individuals were reported to have clinically relevant symptoms of depression and anxiety, respectively6. For example, an average increase of 0.17% per quarter in the prevalence of moderate depressive symptoms was observed after the onset of the pandemic, that is, approximately 1583 new cases of moderate depressive symptoms every three months7. However, neither these studies nor others that evaluated mental health symptoms in different population groups in Peru (8, 9) included Quechua-speaking groups.
Peru is a multicultural and multilingual country where 3,375,682 people (13.9%) have Quechua as their mother tongue, and Quechua speakers are found in almost all departments of the country10. Little information is available on the prevalence of mental health symptoms in the Quechua-speaking population11. Pre-pandemic studies have reported that 38.9% of Quechua speakers in Ayacucho presented symptoms of depression12. Quechua people tend to perceive and express mental health symptoms differently than Spanish speakers do. It has been reported that the high Andean Quechua people construct and experience expressions of distress and suffering, such as pinsamientuwan (troubling thoughts), ñakary (suffering), and llaki (sadness), in a context characterized by social inequalities, social exclusion, and political violence13. Many of these mental health problems can lead to suicide in Quechua-speaking populations14. Despite this, no studies have reported the prevalence of mental health symptoms in the Peruvian Quechua-speaking population during the pandemic. Although we are now in the post-pandemic period, it remains important to monitor symptoms of anxiety and depression, which may be frequently reported after the pandemic15. The Quechua-speaking population has distinct cultural and linguistic characteristics, and the absence of trained personnel with adequate language proficiency hinders access to quality mental health care (11, 16). As a result of this gap in mental health care, when Quechua speakers experience an episode of sadness, or llaqui, they turn to the Yachactaita, a healer who uses plants and rituals, rather than to a mental health professional, because of the trust they place in him and his proximity to their communities17.
The presence of mental health problems in the Quechua-speaking population has generated a need for instruments with adequate psychometric evidence to measure depression and anxiety. However, to the best of our knowledge, no instrument has been adapted to the Quechua language that would contribute to the objective diagnosis of depression and anxiety at the individual or group level. In Peru, most health personnel are not fluent in Quechua, either as a first or a second language. Therefore, to assess Quechua-speaking populations, bilingual people, usually younger family members whose worldview differs from that of the person being assessed, are used to translate the questions and answers of the questionnaire between Spanish and Quechua. However, this practice is not recommended because it introduces bias into the assessment.
Several instruments that assess anxiety and depression have been adapted and validated in Peru, such as the Reynolds Adolescent Depression Scale (RADS-2)18, the Patient Health Questionnaire-9 (PHQ-9)19, the two-item version of the PHQ (PHQ-2)20, the Coronavirus Anxiety Scale21,22, the Generalized Anxiety Disorder-7 (GAD-7)23, and the Depression, Anxiety and Stress Scales (DASS-21)24. However, these instruments have been adapted in urban areas and in the Spanish language, leaving aside the population groups of the high Andean areas. This shows the lack of diagnostic tools in the Quechua language that have undergone an adequate cultural adaptation process. It is important to have screening instruments that allow the early detection of depressive and anxiety symptoms in individuals or populations at risk20.
The Patient Health Questionnaire for Depression and Anxiety (PHQ-4)25 has proven useful in research and clinical practice26. The PHQ-4 comprises the first two items of the PHQ-9 and the first two items of the GAD-7. These items are based on the DSM-IV criteria for the diagnosis of depression and generalized anxiety disorder27. The PHQ-4 was developed based on the assumption that symptoms of depression and anxiety frequently coexist and the intent to be able to identify individuals who may be suffering from one or both sets of symptoms28. A composite measure of anxiety and depression based on a single construct is not in accordance with the current nosological science, as suggested in the Diagnostic and Statistical Manual of Mental Disorders29. However, there is evidence of comorbidity between anxiety and depression across the lifespan(30, 31).
Previous studies have shown adequate evidence of the construct validity of the PHQ-4, significant relationships with the degree of functional disability, and the ability to discriminate between groups with worse anxiety and depressive symptoms25,26,32. For example, in the original study conducted on a sample of 2149 patients from 15 primary care clinics in 12 U.S. states, Kroenke et al.25 used principal component analysis, which suggested the presence of two subscales, one for depression and one for anxiety, consisting of two items each. The same study indicated that the reliability estimates for the anxiety subscale, the depression subscale, and the total scale were good (0.85, 0.82, and 0.81, respectively). A more recent study, involving 5030 participants from the general population of Germany, supported a two-factor model of the PHQ-4, with good reliability and invariance across age and gender groups, and showed significant relationships with demographic risk factors for depression and anxiety32. A study conducted in South Korea with outpatients at the Department of Psychiatry, Ansan Hospital, Korea University33 indicated that the Korean version of the PHQ-4 has good internal consistency, test-retest reliability, and significant relationships with other depression/anxiety scales. Regarding its internal structure, the Korean study indicated that although the two-factor model had better fit indices than the one-factor model, the two-factor structure may not be completely adequate. Another study with Midwestern college students34 indicated the presence of two factors (anxiety and depression) using component analysis with varimax rotation, with good reliability. In addition, people diagnosed with depression or anxiety by a mental health professional had significantly higher scores on the PHQ-4, providing evidence of criterion validity.
One study indicated that the PHQ-4 has a two-dimensional structure and adequate reliability, and can adequately assess the dimensions of anxiety and depression during pregnancy35. The PHQ-4 also presented a good fit for a two-factor model, good reliability, and significant relationships with perceived stress in a sample of English- and Spanish-speaking Hispanic Americans36. In South America, the psychometric properties of the PHQ-4 have been evaluated in the general population of Colombia37, with a Cronbach's alpha coefficient of 0.84. That study reported a two-factor model that was invariant across age and sex groups, evidence of convergent validity with other measures of hospital anxiety, depression, and general health, and evidence of divergent validity with measures of self-efficacy and life satisfaction.
Compared to the other instruments mentioned above that measure symptoms of depression and anxiety, the PHQ-4 has several advantages. First, long instruments pose challenges for different groups of people, such as those with deficits in cognitive functioning38. In this regard, ultra-brief measures, such as the PHQ-4, comprising simply worded items, can be completed within a few minutes39. Ultra-brief screening instruments are defined as measures that have one to four items and take less than four minutes to complete39. Such measures are useful in primary care settings with limited time and resources40. Second, two- or three-item measures have been shown to perform better than single-item measures39. The PHQ-4 comprises items that measure the core characteristics of depression and anxiety, which are common psychiatric syndromes in many Latin American countries. Third, rather than being a definitive diagnostic measure, the PHQ-4 is a preliminary measure that signals the need for further evaluation of anxiety and depression, which promotes discussion and further treatment planning and evaluation25.
The PHQ-4 has been used to detect symptoms of depression and anxiety in different groups, such as oncology patients41, heart disease patients42, and recently in healthcare personnel during the COVID-19 pandemic43. However, unlike other similar measures such as the PHQ-9 and GAD-7, psychometric evidence for the PHQ-4 has not been widely examined in Latin American cultural contexts, and less so in Quechua speakers. This is despite the fact that, as reported, different studies have provided evidence for the widespread use of the PHQ-4 in medical and psychological research28.
Therefore, the aim of this study was to adapt and validate the PHQ-4(25, 32) in the Quechua-speaking population of the Collao variant of Puno-Peru. Specifically, evidence of validity based on internal structure was evaluated by confirmatory factor analysis (CFA), reliability with the internal consistency method, and the difficulty and discrimination characteristics of the items based on Item Response Theory (IRT).
Generally, CFA and internal consistency reliability tests have been used to validate mental health measurement instruments for use in research and clinical practice44, whereas IRT-based procedures have rarely been used45. However, the development and validation of measurement instruments can benefit from the complementary use of the two methods. CFA focuses on the structure of an instrument, such as the number of factors, the factor loadings of items, the presence of correlated errors, and the presence of higher-order factors, among others. This provides evidence to indicate whether the set of items can be used to measure a construct for research or practice46. IRT, in turn, brings together a set of model-based techniques that examine the process by which people respond to the items of a measurement instrument47. IRT methods focus on the items and, therefore, assess their individual characteristics and the specific relationship of each item with the latent trait. In general, CFA allows for a summative view of the level of anxiety and depression in individuals, allowing the calculation of a total score, whereas IRT analyses identify items that are more likely to be endorsed by people with different levels of anxiety and depression48. The combination of CFA and IRT allows a more complete overview of an instrument48. This would help develop more accurate profiles of individuals, allow the identification of anxiety and depression levels in different population groups, and help to design more effective interventions49. Having an instrument with adequate psychometric evidence is important because the detection of anxiety and depression symptoms in the Quechua-speaking population is a necessary first step in improving the outcomes of patients diagnosed with anxiety and depression disorders32.
This is even more important considering that indigenous peoples are vulnerable groups: prevention and mental health promotion strategies are difficult to develop if their idiosyncratic characteristics and language are not taken into account50,51. Neglecting these characteristics may worsen mental health problems in this population52.
Methods
Participants
Participants were speakers of the Collao variant of Quechua living in the northeast of the high Andean city of Puno, mainly in the provinces of San Román, Azángaro, and Melgar. Participants were selected by non-probabilistic convenience sampling according to the following inclusion criteria: 1) being of legal age; 2) being able to communicate in Quechua (read, write, and understand); and 3) providing informed consent. Individuals who understood Quechua but could not speak or write it were excluded. The minimum sample size was calculated using Soper software (2023) based on the number of observed (n = 4) and latent (n = 1) variables in the model, the anticipated effect size (λ = 0.3), the significance level (α = 0.05), and the statistical power level (1-β = 0.95). The minimum recommended sample size was 100 individuals. The final number of participants in this study was higher than the recommended minimum.
A total of 221 speakers of the Collao variant of the Quechua language participated. Their ages ranged from 18 to 66 years, with a mean of 31.2 years (SD = 11.7). Regarding sex, 104 were men (47.1%) and 117 were women (52.9%).
Measures
Sociodemographic survey. This survey was constructed specifically for the present study with the objective of obtaining information on sex and age.
The Patient Health Questionnaire-4 (PHQ-4)25. The PHQ-4 assesses the frequency of the core symptoms of depression and anxiety disorders during the last 2 weeks. It consists of four items, combining the two items of the PHQ-2 (53) and the two items of the Generalized Anxiety Disorder Scale-2 (GAD-2)54. The first two items measure the frequency of depressive symptoms ("Little interest or pleasure in doing things" and "Feeling depressed or hopeless") and the last two items measure the frequency of anxiety symptoms ("Feeling nervous, anxious or on edge" and "Not being able to stop or control worry"). Each PHQ-4 item has four Likert-type response options, ranging from 0 = not at all to 3 = almost every day. The total score ranges from 0 to 12, with higher scores indicating a higher frequency of depressive and anxiety symptoms. For this study, the Spanish version validated in Ecuador by López Guerra et al.55 was used. The purpose of the PHQ-4 is not to make a final diagnosis or to monitor the severity of depression and anxiety but to provide a preliminary screening for depressive and anxiety symptoms.
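The scoring rule described for the PHQ-4 can be sketched in a few lines. This is a minimal illustration only: the function name `score_phq4` and the returned keys are hypothetical, and the two subscale sums simply follow the item grouping given in the description (first two items for depression, last two for anxiety).

```python
def score_phq4(responses):
    """Score one PHQ-4 protocol (illustrative helper, not part of the study).

    responses: four integers from 0 (not at all) to 3 (almost every day),
    ordered as in the questionnaire: two depression items, then two anxiety items.
    """
    if len(responses) != 4:
        raise ValueError("the PHQ-4 has exactly four items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    depression = responses[0] + responses[1]  # items 1-2, range 0-6
    anxiety = responses[2] + responses[3]     # items 3-4, range 0-6
    return {
        "depression": depression,
        "anxiety": anxiety,
        "total": depression + anxiety,        # range 0-12
    }


# Example: a respondent answering 0, 1, 2, 3 to the four items
print(score_phq4([0, 1, 2, 3]))
```

Because the score is a screening indicator rather than a diagnosis, any interpretation of the total would still need to follow the cautions stated for the instrument.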
The translation of the items of the PHQ-4 from Spanish to Quechua in its Collao variant from Puno (Peru) was conducted following the indications of Hernández et al.56. This included an initial translation performed by a group of translators, followed by back-translation performed by other translators without knowledge of the original questionnaire. In the initial translation, professional support from two translators was requested to perform the translation from Peruvian Spanish to Quechua in its Collao variant from Puno (Peru). The translators were Peruvian and had a command of Peruvian Spanish and an expert level in the Collao de Puno variant of Quechua. Independently, each translator presented his or her report together with a Peruvian psychologist whose native language was Quechua. They met to record approval from the review committee and to leave a written version of the items of the PHQ-4 in Collao de Puno Quechua. In the back-translation, two translators, different from those assigned in the first stage, were in charge of the back-translation (from Quechua into Peruvian Spanish). Each one presented his or her translation report, and together with the review committee led by a psychologist who spoke Spanish and Quechua in the Collao variant, they reviewed and compared the coincidences between the translated versions. It was verified that the items complied with the measurement of the constructs in the back-translated version; therefore, the final version of the items of the PHQ-4 in Quechua Collao de Puno was approved. Subsequently, a Quechua-speaking psychologist proficient in qualitative focus group data collection organized a face-to-face meeting with seven Quechua speakers of the Collao variant of Puno. These people were over 18 years of age, with five males and three females, all of whom had completed secondary school and could read Quechua and Spanish (bilingual). 
Initially, the participants were asked to read the written version of the survey and answer the items as appropriate. Following the application of the items of the Quechua PHQ-4, the psychologist moderator invited participants to comment on the clarity of the items and on how comprehensible they were in everyday Quechua. There were no suggestions for changes or modifications of the Quechua version of the items. Table 1 shows the original English version, the Spanish version, and the Quechua version (Collao de Puno variant) of the PHQ-4.
Procedure
An online form was prepared using the Google Forms platform, which included an informed consent form, a sociodemographic form, and the items of the scale. The objective of the study was presented and informed consent was obtained in the initial part of the form. In addition, it was clearly indicated that only bilingual people with a minimum reading level in Quechua were allowed to complete the survey. The online form was administered between February and April 2022. Two members of the research team monitored the application of the form for data collection. We collaborated with university students, fluent in the Collao variant of Puno Quechua, who were trained in the use and application of the instrument. The surveyors identified WhatsApp groups of parents of educational institutions, Christian churches, and associations in Quechua-speaking rural communities, to whom they presented the survey.
Ethical considerations
The Research Ethics Committee of the Universidad Peruana Unión approved this study (approval number: 2022-CEEPG-0000190). The study followed all the ethical principles of the Declaration of Helsinki for research in humans. All participants provided informed consent before starting the study. In addition, they were informed that participation was anonymous and voluntary, and that the information collected would be used only for research purposes. Participants could withdraw from the study at any time they saw fit.
Data analysis
First, expert judges evaluated the evidence of content validity. Three psychologists (each with a master's degree) participated, two of whom had at least two years of work experience in the clinical area and one in university teaching. All had experience in the treatment of depression in bilingual Quechua-speaking adults (Quechua and Spanish). The expert review had a qualitative component for each item, in which the judges could provide suggestions on the cultural equivalence, contextualization, and relevance of the words in Quechua. It also had a quantitative component, in which the judges rated the relevance, representativeness, and clarity of each item; from these ratings, Aiken's V was calculated for each item57.
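As an illustration of the quantitative component, the sketch below computes Aiken's V for one item together with a score-type confidence interval. Several things here are assumptions rather than details reported in the study: the 0 to 3 rating scale, the 95% confidence level (z = 1.96), the example ratings, and the function name `aiken_v`. The interval uses the Wilson-type score formula commonly applied to Aiken's V, with n judges and a rating range of k = hi - lo.

```python
import math


def aiken_v(ratings, lo, hi, z=1.96):
    """Aiken's V with a score-type confidence interval (illustrative sketch).

    ratings: one rating per judge for a single item and criterion
    lo, hi:  lowest and highest possible rating (assumed scale)
    z:       normal quantile for the desired confidence level (assumed 95%)
    """
    n = len(ratings)          # number of judges
    k = hi - lo               # range of the rating scale
    v = sum(r - lo for r in ratings) / (n * k)

    # Wilson-type score interval with effective sample size n * k
    half_width = z * math.sqrt(4 * n * k * v * (1 - v) + z * z)
    centre = 2 * n * k * v + z * z
    denom = 2 * (n * k + z * z)
    return v, (centre - half_width) / denom, (centre + half_width) / denom


# Hypothetical ratings from three judges on an assumed 0-3 scale
v, lower, upper = aiken_v([3, 3, 3], lo=0, hi=3)
print(round(v, 2), round(lower, 2), round(upper, 2))
```

With only three judges the lower limit stays well below 1 even under full agreement, which is why both the point estimate and the lower limit are usually inspected.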
In the Confirmatory Factor Analysis (CFA), because the items had four response categories, the Diagonally Weighted Least Squares estimator with mean and variance correction (WLSMV) was used58. The criteria used to assess model fit were RMSEA (< .08), SRMR (< .08), CFI (> .95), and TLI (> .95)59,60. The internal consistency of the scale was assessed using Cronbach's alpha coefficient61 and the omega coefficient62. Values greater than .70 were considered adequate (63).
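For readers unfamiliar with the internal consistency coefficient, the following sketch shows the classical formula for Cronbach's alpha applied to raw item scores. It is an illustration under stated assumptions: the function name `cronbach_alpha` and the small data set are hypothetical, and the computation used in the study (performed in R) may differ in detail, for example in how ordinal data are treated.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Classical Cronbach's alpha (illustrative sketch).

    rows: list of respondents, each a list of item scores of equal length.
    """
    k = len(rows[0])                                   # number of items
    items = list(zip(*rows))                           # transpose to per-item columns
    item_variances = [variance(col) for col in items]  # sample variance of each item
    total_variance = variance([sum(r) for r in rows])  # variance of the total score
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to four items scored 0-3
data = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 3, 2],
    [3, 2, 3, 3],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(data), 2))
```

Values from such a computation would be compared with the .70 threshold stated above.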
For the Item Response Theory (IRT) analysis, the Graded Response Model (GRM)65 was used, which is an extension of the 2-parameter logistic model (2-PLM) for ordered polytomous items64. Before assessing the item parameters, the model fit was evaluated using the C2 statistic developed for ordinal items66. The fit criteria for the GRM were as follows: RMSEA ≤ .05 (67) and SRMSR ≤ .05 (68). For CFI and TLI values, criteria similar to those used in SEM models (≥ .95) were applied69. Regarding item parameters, discrimination (a) and difficulty (b) parameters were estimated. Information curves for the items and the scale (IIC and TIC, respectively) were also calculated.
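To make the roles of the discrimination (a) and difficulty (b) parameters concrete, the sketch below computes the response category probabilities of a graded response model for a four-category item. The parameter values and the function name `grm_category_probs` are hypothetical and are not the estimates from this study; the code only illustrates the standard form of the model, in which each threshold defines a two-parameter logistic curve for responding in that category or higher.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model (illustrative).

    theta:      latent trait level of the respondent
    a:          item discrimination
    thresholds: increasing difficulty parameters b_1 < b_2 < ... (one per boundary)
    Returns one probability per response category (len(thresholds) + 1 values).
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (beyond top)
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulatives
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item with four categories (0-3), so three thresholds
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

Higher thresholds shift probability mass toward the lower categories for a given trait level, which is the sense in which an item is described as more difficult.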
All statistical analyses were performed using the "lavaan" package70 for the CFA and the "mirt" package for GRM47. The RStudio environment for R was used in all cases.
Results
Content validity
Table 2 shows that the four items of the PHQ-4 are clear, relevant, and consistent, both at the sample (V > 0.70) and population levels (Li > 0.59).
Descriptive analysis
Table 3 shows that most of the participants had a low average score (< 1); that is, most of them chose the lower response categories of the items. The items also presented adequate skewness and kurtosis indices (As < ±2; Ku < ±7) (71). In the polychoric correlation matrix, all items were moderately correlated.
Validity based on internal structure
In the present study, the model of two related dimensions presented mixed fit indices (χ2 = 8.72; df = 1; p = .003; RMSEA = .187 [CI90% .088 - .310]; SRMR = .028; CFI = .99; TLI = .93): SRMR and CFI met the established criteria, whereas RMSEA and TLI did not. Moreover, the correlation between the dimensions was very high (.97), which suggests that, in the sample studied, the items do not differentiate between two separate constructs of anxiety and depression but rather measure a single construct. Accordingly, a one-dimensional model was evaluated, which presented slightly better fit indices than the previous model (χ2 = 8.41; df = 2; p = .015; RMSEA = .121 [CI90% .045 - .210]; SRMR = .029; CFI = .99; TLI = .97), although the RMSEA remained above the .08 criterion. In addition, the items of the one-dimensional model had high factor loadings (Figure 1). Considering this evidence, the one-dimensional model was used for subsequent analyses.
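The relationship between the reported χ2, degrees of freedom, and RMSEA can be checked with the conventional point-estimate formula, RMSEA = sqrt(max(χ2 - df, 0) / (df (N - 1))). The sketch below applies it to the two models using the sample size of 221 participants. The function name `rmsea` is hypothetical, and this is only a plausibility check: the software used in the study may apply scaling corrections under WLSMV that this simple formula does not include.

```python
import math


def rmsea(chi2, df, n):
    """Conventional RMSEA point estimate from a chi-square statistic (sketch)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N = 221  # number of participants reported in the study

two_factor = rmsea(chi2=8.72, df=1, n=N)
one_factor = rmsea(chi2=8.41, df=2, n=N)
print(round(two_factor, 3), round(one_factor, 3))
```

With so few degrees of freedom, the RMSEA is very sensitive to small differences in χ2, which is one reason the other fit indices are read alongside it.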
Reliability of the scale
The study found that the brief scale presented adequate reliability indices (α = .86; ω = .81), evidencing that the items presented adequate precision in measuring the construct.
Item Response Theory Model: Graded Response Model (GRM)
The results of the Confirmatory Factor Analysis (CFA) support two main assumptions of IRT: unidimensionality and, consequently, local independence. Table 4 shows that the GRM presents acceptable fit indices (C2[df] = 5.41[2]; p = .066; RMSEA = .088; SRMSR = .036; TLI = .97; CFI = .99). In addition, all item discrimination parameters are above 1, a value generally considered to indicate good discrimination (Zickar et al., 2002). Regarding the difficulty parameters, all threshold estimates increased monotonically.
Figure 2 shows the information curves for the four items and for the scale (IIC and TIC, respectively). The IIC shows that items D1 and D2 are the most accurate items of the scale for assessing the latent trait. In addition, the TIC shows that the scale is most reliable (accurate) at latent trait levels between -.5 and 2.5.
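The idea behind item and test information curves can be illustrated with a short sketch. It computes the information of a graded response item at a given trait level as the sum, over categories, of the squared derivative of the category probability divided by that probability, and obtains the scale information by summing over items. All parameter values and the function names `grm_item_information` and `scale_information` are hypothetical; they are not the estimates reported in Table 4.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Information of one graded response item at trait level theta (sketch)."""
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the extremes
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)

    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]  # category probability
        # Derivative of the category probability with respect to theta
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += slope ** 2 / p
    return info


def scale_information(theta, items):
    """Test information: sum of item informations. items = [(a, thresholds), ...]."""
    return sum(grm_item_information(theta, a, bs) for a, bs in items)


# Hypothetical four-item scale; the first two items discriminate more strongly
items = [
    (3.0, [0.0, 1.0, 2.0]),
    (3.0, [0.2, 1.2, 2.2]),
    (1.5, [0.0, 1.0, 2.0]),
    (1.5, [0.3, 1.3, 2.3]),
]
for theta in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(theta, round(scale_information(theta, items), 2))
```

In this sketch the more discriminating items contribute most of the information, and information is concentrated around the thresholds, which mirrors how a curve like the TIC identifies the trait range where a scale measures most precisely.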
Discussion
Developing instruments with adequate characteristics to measure constructs requires adapting them to different social and cultural conditions. Therefore, the present study aimed to adapt and validate the PHQ-4 in the Quechua-speaking population in its Collao variant in Puno-Peru. To the best of our knowledge, this is the first study to evaluate psychometric evidence of the PHQ-4 in a Quechua-speaking population.
The content validity analysis indicated that all PHQ-4 items were relevant, coherent, and clear, and adequately represented the symptoms of anxiety and depression. Thus, the contents of the four items are adequate for their application to a sample of Quechua speakers. Previous analyses of the internal structure of the PHQ-4 have reported the presence of two factors: anxiety and depression (25, 32-37). However, this factor structure was not confirmed in the present study, as the CFA results supported a one-dimensional model, in which anxiety and depression items formed a single latent variable of emotional problems. A previous study indicated that the two-factor model may not be completely adequate, as the covariance between the depression and anxiety factors was also high and the discriminant validity was insufficient, suggesting that the factors are not sufficiently independent33. The lack of support for the two-factor structure may be explained by the characteristics of the sample. In this study, the participants were Quechua-speaking individuals, whereas in other studies, the samples consisted of members of the general population from European, Asian, and Latin American countries, or of patients seen in primary care centers or psychiatric departments. In the present study, the mean PHQ-4 score was 2.89 (SD = 2.62), which is higher than that reported in the German (1.76, SD = 2.06)32 and Colombian (1.27, SD = 2.01)37 general populations. However, it was similar to that reported in a study with a sample of patients from primary care centers in the United States (2.5, SD = 2.8)25 and considerably lower than that reported in a study of psychiatric patients (6.52, SD = 3.45)33. It appears that a symptom burden and comorbidity of depression and anxiety similar to those of patients attending primary care centers may blur the distinction between the two factors.
However, considering the brevity of the PHQ-4, the reliability was relatively high. The values of the reliability coefficients were similar to the values previously reported by studies conducted in different populations(25, 32-37). This would indicate that the PHQ-4 is as accurate for measuring symptoms of depression and anxiety in the Quechua-speaking population as in other population groups.
A one-dimensional measure and, consequently, local independence are assumptions required for IRT-based analyses64. One of the main results was that all PHQ-4 items had good discrimination ability. This indicates that the PHQ-4 items are useful for discriminating between Quechua-speaking participants with high and low levels of anxiety and depression. The difficulty parameters indicate that the items are comparatively difficult and that a higher level of the latent trait (anxiety and depression) is needed to endorse the higher response categories (almost every day). The information curves allow a better understanding of the efficiency of the items in measuring symptoms of anxiety and depression in Quechua-speaking participants. In this sense, the PHQ-4 is most accurate in assessing anxiety and depressive symptoms at latent trait levels between -.5 and 2.5, and the items measuring depression (D1 and D2) are the most accurate of the four items in measuring emotional problems.
Despite these important results, this study has limitations that can be addressed in future studies. First, participants were selected through non-probabilistic convenience sampling. This means that the sample is not representative and the findings cannot be generalized to the total population of Quechua speakers in its Collao variant in Puno-Peru. Second, as the main objective was to establish a preliminary and brief screening measure for anxiety and depression, the study included a sample of the general population, and there were no diagnostic measures to assess criterion validity. Therefore, these findings cannot be generalized to clinical populations. Previous studies have suggested that clinical versus general population-derived samples do not confound the importance of emotional problems72. Nevertheless, it is suggested that future studies incorporate clinical diagnostic standards to verify or modify what was reported in this study. Third, the study only analyzed data from speakers of the Quechua variant of Collao living in the northeastern high Andean city of Puno. However, it is not clear whether these findings can be generalized to other samples of speakers of other Quechua variants from the Amazonian, northern, central, and southern regions. It should be noted that the same item may present different nuances depending on the translation. This may generate different interpretations of the contents of anxiety and depression symptoms in different cultures and contexts. Therefore, future studies should examine the psychometric properties of the PHQ-4 in a broader set of Quechua-speaking populations. Fourth, other measures of depression and anxiety were not used in this study; therefore, there was no evidence of the relative performance of the PHQ-4 in comparison with other measures of anxiety and depression. Fifth, the test-retest reliability of the PHQ-4 was not assessed. 
Future studies could therefore administer the PHQ-4 to the same sample at different time points to assess the stability of the scores. Sixth, the PHQ-4 is a self-report measure and may therefore be affected by social desirability. It is thus advisable for future studies to consider the use of structured diagnostic interviews to obtain complementary information on the presence of anxiety and depressive symptoms.
In conclusion, despite these limitations, the results indicated that the PHQ-4 had good psychometric properties in a sample of Quechua Collao speakers from Puno. Thus, it demonstrates the usefulness of using the PHQ-4 as a rapid, reliable, and valid primary screening measure for Quechua speakers in need of in-depth assessment and symptom monitoring for the diagnosis and treatment of anxiety and depression. In this sense, the study is unique in that it focuses only on a particular group of people within a particular context (i.e., Quechua speakers in Peru). However, much more information is needed on this population, as they are very vulnerable to mental health problems and lack access to health care. The scale makes it possible to report outcomes achieved in mental health settings and objectively evaluate the effectiveness of mental health care actions. However, it has been suggested that the application of short questionnaires, such as the PHQ-4, should be accompanied by assessments using longer instruments and interviews conducted by clinical psychologists73. This would provide a more useful systematic assessment to identify and eventually intervene in cases that are more prone to anxiety and depression. Caution should be exercised in using only brief screening questionnaires, such as the PHQ-4, without having reasonable plans to identify truly depressed or anxious individuals through a diagnosis confirmed by a qualified mental health professional34. Using instruments such as the PHQ-4 without the possibility of confirming the diagnosis may generate false positives, indicating, for example, the presence of increased levels of distress and dysphoria but not clinical depression74. However, in settings such as primary care, where time is a priority, the use of the PHQ-4 may be appropriate. 
In this sense, the speed of administration and ease of interpretation of PHQ-4 results represent important features for the implementation and evaluation of new screening procedures for depression and anxiety symptoms. A systematic review indicated that brief measures, such as the PHQ-4, may be suitable for screening and identifying mental health disorders that can support the development of early intervention75.