Data Analysis
We used random forests, an ensemble recursive-partitioning method, as implemented in the ‘party’ package [24] for the programming language R version 3.0.1 [25]. Although single-tree methods such as CART are often used for similar data, in our experience random forests have substantially outperformed methods based on a single regression tree. We chose cforest in the ‘party’ package over alternatives (e.g. ‘rpart’ and ‘randomForest’) because it is appropriate for predictor variables of different types. A conditional variable-importance measure (function ‘varimp’) has recently been added to this package; it gives more reliable estimates of each variable’s importance when predictor variables are correlated [26], which matters given the many correlated predictor variables in interview surveys. The ‘party’ algorithm first tests the null hypothesis of independence between each predictor variable and the response variable. It then selects the single predictor variable with the strongest association to the response variable and assigns a p-value to that relationship. The data are then split into two nodes (groups of observations), and the search is repeated within each node for the predictor variable with the next-strongest association. This partitioning of the data continues until the stop criterion is reached.
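The node-splitting logic described above (test each predictor’s association with the response, split on the most significant predictor, stop when no test is significant) can be sketched in simplified form. This is an illustrative toy, not the conditional-inference algorithm implemented in ‘party’: the p-value is a normal approximation for Pearson correlation, the cut point is simply the median, and the names (`assoc_pvalue`, `grow`) are invented for this sketch.

```python
# Toy sketch of significance-based recursive partitioning (NOT the 'party'/cforest
# algorithm): at each node, test every predictor's association with the response,
# split on the most significant predictor, and stop when no test passes alpha.
import math
import numpy as np

def assoc_pvalue(x, y):
    """Two-sided p-value for Pearson correlation, via a normal approximation."""
    r = np.corrcoef(x, y)[0, 1]
    t = abs(r) * math.sqrt((len(y) - 2) / max(1e-12, 1.0 - r * r))
    return math.erfc(t / math.sqrt(2.0))

def grow(X, y, alpha=0.05, depth=0, max_depth=3):
    """Recursively split the data; each internal node records the chosen
    variable, the cut point (here simply the median), and the test's p-value."""
    if depth >= max_depth or len(y) < 20 or np.ptp(y) == 0:
        return {"leaf": float(np.mean(y))}
    pvals = [assoc_pvalue(X[:, j], y) for j in range(X.shape[1])]
    j = int(np.argmin(pvals))
    if pvals[j] > alpha:                     # stop criterion: univariate p-value
        return {"leaf": float(np.mean(y))}
    cut = float(np.median(X[:, j]))
    left = X[:, j] <= cut
    return {"var": j, "cut": cut, "p": pvals[j],
            "lo": grow(X[left], y[left], alpha, depth + 1, max_depth),
            "hi": grow(X[~left], y[~left], alpha, depth + 1, max_depth)}

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 2] > 0).astype(float)              # only predictor 2 drives the response
tree = grow(X, y)                            # root splits on predictor 2
```

A random forest then grows many such trees on bootstrap samples, restricting each split to a random subset of predictors (mtry), and averages their predictions.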
Model parameters included 1,000 trees, and we set the number of randomly preselected predictor variables for each split (mtry) to four, following the suggestion of Strobl et al. [26] to use the square root of the number of variables. The stop criterion was left at the default of Strobl et al. [27], which is based on the univariate p-values. Before interpreting the random-forest variable-importance rankings, we increased the number of trees from 20 to 500 to 1,000 and repeated the analyses with different random seeds; the rankings of the informative variables were stable and the overall results were unchanged. Variables are usually considered informative if their importance value exceeds the absolute value of the lowest negative-scoring variable [26].
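The two rules of thumb used here can be made concrete in a short sketch. The importance scores below are invented purely for illustration (they are not the study’s results); only the mtry calculation and the threshold rule follow the text.

```python
# Sketch of the two rules of thumb from the text: mtry = round(sqrt(p)), and a
# variable counts as informative if its importance exceeds the absolute value of
# the most negative score. Importance values here are HYPOTHETICAL placeholders.
import math

n_predictors = 15
mtry = round(math.sqrt(n_predictors))        # sqrt(15) = 3.87... -> 4, as in the study

# Hypothetical conditional-importance scores (illustrative only)
importance = {"attitude1": 0.031, "familiarity1": 0.012, "age": 0.004,
              "income": -0.002, "sex": -0.006}
threshold = abs(min(importance.values()))    # |lowest negative score| = 0.006
informative = [v for v, s in importance.items() if s > threshold]
```

Under these made-up scores, only the variables scoring above 0.006 would be reported as informative; a variable with a small positive score (here "age") falls below the threshold and would be treated as noise.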
We included 15 factors as potential explanatory variables. Five variables characterized attitudes towards dholes, measured as level of agreement with statements on a 1-5 Likert scale; three variables dealt with people’s familiarity with dholes; three variables indicated people’s relationship to the forest; and the remaining four variables were age, sex, level of schooling, and income (Table 1; Appendix I).
The degree to which people agreed that “We should eliminate dholes” was chosen as the response variable. It was coded as an ordered factor of the Likert responses, with the 1-5 scale indicating level of agreement with the statement. We were most interested in identifying variables correlated with this outright elimination statement because we assumed that people with extremely negative attitudes would be the most likely to act on them, to the detriment of dholes.
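The response coding described above — an ordered factor of 1-5 Likert responses — can be illustrated with a pandas ordered categorical, the closest Python analogue to R’s ordered factor (the example responses are invented):

```python
# Encoding 1-5 Likert agreement responses as an ordered categorical variable,
# analogous to R's ordered factor. The response values below are made up.
import pandas as pd

responses = pd.Series([2, 5, 1, 4, 4, 3])
likert = pd.Categorical(responses, categories=[1, 2, 3, 4, 5], ordered=True)
# Because the categorical is ordered, comparisons, sorting, and min/max all
# respect the agreement scale rather than treating levels as unordered labels.
```

Tree-based methods that honor this ordering (as ‘party’ does) can exploit the fact that a split like "agreement ≤ 3 vs. > 3" is meaningful, which an unordered encoding would not convey.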