
Multiple Regression

Introduction

For this study, the researchers wanted to evaluate whether self-reported health behaviors and health knowledge predict self-rated physical health after controlling for the effects of gender and age. They also wanted to know which of these variables make a statistically significant contribution to the regression equation. A further aim was to examine the interaction between gender and health literacy, that is, the degree to which individuals are able to obtain, process and understand the information needed to make appropriate decisions about their health, and the effect of this interaction on health.

Data were collected from 350 people randomly selected from the dataset of a population study of health and health determinants. Health was measured on a scale of 1 to 10, where higher scores represent better health. Health behaviors, which include healthy eating, physical activity and relaxation, were measured on a scale of 1 to 15. Health knowledge was measured on a scale of 10 to 45. Gender and age in years were also collected from the respondents.

The data were screened for missing values, out-of-range values, univariate and multivariate outliers, and multicollinearity. Three variables contained missing values, both system-missing and user-defined missing: health knowledge, physical activity and age in years, with one missing case for each. Each of these missing values was recoded with the missing-value code 999. The descriptive statistics produced for each variable revealed out-of-range values for the healthy eating, physical activity and relaxation variables, and these values were also recoded with the missing-value code 999.
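The screening steps described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the study's own procedure: the use of NumPy, the function names (`recode_missing`, `complete_cases`, `mahalanobis_sq`, `cooks_distance`, `tolerances`) and the synthetic demo data are all hypothetical. It recodes blank and out-of-range entries to the 999 code, drops those cases listwise, and computes squared Mahalanobis distances, Cook's distances and tolerances (1 - R² from regressing each predictor on the others).

```python
import numpy as np

MISSING_CODE = 999  # missing-value code used in the study


def recode_missing(values, low, high, missing_code=MISSING_CODE):
    """Recode blank (NaN) and out-of-range entries to the missing-value code."""
    values = np.asarray(values, dtype=float)
    bad = np.isnan(values) | (values < low) | (values > high)
    return np.where(bad, missing_code, values)


def complete_cases(data, missing_code=MISSING_CODE):
    """Drop every row that contains the missing-value code (listwise deletion)."""
    data = np.asarray(data, dtype=float)
    return data[~(data == missing_code).any(axis=1)]


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance (n - 1)
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def cooks_distance(X, y):
    """Cook's distance for an OLS fit of y on X (an intercept is added)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # number of parameters, including the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (n - p)  # residual mean square
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'
    hat = np.einsum("ij,jk,ik->i", design, np.linalg.inv(design.T @ design), design)
    return resid**2 / (p * s2) * hat / (1 - hat) ** 2


def tolerances(X):
    """Tolerance (1 - R^2) of each predictor regressed on the other predictors."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        tol[j] = (resid @ resid) / (centered @ centered)  # SS_res / SS_tot = 1 - R^2
    return tol


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 350
    # Hypothetical synthetic predictors: gender (0/1), age, behaviors, knowledge
    X = np.column_stack([
        rng.integers(0, 2, n),
        rng.integers(18, 80, n),
        rng.integers(1, 16, n),
        rng.integers(10, 46, n),
    ]).astype(float)
    y = np.clip(np.round(2 + 0.2 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 1, n)), 1, 10)
    print("recoded:", recode_missing([3, 17, np.nan], 1, 15))
    print("largest Mahalanobis D^2:", mahalanobis_sq(X).max())
    print("cases with Cook's D > 1:", int((cooks_distance(X, y) > 1).sum()))
    print("lowest tolerance:", tolerances(X).min())
```

The thresholds named in the text (Cook's distance above 1, tolerance below 0.3) can be applied directly to the returned arrays. For Mahalanobis distances, a common convention is to compare each D² with a chi-square critical value whose degrees of freedom equal the number of predictors.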
Outliers were checked by generating a scatterplot matrix for all variables (Figure 1) and plots of Cook's distances (Figure 2) and Mahalanobis distances (Figure 3). No case gave particular cause for concern: on the Mahalanobis distance plot no case had a distance markedly larger than the others, and on the Cook's distance plot no case had a distance greater than 1, the value that would indicate an influential point. Multicollinearity was also tested, and no variable had a tolerance lower than 0.3. The regression assumptions must also be checked to ensure that the results of the analysis are valid. The first assumption is that all variables are measured on a metric scale or, if categorical, are dichotomously coded; this holds for the data in this study. The second assumption is that each observation in the sample is independent of the other observations, the