To what extent is an American's income predictable? Here, I look at a subset of this question, using logistic regression to predict whether or not an individual lives below the poverty line. I focus on my home state of Pennsylvania as a smaller test case.
Bottom line: logistic regression gave me insights regarding how demographics and life histories influence whether a person lives below the poverty line. Predictions using this fit were decent, but not great.
The graph on the left shows the odds ratios and 95% confidence intervals for the logistic regression. An odds ratio greater than one means a person in that category is more likely to live below the poverty line; an odds ratio less than one means they are less likely to. If the confidence interval bars extending out from a data point cross the line at 1.0, the effect is not statistically significant at the 5% level.
Odds ratios are interpreted as the odds that, all other factors being equal, a person lives below the poverty line, relative to the odds for a base case. I will specify the base cases for each category in the discussion below.
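As a minimal sketch of the arithmetic behind a plot like this: a logistic regression coefficient is a log odds ratio, so exponentiating it gives the odds ratio, and exponentiating the coefficient plus or minus 1.96 standard errors gives an approximate (Wald) 95% confidence interval. The function names and the coefficient values below are hypothetical, chosen only for illustration; they are not the fitted values from this analysis.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Turn a logit coefficient and its standard error into an
    odds ratio with an approximate (Wald) confidence interval.
    z=1.96 corresponds to a 95% interval."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return odds_ratio, lower, upper


def is_significant(lower, upper):
    """An effect is significant when the interval excludes 1.0."""
    return lower > 1.0 or upper < 1.0


# Hypothetical (coefficient, standard error) pairs, NOT results from this post.
examples = {
    "hypothetical_risk_factor": (0.69, 0.10),
    "hypothetical_protective_factor": (-0.51, 0.12),
    "hypothetical_noisy_factor": (0.10, 0.20),
}

for name, (coef, se) in examples.items():
    ratio, lo, hi = odds_ratio_with_ci(coef, se)
    print(f"{name}: OR={ratio:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), "
          f"significant={is_significant(lo, hi)}")
```

In this sketch, the first factor has an interval entirely above 1.0 (higher odds of poverty), the second has one entirely below 1.0 (lower odds), and the third straddles 1.0, which is the "bars cross the line" case described above.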
The figure to the right shows the relative importance of different variables in the logistic regression results, colored by category as above. For clarity, I show only the top ten predictors.
Living in a household without one's spouse (regardless of whether one has a spouse) was the most important factor in predicting poverty status. This result was surprising, particularly since this category includes unmarried partners living together. Education variables account for half of the top ten most important predictors. Length of home occupancy, age, and disability status were also important.
We can assess the predictive capability of the logistic regression model in several ways.
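As a rough illustration of what such an assessment can involve, the sketch below computes a confusion matrix plus accuracy, precision, and recall for a binary classifier. These are common choices, assumed here for illustration rather than taken from the analysis itself, and the labels and predictions are made up (1 = below the poverty line, 0 = at or above it).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for the positive class (1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of everyone predicted to be in poverty, how many actually are?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everyone actually in poverty, how many did the model find?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and predictions, NOT results from this post.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because people living below the poverty line are a minority of the population, accuracy alone can look good even for a model that rarely predicts poverty, which is why precision and recall are worth reporting alongside it.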
More details for nerds: