A project that uses health data to see if it can predict household income levels, exploring how health and economics are related.

Role

Developer

Timeline
January 2022

Technologies

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn

Tools

  • Jupyter Notebook

Background

Learning From Health was created to explore the connection between health indicators and income levels. The project used real datasets from the CDC to see if conditions like diabetes or cholesterol could predict a person's economic status. It provided an opportunity to work with large-scale public health data to identify patterns between physical well-being and financial status.

Solution

The analysis resulted in a predictive model that looks at health statistics and geographic data to estimate income. The model demonstrated a clear link between specific health conditions and earnings. Efforts were focused on data cleaning and visualization to make the relationships easier to understand, highlighting how factors like location influence the results.

Process

The process began with researching CDC health datasets and performing extensive data cleaning. Various charts and graphs were used to identify key features, followed by testing different modeling approaches. Once a reliable method was found, the findings were documented in a report to explain what the data revealed about the relationship between health and economics.

Final Product

Learning From Health final product

Impact

The project identified a strong correlation between health indicators and economic status. It provided insights into how health issues vary across different regions and income levels, serving as a practical exploration of using data to understand complex public health challenges.

Reflection

Building this was an exercise in handling large datasets that contained missing or inconsistent information. It reinforced that data preparation is just as critical as the analysis model itself. While the current patterns are clear, future work could involve more diverse data to see if these trends remain consistent across a broader population.