Longitudinal Classification and Predictive Modeling for Historical CPS Data Using Random Forests
Cecile K. Johnson | Hannah E. Schmuckler |
---|---|
MS Data Science, University of Virginia ’22 | MS Data Science, University of Virginia ’22 |
GitHub | GitHub |
Our Work
- Paper
- Our Libra Dataverse Repository contains trained forests and over 450 million predictions. (README)
- Machine-readable Industry and Occupation crosswalks between census coding schemes, covering the 1970s to the present, are available for other researchers. (GitHub)
Example Code
- Building Random Forests: Industry and Occupation
- Making Predictions: Industry and Occupation
- Vignette: Predicting Industry Using Random Forests Built with Crosswalked Data
- Vignette: Utilizing Resources from Random Forest CPS Predictions
Appendix
Other
In the course of our work, we cleaned the “Treiman File,” a dual-coded Industry & Occupation dataset (1970-1980). We make the cleaned version available for the benefit of other researchers.
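
For readers unfamiliar with dual-coded data, here is a minimal sketch of how records carrying a code under two schemes can be tabulated into a simple crosswalk. It is an illustration only, not the method used in our paper: the record values, the function names `build_crosswalk` and `most_likely`, and the "pick the most frequent match" rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical dual-coded records: (code under the older scheme, code under the newer scheme).
# These values are invented for illustration and are not real census codes.
records = [
    ("A1", "B1"),
    ("A1", "B1"),
    ("A1", "B2"),
    ("A2", "B3"),
]


def build_crosswalk(pairs):
    """Return {old_code: {new_code: share}} tabulated from dual-coded pairs."""
    counts = defaultdict(Counter)
    for old, new in pairs:
        counts[old][new] += 1

    crosswalk = {}
    for old, new_counts in counts.items():
        total = sum(new_counts.values())
        # Share of records with this old code that received each new code.
        crosswalk[old] = {new: n / total for new, n in new_counts.items()}
    return crosswalk


def most_likely(crosswalk, old_code):
    """Return the new-scheme code most often paired with old_code, or None if unseen."""
    shares = crosswalk.get(old_code)
    if not shares:
        return None
    return max(shares, key=shares.get)


crosswalk = build_crosswalk(records)
print(most_likely(crosswalk, "A1"))
```

A deterministic "most frequent match" rule like this discards the information in one-to-many mappings, which is one reason a predictive model can be preferable to a fixed crosswalk when codes split across schemes.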