Longitudinal Classification and Predictive Modeling for Historical CPS Data Using Random Forests

Cecile K. Johnson
MS Data Science, University of Virginia ’22
GitHub
LinkedIn

Hannah E. Schmuckler
MS Data Science, University of Virginia ’22
GitHub
LinkedIn

Our Work

Paper
Our Libra Dataverse Repository contains trained forests and over 450 million predictions. (README)
Machine-readable Industry and Occupation crosswalks between census coding schemes, spanning the 1970s to the present, are available for other researchers. (GitHub)
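
As a rough illustration of how a machine-readable crosswalk of this kind might be applied, the sketch below maps codes from different coding schemes onto a single harmonized code. The column names (`source_scheme`, `source_code`, `target_code`), the record fields, and every code value are invented placeholders, not the layout or contents of the published crosswalk files; consult the GitHub repository for the actual format.

```python
import csv
import io

# Hypothetical crosswalk layout: one row per (source scheme, source code),
# mapped to a harmonized target code. All values here are placeholders.
CROSSWALK_CSV = """source_scheme,source_code,target_code
1970,017,0170
1970,018,0180
1980,010,0170
"""


def load_crosswalk(text):
    """Return {(source_scheme, source_code): target_code} parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (row["source_scheme"], row["source_code"]): row["target_code"]
        for row in reader
    }


def harmonize(records, crosswalk):
    """Attach a harmonized code to each record; None marks unmapped codes."""
    harmonized_records = []
    for rec in records:
        key = (rec["scheme"], rec["code"])
        harmonized_records.append({**rec, "harmonized": crosswalk.get(key)})
    return harmonized_records


crosswalk = load_crosswalk(CROSSWALK_CSV)
records = [
    {"scheme": "1970", "code": "017"},
    {"scheme": "1980", "code": "010"},
    {"scheme": "1980", "code": "999"},  # not present in the toy crosswalk
]
harmonized = harmonize(records, crosswalk)
```

Codes are kept as strings so that leading zeros survive, and unmapped codes are returned as `None` rather than dropped, so gaps in coverage stay visible downstream.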

Example Code

Building Random Forests: Industry and Occupation
Making Predictions: Industry and Occupation
Vignette: Predicting Industry Using Random Forests Built with Crosswalked Data
Vignette: Utilizing Resources from Random Forest CPS Predictions

Appendix

Generating Random Forests and Predictions Using Crosswalked Data

Other

In the course of our work, we cleaned the “Treiman File,” a dual-coded Industry & Occupation dataset (1970-1980). We make the cleaned file available for the benefit of other researchers.

Treiman Dual-Coded File