Input Data

SimPaths uses three types of data as input:

  1. The initial population to be evolved over time. Available here
  2. Donor populations used to impute the effects of tax and benefit policy. Available here
  3. Estimated parameters governing transition probabilities assumed by the model. Available here

Training data are provided for the first two of these data sets, while 'release' data are provided for the third data set.

The model has been designed to draw the initial population from data reported by the UK Household Longitudinal Study (UKHLS). The UKHLS (sometimes referred to as Understanding Society) is the successor to the British Household Panel Survey, and is the principal general-purpose panel survey administered in the UK. Multiple initial populations are derived from the UKHLS, corresponding to different years of data reported by the survey (from 2011 to 2017), and are used for model validation. The donor populations for tax and benefit imputations are derived from UKMOD and are based on data reported by the Family Resources Survey (FRS). These data include a wide range of benefit unit characteristics in addition to tax and benefit payments. SimPaths imputes tax and benefit payments from these data by matching simulated individuals to individuals described by the donor populations. Parameters for the UK have been estimated on UKHLS data (Waves 1 to 8) and on FRS data (labour supply and social care, various years).

Training data are provided for the initial population (1) and the donor populations (2) because the underlying data, although publicly available, are subject to restrictions imposed by the respective data providers. The following sections describe how to generate 'release' data for these two data sets.

2. Obtain data for the initial population (for the UK)

In addition to training data, the model comes supplied with a set of Stata do files that have been written to extract input data from the UKHLS. These do files can be found in the model directory: SimPaths/input/InitialPopulations/compile/.

  1. Obtain the most recent version of the UKHLS from the UK Data Service (SN6614, in Stata's tab format). You also need the most recent version of the Wealth and Assets Survey (WAS) (SN7215, in Stata's tab format).
  2. Open the file 00_master.do in Stata, edit the global variables at the top of the file, then save and run it.
  3. Copy the csv files generated following (2) to model directory: SimPaths/input/InitialPopulations/.
  4. Run SimPathsStart, and select option "Load new input data for starting populations" from the Start-up Options window.
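
Step 3 above can be scripted. The following is a minimal Python sketch, not part of SimPaths: the function name copy_csv_files and the source folder stata_output are hypothetical, and the source folder should be replaced with wherever the globals in 00_master.do direct the generated csv files.

```python
import shutil
from pathlib import Path


def copy_csv_files(source_dir, target_dir):
    """Copy every .csv file in source_dir into target_dir; return the copied names."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_path in sorted(source.glob("*.csv")):
        # copy2 keeps file timestamps, which helps when checking data vintage later
        shutil.copy2(csv_path, target / csv_path.name)
        copied.append(csv_path.name)
    return copied


if __name__ == "__main__":
    # Hypothetical location: set this to the output folder used by 00_master.do.
    stata_output = "stata_output"
    model_input = "SimPaths/input/InitialPopulations"
    if Path(stata_output).is_dir():
        for name in copy_csv_files(stata_output, model_input):
            print("copied", name)
    else:
        print("source directory not found:", stata_output)
```

Only files ending in .csv are copied, so Stata logs and intermediate .dta files in the same folder are left where they are.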

3. Obtain data for tax-benefit donors (for the UK)

SimPaths is designed to read in data describing tax-benefit payments generated by UKMOD.

  1. Obtain a copy of the most recent version of UKMOD from the CeMPA website.
  2. Obtain the most recently available "b" series of input data provided for UKMOD as described on the CeMPA website.
  3. Run the desired system years available in (1) UKMOD, using the (2) "b" series dataset. Note that the same input data set should be used for all system years. System runs can be performed directly in UKMOD or by calling UKMOD from Stata, R, or Python using the respective connectors.
  4. Copy the files generated following (3) to model directory: SimPaths/input/EUROMODoutput/. The UKMOD output files must include the base price year used by SimPaths (currently 2015). If no UKMOD output file is provided for the base price year, the initial database setup will fail.
  5. Run SimPathsStart, and select option "Load new input data for tax and benefit systems" from the Start-up Options window.
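
Because the database setup fails when the base price year is missing, it may be worth checking the output folder before running SimPathsStart. The following is a minimal Python sketch, not part of SimPaths. It assumes that UKMOD output file names embed the system year (for example, something like UK_2015_std.txt); this naming is an assumption and should be checked against the files actually produced. The function name has_base_year_output is hypothetical.

```python
from pathlib import Path

BASE_PRICE_YEAR = 2015  # base price year currently used by SimPaths, per the text


def has_base_year_output(output_dir, year=BASE_PRICE_YEAR):
    """Return True if any file name in output_dir contains the given year.

    Assumption: UKMOD output file names embed the system year.
    Adjust the check if your file names follow a different pattern.
    """
    folder = Path(output_dir)
    if not folder.is_dir():
        return False
    return any(str(year) in p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    folder = "SimPaths/input/EUROMODoutput"
    if has_base_year_output(folder):
        print("found output for", BASE_PRICE_YEAR)
    else:
        print("no output for", BASE_PRICE_YEAR, "in", folder)
```

The check only looks at file names, so it confirms that a file for the base price year is present, not that its contents are valid.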