Estimating sub-national behaviour in the Danish microsimulation model SMILE

31-05-2019

This paper suggests the use of a combination of Principal Component Analysis (PCA) and classification by Conditional Inference Trees (CTREEs) when estimating transition probabilities depending on a large number of high-dimensional covariates, hence overcoming the curse of dimensionality.

Abstract

The SMILE model is a Danish dynamic microsimulation model, which forecasts demography, household formation, housing demand, socioeconomic and educational attainment, income, taxation, health, and labour market pensions. In the most recent version of the model, selected behavioural patterns are allowed to vary across the 98 municipalities in Denmark. In particular, this equips the model with a detailed description of sub-national moving behaviour, which is essential when seeking to identify geographic areas characterized by exodus and depopulation.

Modelling behavioural patterns by a large number of potentially high-dimensional covariates allows for a detailed description of individual behaviour. However, it simultaneously reduces the number of observations with identical characteristics, which leads to sparse data. Hence, introducing detailed sub-national behaviour considerably complicates the estimation of municipality-dependent transition probabilities. This paper suggests the use of a combination of Principal Component Analysis (PCA) and classification by Conditional Inference Trees (CTREEs) when estimating transition probabilities depending on a large number of high-dimensional covariates, hence overcoming the curse of dimensionality.

Keywords: sub-national population projections, curse of dimensionality, data mining, conditional inference trees, principal component analysis.
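
The sparsity problem described in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the SMILE model: only the figure of 98 municipalities comes from the text, while the other covariates, their numbers of levels, and the population size are hypothetical values chosen for illustration. It shows how the average number of observations per unique covariate combination shrinks once municipality is added as a covariate.

```python
from math import prod

# Hypothetical covariate cardinalities. Only the 98 municipalities are
# taken from the text; the remaining entries are illustrative assumptions.
covariate_levels = {
    "origin_municipality": 98,
    "age": 100,
    "gender": 2,
    "education_level": 8,
}

# Assumed number of individuals in the data (illustrative round number).
n_individuals = 5_000_000


def observations_per_cell(levels, n):
    """Average number of observations per unique covariate combination."""
    return n / prod(levels.values())


# The same covariates without the municipality dimension.
national_levels = {
    name: size
    for name, size in covariate_levels.items()
    if name != "origin_municipality"
}

per_cell_national = observations_per_cell(national_levels, n_individuals)
per_cell_municipal = observations_per_cell(covariate_levels, n_individuals)

print(f"Without municipality: {per_cell_national:.1f} observations per cell")
print(f"With municipality:    {per_cell_municipal:.1f} observations per cell")
```

Under these assumptions, adding municipality divides the average cell size by 98. In real data the cells are unevenly populated, so many combinations would contain few or no observations, which is the situation that motivates reducing dimensionality before estimating transition probabilities.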