Conditional inference trees in dynamic microsimulation - modelling transition probabilities in the SMILE model

03-12-2013

Abstract

Determining transition probabilities is a vital part of dynamic microsimulation models. Modelling individual behaviour with a large number of covariates reduces the number of observations that share identical characteristics, which makes it difficult to determine the response structure. Data mining using conditional inference trees (CTREEs) is found to be a useful tool for quantifying a discrete response variable conditional on multiple individual characteristics, and is generally believed to capture covariate interactions better than traditional parametric discrete choice models such as logit and probit models.

Deriving transition probabilities from conditional inference trees is a core method in the SMILE microsimulation model, which forecasts household demand for dwellings. The properties of CTREEs are investigated through an empirical application that describes the household's decision to move as a function of a number of covariates representing various demographic and dwelling characteristics.

Using recursive binary partitioning, decision trees group individuals’ responses according to a selected number of conditioning covariates. Recursively splitting the population by characteristics results in smaller groups of individuals with similar behaviour. Splits are determined by recognized statistical procedures that evaluate the heterogeneity and the number of observations within the group considered for a split. If a split is statistically validated, binary partitioning produces two new tree nodes, each of which may be split further after the next evaluation. The recursion stops when the statistical tests no longer support a split. The nodes that are not split further are called terminal nodes. The final tree is characterized by minimal variation between observations within a terminal node and maximal variation across terminal nodes. For each terminal node a transition probability is calculated and used to describe the response of individuals whose covariates match those characterizing that terminal node. For example, if a terminal node consists of single males aged 50 and above living in rental housing, all individuals with these characteristics are assumed to have the same probability of moving.
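
The partitioning logic described above can be illustrated with a minimal Python sketch. It is not the CTREE procedure itself and not the SMILE implementation: as a simplifying assumption, each candidate split is validated with a plain Pearson chi-squared test on a 2x2 table at the 5% level, covariates are restricted to binary indicators, and no adjustment for multiple testing is made. All names (`grow`, `predict`, `min_n`, the covariates `renter` and `old`) and the toy data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# 5% critical value of the chi-squared distribution with 1 degree of freedom.
CHI2_CRIT_1DF_5PCT = 3.841


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


@dataclass
class Node:
    n: int                         # observations in the node
    p_move: float                  # share of movers = transition probability
    split: Optional[str] = None    # binary covariate used to split (None = terminal)
    yes: Optional["Node"] = None   # child where the covariate is True
    no: Optional["Node"] = None    # child where the covariate is False


def grow(rows, covariates, min_n=20):
    """Recursively split rows (dicts with boolean covariates and 'moved')."""
    node = Node(n=len(rows), p_move=sum(r["moved"] for r in rows) / len(rows))
    best, best_stat = None, CHI2_CRIT_1DF_5PCT
    for cov in covariates:
        yes = [r for r in rows if r[cov]]
        no = [r for r in rows if not r[cov]]
        if len(yes) < min_n or len(no) < min_n:
            continue  # too few observations on one side of the split
        a = sum(r["moved"] for r in yes)
        c = sum(r["moved"] for r in no)
        stat = chi2_2x2(a, len(yes) - a, c, len(no) - c)
        if stat > best_stat:  # keep the most significant validated split
            best, best_stat = cov, stat
    if best is not None:
        node.split = best
        node.yes = grow([r for r in rows if r[best]], covariates, min_n)
        node.no = grow([r for r in rows if not r[best]], covariates, min_n)
    return node


def predict(node, person):
    """Follow the splits to a terminal node and return its transition probability."""
    while node.split is not None:
        node = node.yes if person[node.split] else node.no
    return node.p_move


def cell(renter, old, n, movers):
    """Toy data: n persons with the given covariates, of whom `movers` moved."""
    return [{"renter": renter, "old": old, "moved": i < movers} for i in range(n)]


rows = (cell(True, False, 100, 40) + cell(True, True, 100, 20)
        + cell(False, False, 100, 10) + cell(False, True, 100, 10))
tree = grow(rows, ["renter", "old"])
p_old_renter = predict(tree, {"renter": True, "old": True})
```

In this toy data the tree first splits on tenure and then splits renters by age, while owners remain a single terminal node because age does not change their share of movers. In a simulation step, the returned probability would typically be compared with a uniform random draw to decide whether a given individual moves.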