Synthetic data estimation for the UK Longitudinal Studies: an introduction to the SYLLS project
Dennett, A., Wu, B. & Nowok, B. (2013) BSPS Annual Conference 2013, University of Swansea, UK, 9-11 September 2013 [SLS][ONS LS][NILS][CALLS]
The England and Wales Longitudinal Study (LS), Scottish Longitudinal Study (SLS) and Northern Ireland Longitudinal Study (NILS) are rich micro-datasets linking census data with other health and administrative data (births, deaths, marriages, cancer registrations) for individuals and their immediate families across several decades. Whilst these are unique and valuable resources, the sensitive nature of the information they contain means that access to the microdata is restricted, which limits the user base. The SYLLS project will develop synthetic data that mimic the real longitudinal data but, crucially, will not be subject to the same access restrictions as the national LSs. In this paper we introduce the two different but complementary methods that we will adopt to generate the synthetic data: microsimulation and multiple imputation. Microsimulation will be used to generate a synthetic LS ‘spine’, mimicking the full population of individuals in the LSs but for a limited set of core variables, transitioning between 1991 and 2001. Multiple imputation will be used to generate bespoke synthetic data extracts that precisely match the requirements of individual research projects. The paper will report on methodological progress to date, along with issues and prospects for the new synthetic datasets.