SLS Census Linkage Launch Event – download slides
On Tuesday 4th November 2014, the SLS-DSU (supported by National Records of Scotland and CALLS Hub) held a launch event to announce the linkage of 2011 Census data to the Scottish Longitudinal Study.
The event was held at the Royal College of Physicians, Edinburgh, and around 70 people attended to hear about the new data, as well as examples of how it could be used. The welcome was given by Prof Andrew Morris, Scottish Government Chief Scientist.
UPDATE: You can now download full audio + slide presentations here.
Adam Dennett (CeLSIUS and CASA, UCL) recently featured in an episode of The Global Lab and discussed his work with Census data and the Synthetic Data Estimation for the UK Longitudinal Studies (SYLLS) project.
You can listen to or download the podcast on SoundCloud.
On Thursday 6th March 2014, on behalf of the LSs, ONS, NISRA, NRS and ESRC, CALLS Hub organised a very successful launch event to mark the linkage of 2011 Census data to the UK LSs. This was held at Church House, Westminster, and a total of 120 people attended over the day.
A special morning event was introduced by Prof Paul Boyle (CEO of ESRC), and the linkage was officially announced by Sir Andrew Dilnot (Chair, UK Statistics Authority). We were honoured to hear how highly the LSs are regarded.
You can see some of the tweets from the day in our Storify roundup, and speaker slides and handouts from the day can be downloaded below.
Morning session
- Delegate list (PDF 2MB)
- Programme (PDF 2MB)
- 2011 Census Data link to the LSs: The Potential for Policy and Influence – Dr Ian Shuttleworth, Director, NILS-RSU (PDF 2MB)
- Using the ONS LS for research on ethnicity and social mobility – Prof Lucinda Platt, Director, Millennium Cohort Study, Professor of Sociology, LSE (PDF 8MB)
Afternoon session
- Delegate list (PDF 2MB)
- Programme (PDF 2MB)
- Beta-test Posters (PDF 4MB)
- Introduction to the UK LS & Census 2011 Data Linkage – Dr Nicola Shelton, Director, CeLSIUS (PDF)
- New LS Developments and Official Announcement of CALLS Hub – Prof Chris Dibben, Director, Longitudinal Studies Centre Scotland, PI Census & Administrative data LongitudinaL Studies Hub (PDF)
- Synthetic Data Estimation for the UK Longitudinal Studies (SYLLS) – Dr Adam Dennett, Lecturer, CASA, UCL (PDF 6MB)
- Are we becoming more migratory? An analysis of internal migration rates, 1971-2011 – Prof Tony Champion, Newcastle University (PDF 234KB)
- Social and economic transitions and their effect on young people’s health and social wellbeing – Dr Mark McCann, Queen’s University Belfast (PDF 23KB)
- Characteristics of and living arrangements amongst informal carers at the 2011 and 2001 censuses: stability, change and transition – Dr James Robards, University of Southampton (PDF 82KB)
- Does religious exogamy (mixed marriage) increase the risk of marital dissolution in Northern Ireland? – Dr David Wright, Queen’s University Belfast (PDF 105KB)
- Inter-cohort trends in intergenerational mobility in England and Wales: income, status, and class (InTIME) – Dr Franz Buscha, University of Westminster (PDF 27KB)
Adam Dennett, UCL
As we head into a new year, we draw closer to the end of the SYLLS project. Starting in April 2013, the project has been run as a joint venture between the three Longitudinal Studies Research Support Units (RSUs) and the CALLS-Hub, with the aim of generating Synthetic Longitudinal data which are not subject to the same access restrictions as the real Census-based longitudinal microdata for England, Wales, Scotland and Northern Ireland.
The project has been split between teams based at CeLSIUS at UCL and the SLS-DSU in Edinburgh / St Andrews. The London team have been tasked with generating the ‘Synthetic Spine’ dataset. This is a partial replication of the full set of individuals contained in the 1991 LSs of England and Wales, Scotland and Northern Ireland who were then also enumerated in the 2001 Census. The replication is partial because we have not attempted to synthesise every variable contained in the LSs for every individual; rather, we have focused on a selection of the variables most frequently requested in previous LS-based research projects (age, sex, ethnicity, health, births, deaths, geography).
To generate the synthetic spine dataset, we have used publicly available data from the 1991 Samples of Anonymised Records (SARs) as our base. The SARs are similar to the LSs in that they are microdata records and so are perfect for this task. A bespoke microsimulation model has been built by Belinda Wu to generate the synthetic spine from the SARs data. We began with England and Wales: a baseline population for the 1991 synthetic LS was generated by constraining aggregated (local authority) area-level data from the SARs to similar area-level data from the LS using the tried-and-tested iterative proportional fitting technique. Individuals were then sampled from this new dataset to build our synthetic LS population. Once the 1991 baseline population is created, transition probabilities are calculated from the LS to age our simulated individuals on 10 years and give them the same characteristics that we would see for those LS members enumerated in both the 1991 and 2001 Censuses.
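For readers unfamiliar with iterative proportional fitting, here is a minimal sketch of the general technique in Python. It is not the project's microsimulation model: the function name `ipf`, the two-way age-by-sex table and all of the numbers are hypothetical, chosen only to show how a seed table is repeatedly rescaled until its row and column totals match a set of target margins.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit a 2-D table to target row and column totals (illustrative only)."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Rescale each row so that its total matches the row target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Rescale each column so that its total matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows also match.
        max_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return table


# Hypothetical seed counts for one area: age group (rows) by sex (columns).
seed = [[10, 12], [20, 18], [15, 25]]
# Hypothetical target margins; both sets must sum to the same total.
row_targets = [30.0, 40.0, 30.0]
col_targets = [48.0, 52.0]

fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(v, 2) for v in row])
```

The fitted cells keep the association structure of the seed table while agreeing with the target margins, which is what makes the fitted table a reasonable set of weights for sampling individuals.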
The England and Wales LS Synthetic Spine is now complete; we are currently working on finishing a similar dataset for the SLS and will soon be tackling the Northern Ireland LS. Northern Ireland is a slightly different case as the 1991 to 2001 link has not yet been completed, but as the NILS sample is around a quarter of the resident population, the aggregate distributions are likely to be very similar to the distributions for the full Census. We will therefore use the 1991 Census distributions to generate our 1991 baseline and calculate the transitions to 2001 using our microsimulation software as soon as the link project is complete.
While the London team have been beavering away on the synthetic spine, the team based in Scotland have been working feverishly on the other half of the synthetic project. The second half of the project is approaching the generation of synthetic data from a different angle entirely: rather than attempting to create a large, general use dataset, here we are tailoring synthetic data to the individual needs of the user. Very soon, if you formulate a project and submit a request to access data from any of the national LSs, you will be asked if you would like to also receive a bespoke, fully synthetic version of your specific data request to work with as you wish on your own computer – something which is not possible with the real data.
The bespoke data are generated using a new R package called ‘synthpop’ developed by Beata Nowok and Gillian Raab in the Scotland team. Synthpop is a multiple synthesis package which allows user support officers to quickly generate fully synthetic versions of the data requested by the user. The data are generated through a series of models which estimate the values of one variable from the values of all others in the dataset sequentially. One of the benefits of this approach is that the resulting data are statistically equivalent to the real data, despite containing no real values.
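The idea of synthesising variables one after another can be illustrated with a very simplified Python sketch. This is not synthpop and does not use its interface: the function `synthesise`, the toy records and the variable names are all hypothetical, and the fitted models are replaced here by plain conditional frequency tables. Each variable is drawn conditional on the values already synthesised for that row.

```python
import random
from collections import defaultdict


def synthesise(records, order, n, seed=0):
    """Draw n synthetic rows, one variable at a time (illustrative only)."""
    rng = random.Random(seed)
    # For each variable, record the observed values that follow each
    # combination of the variables that come before it in `order`.
    lookups = []
    for k, var in enumerate(order):
        table = defaultdict(list)
        for rec in records:
            key = tuple(rec[v] for v in order[:k])
            table[key].append(rec[var])
        lookups.append(table)

    synthetic = []
    for _ in range(n):
        row = {}
        for k, var in enumerate(order):
            # Condition on the values already synthesised for this row.
            key = tuple(row[v] for v in order[:k])
            row[var] = rng.choice(lookups[k][key])
        synthetic.append(row)
    return synthetic


# Hypothetical toy extract; real LS data cannot leave the safe setting.
records = [
    {"sex": "F", "age_group": "16-34", "health": "good"},
    {"sex": "F", "age_group": "35-64", "health": "fair"},
    {"sex": "M", "age_group": "16-34", "health": "good"},
    {"sex": "M", "age_group": "35-64", "health": "good"},
    {"sex": "M", "age_group": "35-64", "health": "poor"},
]
order = ["sex", "age_group", "health"]

synthetic = synthesise(records, order, n=5, seed=42)
for row in synthetic:
    print(row)
```

Because this sketch conditions on exact combinations of earlier values, every synthetic row reproduces a combination that occurs in the input. A real synthesis tool fits a model at each step instead, so that the output follows the relationships in the data without copying records.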
We are now in the process of testing the synthpop package, with the Edinburgh team coming to visit London and the ONS LS virtual microdata lab to train the CeLSIUS user support officers and test the package on different data. A similar visit to Belfast and the NILS-RSU ‘safe-setting’ is scheduled shortly after that.
On the 6th of March we will be very excited to launch both Synthetic data products at the UK LS 2011 Census Linkage Launch event, and we hope to be able to provide user access to both the synthetic spine and bespoke synthetic tabulations very shortly afterwards.