Research Blog

Hear an interview with Adam Dennett on Census data and Synthetic Data

Adam Dennett (CeLSIUS and CASA, UCL) recently featured in an episode of The Global Lab and discussed his work with Census data and the Synthetic Data Estimation for the UK Longitudinal Studies (SYLLS) project.

You can hear or download the podcast on Soundcloud

Published on May 15th, 2014

Are we becoming more migratory? ONS LS beta project features in The Economist

Fiona Cox, CALLS Hub 

The Economist recently featured work from an ONS LS beta-test project by Tony Champion and Ian Shuttleworth. This project used the new Census 2011 data in the ONS LS to analyse patterns of house moves in England and Wales over the 40-year period from 1971 to 2011.

Building on previous work in the area, the authors used 10-year change of address data from one census to the next to explore how patterns of migration have changed across time, and whether these patterns are different for certain types of move (e.g. moves of over 50km) or for certain groups in the population (e.g. those with university degrees, or owner-occupiers).

The results suggest that almost all subgroups of people in England and Wales move home less frequently now than in the past, though this is less marked for moves over longer distances. Possible explanations proposed include higher rates of car ownership, which permits commuting to work, and increases in the proportion of people who own their home and hence find it more time-consuming and costly to move.

You can read the article in The Economist, and also download Tony Champion’s slides about the work, which he presented at the recent UK LS Census Linkage Launch Event.

Published on April 3rd, 2014

AQMeN Advanced Methods Taster Event, 27th March 2014

Kevin Ralston, SLS-DSU

Yesterday (27th March) I attended the Advanced Methods Taster Event hosted by AQMeN in the Grosvenor Hilton in Glasgow, where the Scottish Longitudinal Study (SLS) had a stand. The event was highly popular, with around 100 delegates registered to attend.

Sessions included introductory talks from experts on subjects such as: Longitudinal Data Analysis (Prof Vernon Gayle), Multiple Imputation (Dr Markus Klein), Structural Equation Modelling (Dr Jan Eichhorn) and Statistical Models for Count Data (Dr Leslie Humphreys).

This meeting represented a great opportunity to raise awareness of the SLS amongst potential users from all kinds of background. During the day I managed to have conversations with people typical of the research community, such as PhD students, academic researchers and local government workers, as well as people with more varied interests in data, such as a corporate lawyer. It is one of the great strengths of AQMeN that the network brings together people from different circumstances who share an interest in cutting-edge methods and ground-breaking research. This enabled me to discuss the merits of the SLS as a resource with potential users interested in all kinds of substantive subject areas, from a health researcher looking at cervical cancer to another interested in relating business start-up rates to individual-level factors.

Kev Ralston

Published on March 28th, 2014

UK LS Census Linkage Launch Event – slides and handouts

On Thursday 6th March 2014, on behalf of the LSs, ONS, NISRA, NRS and ESRC, CALLS Hub organised a very successful launch event to mark the linkage of 2011 Census data to the UK LSs. It was held at Church House, Westminster, and a total of 120 people attended over the day.

A special morning event was introduced by Prof Paul Boyle (CEO of ESRC), and the linkage was officially announced by Sir Andrew Dilnot (Chair, UK Statistics Authority). We were honoured to hear how highly regarded the LSs are.

You can see some of the tweets from the day in our Storify roundup, and speaker slides and handouts from the day can be downloaded below.

Morning session

Afternoon session

Published on March 17th, 2014

Synthetic Data for the UK Longitudinal Studies – SYLLS

Adam Dennett, UCL

As we head into a new year, we draw closer to the end of the SYLLS project. Starting in April 2013, the project has been run as a joint venture between the three Longitudinal Studies Research Support Units (RSUs) and the CALLS-Hub, with the aim of generating Synthetic Longitudinal data which are not subject to the same access restrictions as the real Census-based longitudinal microdata for England, Wales, Scotland and Northern Ireland.

The project has been split between teams based at CeLSIUS at UCL and the SLS-DSU in Edinburgh / St Andrews. The London team have been tasked with generating the ‘Synthetic Spine’ dataset. This is a partial replication of the full set of individuals contained in the 1991 LSs of England and Wales, Scotland and Northern Ireland who were then also enumerated in the 2001 Census. The replication is partial as we have not attempted to synthesise every variable contained in the LSs for every individual; rather, we have focused on a selection of some of the most frequently requested variables in previous LS-based research projects (age, sex, ethnicity, health, births, deaths, geography).

In order to generate the synthetic spine dataset, we have used publicly available data from the 1991 Samples of Anonymised Records (SARs) as our base. The SARs are similar to the LSs in that they are microdata records and so are perfect for this task. A bespoke microsimulation model has been built by Belinda Wu to generate the synthetic spine from the SARs data. We began with England and Wales. A baseline population for the 1991 synthetic LS was generated by constraining aggregated (local authority) area level data from the SARs to similar area level data from the LS using the tried-and-tested iterative proportional fitting technique; individuals were then sampled from this new data set to build our synthetic LS population. Once the 1991 baseline population is created, transition probabilities are calculated from the LS to age our simulated individuals on 10 years and give them the same characteristics that we would see for those LS members enumerated in both the 1991 and 2001 Censuses.
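For readers unfamiliar with iterative proportional fitting, the following is a minimal sketch of the general technique on a two-way table, not the project's microsimulation model. The function name `ipf`, the seed counts and the target margins are all hypothetical and chosen purely for illustration.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the target margins."""
    table = [row[:] for row in seed]  # work on a copy of the seed table
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if error < tol:
            break
    return table


# Hypothetical seed counts (e.g. age band by sex from a SARs-style sample)
# and hypothetical target margins (e.g. from LS-style area level tables).
seed = [[40.0, 30.0], [35.0, 45.0]]
row_targets = [60.0, 90.0]
col_targets = [70.0, 80.0]

fitted = ipf(seed, row_targets, col_targets)
```

The fitted table keeps the association structure of the seed (its odds ratios) while matching the target margins, which is why the technique suits adjusting one microdata source to the aggregate profile of another.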

The England and Wales LS Synthetic Spine is now complete; we are currently working on finishing a similar dataset for the SLS and will soon be tackling the Northern Ireland LS. Northern Ireland is a slightly different case as the 1991 to 2001 link has not yet been completed, but as the NILS sample is around a quarter of the resident population, the aggregate distributions are likely to be very similar to the distributions for the full Census. We will therefore use the 1991 Census distributions to generate our 1991 baseline and calculate the transitions to 2001 using our microsimulation software as soon as the link project is complete.

While the London team have been beavering away on the synthetic spine, the team based in Scotland have been working feverishly on the other half of the synthetic project. The second half of the project is approaching the generation of synthetic data from a different angle entirely: rather than attempting to create a large, general use dataset, here we are tailoring synthetic data to the individual needs of the user. Very soon, if you formulate a project and submit a request to access data from any of the national LSs, you will be asked if you would like to also receive a bespoke, fully synthetic version of your specific data request to work with as you wish on your own computer – something which is not possible with the real data.

The bespoke data are generated using a new R package called ‘synthpop’ developed by Beata Nowok and Gillian Raab in the Scotland team. Synthpop is a multiple synthesis package which allows user support officers to quickly generate fully synthetic versions of the data requested by the user. The data are generated through a series of models which estimate the values of one variable from the values of all others in the dataset sequentially. One of the benefits of this approach is that the resulting data are statistically equivalent to the real data, despite containing no real values.
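The idea of synthesising variables one after another can be illustrated with a toy sketch. This is not synthpop's API or its models: the function `synthesise`, the variable names and the records below are all hypothetical, and the sketch simply draws each variable from the values observed among records that match the variables already synthesised.

```python
import random
from collections import defaultdict


def synthesise(records, order, n, seed=0):
    """Build n synthetic records, one variable at a time, in the given order."""
    rng = random.Random(seed)
    synthetic = [{} for _ in range(n)]
    for k, var in enumerate(order):
        predictors = order[:k]  # variables that have already been synthesised
        # Pool the observed values of `var` by the values of its predictors.
        pools = defaultdict(list)
        for rec in records:
            key = tuple(rec[p] for p in predictors)
            pools[key].append(rec[var])
        marginal = [rec[var] for rec in records]  # fallback if no match exists
        for syn in synthetic:
            key = tuple(syn[p] for p in predictors)
            syn[var] = rng.choice(pools.get(key, marginal))
    return synthetic


# Hypothetical observed records, for illustration only.
observed = [
    {"sex": "F", "age_band": "16-34", "health": "good"},
    {"sex": "F", "age_band": "35-64", "health": "fair"},
    {"sex": "M", "age_band": "16-34", "health": "good"},
    {"sex": "M", "age_band": "65+", "health": "poor"},
    {"sex": "F", "age_band": "65+", "health": "fair"},
]

synthetic = synthesise(observed, ["sex", "age_band", "health"], n=10)
```

Because this toy version conditions on exact matches, every synthetic record reproduces a combination that occurs in the observed data. A real synthesiser would instead fit a statistical model at each step and draw from its predictions, which is what allows the output to differ from the original records.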

We are now in the process of testing the synthpop package, with the Edinburgh team coming to visit London and the ONS LS virtual microdata lab to train the CeLSIUS user support officers and test the package on different data. A similar visit to Belfast and the NILS-RSU ‘safe-setting’ is scheduled shortly after that.

On the 6th of March we will be very excited to launch both Synthetic data products at the UK LS 2011 Census Linkage Launch event, and we hope to be able to provide user access to both the synthetic spine and bespoke synthetic tabulations very shortly afterwards.

Published on January 10th, 2014

British Society of Population Studies (BSPS) Conference, Swansea 2013

Kevin Ralston, SLS-DSU

This was my first time at the British Society of Population Studies (BSPS) conference. This is somewhat of an indictment considering my PhD examined the timing of first birth in Scotland. However, it was not a conference that was on the radar of my research group at the University of Stirling, which was more focussed on the field of social stratification. As a result I had overlooked the BSPS, although I was lucky enough to attend the BSPS postgraduate conference, PopFest 2010, when it was hosted at St Andrews. It was therefore with some enthusiasm that I looked forward to this year’s conference, hosted at the University of Swansea.

A little time has passed since the conference now, and my view is that the event was well organised and provided a very high standard of scientific research. As I would have expected, something is provided for everyone with an interest in population studies. Sessions on fertility, mortality, methodology, migration, ageing and historical demography, to point to just some of the themes of interest, mean that anyone in the UK and beyond whose work is related to the subject areas covered by the BSPS has a strong incentive to build a lasting connection with the Society. One delegate I talked to was particularly complimentary about the poster session, which was organised around drinks and food. In contrast to other conferences where posters might be peripheral to the event, he felt this really brought the posters into the heart of the meeting. This is a nice touch, as posters take as much time and energy to produce as other forms of dissemination and provide valuable insight into what research is going on.

Happily, the Longitudinal Studies Centre Scotland (LSCS) was well represented across the sessions, particularly by Dr Beata Nowok, who presented two talks and a poster as well as chairing a session. Indeed, the Tuesday morning parallel sessions saw a triple clash for our Director, Dr Chris Dibben, with three talks involving LSCS researchers, based on work involving Dr Lee Williamson, Dr Tom Clemens, Professor Gillian Raab, Dr Zhiqiang Feng and me, all competing for attention. The first CALLS Hub training event also took place, introducing the three Longitudinal Studies to attendees. The BSPS therefore continues to be a particularly successful conference from the point of view of the LSCS.

The University of Swansea campus provided a practical backdrop to the meeting, and its situation on picturesque Swansea Bay gave delegates the chance to catch some fresh seaside air on a beach two minutes’ walk away, with the town of Mumbles at one end of the bay and Port Talbot at the other. Even the weather played its part, with sunshine throughout. All in all, it was a very enjoyable conference.

Link to the conference website:

http://www.lse.ac.uk/socialPolicy/BSPS/annualConference/2013-Conference—Swansea/2013-Conference—Swansea.aspx

Link to the programme:

http://www.lse.ac.uk/socialPolicy/BSPS/annualConference/2013-Conference—Swansea/Complete-programme-with-timetable.pdf

Kev Ralston

Published on September 24th, 2013

A Prezi introduction to CALLS Hub

We recommend clicking the button in the bottom right corner to view full screen. You can also view the Prezi at Prezi.com

Published on September 11th, 2013

SLS now linked to pollution and weather data

The Scottish Longitudinal Study has recently been expanded to include an extensive set of environmental data. Using data which is freely available from the Met Office and DEFRA, SLS researchers can now investigate environmental effects by socioeconomic characteristics in a way not previously possible.

The linkage was originally created as part of a joint project of the SLS and the Scottish Health Informatics Programme – see the SHIP website for more background information.

Weather data is currently available as monthly averages from January 1981, though it is possible to explore some variables from much earlier if needed. Rather than being based on weather station data, the variables report on 5x5km grids covering the whole of Scotland. Measurements include:

  • Temperature
  • Frost
  • Sunshine
  • Precipitation
  • Cloud cover

Pollution measurements are based on 1x1km grids, again covering the whole of Scotland. Annual averages are currently available for:

  • Carbon monoxide (2001-2008)
  • Nitrogen oxide (1994-2008)
  • Ozone (1994-2005)
  • Particulate matter < 10 microns (1994-2008)
  • Particulate matter < 2.5 microns (2002-2008)
  • Sulphur dioxide (1994-2008)

These weather and pollution data open up several new fields of SLS-based research – including environmental health and environmental justice – as well as extending the Study’s potential for public health research. And for environmental researchers, the linkage to the SLS allows them to control for known confounding factors such as socioeconomic status. The SLS can also offer the large sample sizes required in order to detect what are often relatively small effects.

The use of grid-based rather than station-based data is particularly useful for Scotland, which contains many sparsely populated areas that would not otherwise be covered. The increase in spatial coverage afforded by the grid-based approach also provides increased spatial variability. This means researchers can look at annual average exposures in a large number of small areas rather than focusing on the accumulated exposure of people living in a more concentrated area.
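As a rough illustration of how a grid-based surface can assign an exposure to any location, the sketch below maps a coordinate to its 1x1km cell and looks up an annual average. It assumes locations are expressed as eastings and northings in metres; the function names, cell indices and pollution values are hypothetical, and this is not a description of how the SLS linkage itself was carried out.

```python
def grid_cell(easting_m, northing_m, cell_size_m=1000):
    """Return the (column, row) index of the grid cell containing a point."""
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


def annual_exposure(easting_m, northing_m, surface, cell_size_m=1000):
    """Look up the annual average for a point, or None if the cell is absent."""
    return surface.get(grid_cell(easting_m, northing_m, cell_size_m))


# Hypothetical annual average values keyed by 1x1km cell index.
pm10_by_cell = {
    (325, 673): 14.2,
    (258, 665): 16.8,
}

value = annual_exposure(325400, 673900, pm10_by_cell)
```

The same lookup works for the 5x5km weather grids by passing `cell_size_m=5000` and a surface keyed on the coarser cell indices.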

Data from the Scottish Air Quality Monitoring Network (SAQMN) can also be added to the SLS. This provides hourly time series data from air pollution monitoring stations across Scotland. These data can be used on their own, or together with the grid-based data to improve exposure estimation. More information about SAQMN can be found on their website.

For an example of how the data have been used, see SLS Project 2007_011: “Time-space geographies and exposure to air pollution: examining the impact of varying exposure to air pollution on the health of adults and birth outcomes”.

The SLS team is also exploring other environmental data linkages, most notably to Road Network Data. This dataset can be examined by vehicle type, and would, for example, allow analysis of the effects that living close to busy roads can have on health. Other potential linkages are with levels of radioactive pollution (e.g., radon) and proximity to landfill sites.

For more detailed technical information about the weather and pollution data available, please explore the Met Office, DEFRA and SAQMN links below:

If you are interested in using these or any other environmental dataset in combination with the Scottish Longitudinal Study, please contact us to discuss your ideas.

Published on July 11th, 2013
