Comments on four papers on synthetic data in Volume 32 Issue 1 of the Statistical Journal of the IAOS
Raab, G. (2016) Statistical Journal of the IAOS, 32, 267 - 269 [SLS]
One of several explanations of why Homo sapiens is the only surviving species of the genus Homo is the extended length of our childhood and adolescence. The value of this extended period of maturation and development may be that it allows us to learn and carry out complex tasks. Like Homo sapiens, methodology for synthetic data has had a long learning period. The idea of using synthetic data for disclosure control was conceived more than 20 years ago [1–3], but it was a further 10 years before the first papers describing how to do it appeared in the literature [4,5]. The subsequent decade was one of rapid development and innovation, when the methodology was tested and expanded. The energy and enthusiasm for synthetic data of Reiter and his colleagues were responsible for many major developments; see the monograph by Drechsler for a review. Towards the end of synthetic data's second decade, real applications began to appear [7–9]. Two of the four substantial papers that deal with synthetic data in this issue [10,11] are examples of mature methodology, while the other two [12,13] deal with disclosure control, the aspect of synthetic data that is still at an early stage in its development. My comments here are from the point of view of a practitioner looking for useful and workable ideas in this field. Our project to provide data for the UK Longitudinal Studies (LSs) is referred to in Vilhuber et al.'s overview of international developments. More details of our methods and our synthpop package for R are available [15–17].