Documenting Cross-National Comparability of Longitudinal Surveys: The Online Codebook and Analysis of the Generations and Gender Programme

Arianna Caporali, Institut national d’études démographiques (INED)

What does it take to document cross-national longitudinal surveys? In international survey projects, metadata are paramount for evaluating the conceptual and methodological equivalence of country datasets. They are also crucial for documenting subsequent survey waves. This paper presents the ways in which metadata are provided in the framework of the Generations and Gender Programme (GGP), a longitudinal demographic survey of 18- to 79-year-olds in Europe and beyond. Survey instruments and guidelines are either adapted to the different national contexts or incorporated into existing surveys. The specific challenge of documenting GGP data is the need to combine information from existing surveys with information on the harmonization process across different waves. To meet this challenge, metadata are provided in compliance with the Data Documentation Initiative (DDI), an international standard for documenting social science data, and are made available in an online codebook and analysis tool through the software package Nesstar. The paper illustrates the DDI elements chosen to document the cross-national comparability of GGP surveys and discusses the experience of the GGP team in providing the relevant information. It concludes with an outlook on possible developments: in the future, preparing metadata from the beginning of country fieldwork may help optimize the data documentation procedure.

Survey data are usable only if accompanied by comprehensive metadata. In international surveys, metadata are paramount for documenting data quality and cross-national comparability. Metadata are even more important in longitudinal surveys, where subsequent waves need to be described. Metadata preparation involves both data producers and survey data archives. To ensure that survey data can be reused in cross-national research, survey data archives prepare metadata following international standards. Today, the Data Documentation Initiative (DDI) standard is widely used in the social sciences, both for national surveys and for cross-national comparative surveys. It is often used in connection with Nesstar (Networked Social Science Tools and Resources), user-friendly software that can be used to prepare survey metadata according to the DDI standard and to disseminate surveys online.

After an overview of the DDI standard and of the Nesstar software, the paper presents the GGP methodology, the challenges that this methodology poses for data documentation, and how they are overcome. The GGP is a demographic survey of 18- to 79-year-olds in Europe and beyond, run by a consortium of research institutes. The GGP central coordination team is based at the Netherlands Interdisciplinary Demographic Institute (NIDI) in The Hague. Among the other institutes, the French Institute for Demographic Studies (INED) acts as the GGP data archive. The GGP is based on a relatively decentralised management model. This implies that the survey instruments and the fieldwork guidelines can either be adapted by the national teams to the different national contexts or partly incorporated into existing surveys. It is therefore necessary to provide extensive documentation on both country-specific methodological features and country deviations from the standard questionnaire. To ensure cross-national comparability, the GGP also needs to implement considerable post hoc harmonization of data, which must be documented together with general information on the GGP survey. To meet these challenges, metadata preparation is structured in two procedures: one concerning the documentation of fieldwork methodologies, and the other concerning the documentation of variables. The preparation of country-specific metadata in particular requires meticulous and often time-consuming work. Every variable has to be checked, and country deviations (e.g., country-specific variables and response categories) have to be fully documented. All the metadata are organised in compliance with the DDI standard and disseminated online through the Nesstar software, in the GGP Online Codebook and Analysis (

The paper illustrates the DDI elements chosen to document GGP surveys and the ways in which the documentation is organised in the Online Codebook and Analysis. The functionalities of the Nesstar software are used to present in a convenient way the large amount of metadata necessary to document GGP surveys. Examples of some of the analytical functionalities of Nesstar are also given. Data documentation is organised in three different data files. First, to provide users with cross-national overviews of the data, pooled datasets are published for each available GGP survey wave. Second, in order to fully account for national deviations in fieldwork methodologies, country-specific data files are also provided. Third, the so-called ‘Variable Availability’ data file documents the availability of variables across countries and waves, so as to give users insight into country compliance with the GGP standard questionnaires. Metadata are organised along two main DDI fields. The first field is called ‘Metadata’. It contains information on the survey in general and country-specific metadata on methodology and processing (e.g., sampling and weighting procedures, data collection mode, characteristics of data collection situations, response rates, actions taken to minimize losses between waves). The second field is called ‘Variable Description’ and contains a detailed description of each variable. For each variable, it provides the metadata necessary to understand how to use the data and how comparable they are across countries. These include the question text of the standard questionnaire, a descriptive text covering country deviations from the question and/or from the response categories of the standard questionnaire, and the variable distribution. If applicable, there is also the universe (i.e., the subset of respondents to whom the question was asked) and, in the case of derived variables, an explanation of the calculation method.
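As an illustration of the idea behind a variable availability file, the following minimal Python sketch builds a table recording, for each standard-questionnaire variable, whether it is present in each country and wave. It is not the GGP's actual procedure or file layout: the function names, the variable names (`sex`, `age`, `marital_status`, `region_hu`) and the example countries and waves are all hypothetical and chosen only for demonstration.

```python
from typing import Dict, List, Set, Tuple

# A dataset is identified here by (country, wave); this key is an assumption.
DatasetKey = Tuple[str, int]


def variable_availability(
    standard_vars: List[str],
    datasets: Dict[DatasetKey, Set[str]],
) -> Dict[str, Dict[DatasetKey, bool]]:
    """For each standard variable, flag whether each country/wave dataset contains it."""
    return {
        var: {key: var in present for key, present in sorted(datasets.items())}
        for var in standard_vars
    }


def country_specific_vars(
    standard_vars: List[str],
    datasets: Dict[DatasetKey, Set[str]],
) -> Dict[DatasetKey, List[str]]:
    """List variables found in a dataset but absent from the standard questionnaire."""
    standard = set(standard_vars)
    return {key: sorted(present - standard) for key, present in datasets.items()}


# Made-up example data, not taken from any GGP file.
standard = ["sex", "age", "marital_status"]
datasets = {
    ("France", 1): {"sex", "age", "marital_status"},
    ("Hungary", 1): {"sex", "age", "region_hu"},
}

availability = variable_availability(standard, datasets)
extras = country_specific_vars(standard, datasets)

for var, row in availability.items():
    flags = ", ".join(
        f"{country} w{wave}: {'yes' if ok else 'no'}"
        for (country, wave), ok in row.items()
    )
    print(f"{var}: {flags}")
```

In this sketch, missing standard variables and country-specific additions are reported separately, mirroring the distinction drawn above between compliance with the standard questionnaire and documented country deviations.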

In the future, the central coordination team plans to monitor data collection more closely. This may help reduce country deviations from the standard questions and, in turn, the effort needed to prepare country-specific metadata. In addition, it is planned to start preparing country-specific metadata from the beginning of the fieldwork. This implies carrying out data checks, metadata corrections, and variable documentation while the fieldwork is still ongoing. If implemented, this new approach to metadata preparation may make the whole data documentation procedure more efficient.

Presented in Session 1233: Posters