Families in Comparison

Ingrid van Dijk, Radboud University Nijmegen
Rick Mourits, Radboud University Nijmegen
Niels van den Berg, Leids Universitair Medisch Centrum
Kees Mandemakers, Erasmus University
P. Eline Slagboom, Leids Universitair Medisch Centrum
Angelique Janssens, Maastricht University

Large-scale individual-level data have become increasingly available in demographic research. The quality of these datasets has seldom been assessed beyond comparisons with external sources such as life tables, because opportunities for such assessments are rare. In this paper, we explore the quality of two Dutch demographic datasets by comparing life course reconstructions and family reconstitutions for individuals who appear in both. First, we use the HSN (Historical Sample of the Netherlands), which consists of a sample from the nationwide population register and follows the life courses of a group of research persons in their historical context. Second, we use LINKS (Linking System for family reconstitution), which is based on vital event registration certificates. Because the HSN is based on a sample of birth certificates and LINKS contains all vital events occurring in the province of Zeeland, individuals in the HSN who were born in Zeeland can also be found in LINKS.

We use 495 persons from the birth cohorts 1863 to 1873 who are found in both the HSN and LINKS. This paper constitutes a first attempt to compare life course reconstructions and family reconstitutions in demographic datasets that are based on different sources and constructed using different methods. Our purpose is threefold: to assess the quality of the datasets; to assess the usefulness of the data for different research questions, including research on life spans and mortality, marriage behavior, and fertility; and to determine how cases that are as complete as possible can be carefully selected for analysis.


Within the Historical Sample of the Netherlands, birth certificates, personal cards, and population registers have been used to reconstruct the entire households of research persons, including information about their siblings, parents, migration movements, and characteristics such as socioeconomic status and religion. The data include events experienced by co-resident first-degree relatives and spouses. Information on kin is recorded from the perspective of the research person, and only the life courses of siblings and parents who co-reside with the research person are registered. The resulting data contain relatively complete life courses of individuals from the 19th century. However, questions about intergenerational similarities and change, kin networks, and life course similarities between siblings cannot be answered because of the design of the database. For such questions, another database is available for the Netherlands: LINKS.

LINKS (Mandemakers & Laan, 2017) aims to reconstruct all families in the Netherlands that are found in the vital event certificates of the Dutch civil registry, which was implemented nationwide in 1812. In the LINKS database, indexes of birth, marriage, and death certificates have been linked together using the names of the research person and his or her parents. Individual life courses are reconstructed and linked to partners, parents, and children, resulting in multigenerational pedigrees in which not only first-degree relatives but also other relatives can be followed over time. In contrast to the HSN, however, LINKS does not contain information on addresses, co-residence of kin, or religion.
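The linkage principle described above can be illustrated with a minimal sketch. It assumes exact matching on normalized names, which is a simplification chosen for illustration and not a description of the actual LINKS procedure. The field names, the function names, and the example records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical field names; real certificate indexes will differ.
NAME_FIELDS = ("first_name", "surname", "father_name", "mother_name")


def name_key(record):
    """Build a linkage key from the names of the person and both parents."""
    return tuple(record[field].strip().lower() for field in NAME_FIELDS)


def link_certificates(births, deaths):
    """Link each birth to the single death certificate sharing its name key.

    A birth stays unlinked when no death, or more than one candidate
    death, has the same key (assumed rule, for illustration only).
    """
    deaths_by_key = defaultdict(list)
    for death in deaths:
        deaths_by_key[name_key(death)].append(death)

    links, unlinked = {}, []
    for birth in births:
        candidates = deaths_by_key.get(name_key(birth), [])
        if len(candidates) == 1:
            links[birth["id"]] = candidates[0]["id"]
        else:
            unlinked.append(birth["id"])
    return links, unlinked


# Invented example records, not taken from either database.
births = [
    {"id": "b1", "first_name": "Jan", "surname": "de Vries",
     "father_name": "Pieter de Vries", "mother_name": "Maria Jansen"},
    {"id": "b2", "first_name": "Anna", "surname": "Bakker",
     "father_name": "Willem Bakker", "mother_name": "Cornelia Smit"},
]
deaths = [
    {"id": "d1", "first_name": " jan", "surname": "De Vries",
     "father_name": "Pieter de Vries", "mother_name": "Maria Jansen"},
]

links, unlinked = link_certificates(births, deaths)
```

In this sketch the second birth remains unlinked because no death certificate with a matching key exists in the index, for instance because the person died outside the province. This is the kind of failed match discussed below as a source of bias.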

Although possible sources of bias are commonly addressed in demographic research using historical datasets, their influence on life course reconstructions and family reconstitutions has not been assessed systematically. In this paper, we assess the extent to which the characteristics of two different types of databases affect estimates of key demographic and socioeconomic indicators. For an overview of the expected bias in the HSN and LINKS, see Box 1. We believe that the main source of bias in LINKS is failed matches between vital event certificates, which cause problems in reconstructing individual life courses, linking marriage lines, and retracing progeny. First, mismatches can produce false links, and failed links may result in underestimated sibship size. Second, certificates are linked within provinces, so persons are lost to follow-up if they migrated into or out of the province. As a result, mortality in early life is most likely measured quite accurately, but death and marriage certificates for events later in life are more often missing because of migration, which may lead to an underestimation of later-life mortality rates. Finally, key indicators such as place of residence and socioeconomic status are only observed in conjunction with vital events, i.e. when an individual marries or when children are born. Most of these events occur relatively early in life, so socioeconomic status over the life course can easily be underestimated.

In the HSN, the main focus lies on the life course of individuals and not necessarily on reconstructing families. As a result, there are no observations of events occurring in a family before the sampled individual was born into it. The implication is that if children were born and died in a family before the index person was born, and the family moved afterwards, they are not mentioned in the population register at the address where the index person lived and are therefore not captured in the dataset. Thus, net fertility (surviving siblings) may be measured accurately in the HSN, but total fertility (all children ever born) less so. Furthermore, stillbirths and marriages of parents were not registered in population registers, and changes in the household before the index person entered it and after the index person left it are not included in the HSN. This may limit opportunities for life course research and for research on intergenerational transfers of longevity, mortality, and fertility using the database. However, it is unknown to what extent these characteristics result in bias in the database.
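The difference between children ever born and children visible from the index person's perspective can be made concrete with a small sketch. It assumes the least favorable case described above, in which every child who died before the index person's birth is missing from the register. The function names and the example family are hypothetical and purely illustrative.

```python
def children_ever_born(children):
    """Total fertility of the family: every child, regardless of survival."""
    return len(children)


def children_visible_to_index(children, index_birth_year):
    """Children who had not died before the index person was born.

    Assumption for illustration: anyone who died before the index
    person's birth year is absent from the register entry.
    """
    return sum(
        1
        for child in children
        if child["death_year"] is None
        or child["death_year"] >= index_birth_year
    )


# Invented family; death_year None means no death observed.
family = [
    {"birth_year": 1860, "death_year": 1861},  # died before index birth
    {"birth_year": 1862, "death_year": None},
    {"birth_year": 1865, "death_year": None},  # index person
    {"birth_year": 1868, "death_year": 1870},
]

total = children_ever_born(family)
visible = children_visible_to_index(family, index_birth_year=1865)
```

Here the family has four children ever born, but only three would be visible under the stated assumption, so sibship size based on the register alone would be one lower than the count from birth certificates.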

Earlier research has tested whether life course reconstructions based on civil certificates reflect those based on population register data by comparing information on migration movements from civil certificates in the HSN with migration movements as known in the original database. Data derived from vital events were shown to reflect continuous observation through population registers rather well, at least for migration events (Adams, Kasakoff, & Kok, 2002). Here, we investigate to what extent this also applies to socioeconomic status and to demographic indicators such as sibship size, number of marriages, and early and later life mortality.

Presented in Session 1069: Data and Methods