Conference Agenda

Session
Data quality management in the ESS at the data and post-data collection stage
Time: Monday, 08/July/2024, 10:00am - 11:30am

Session Chairs: Ole Petter Ovrebo; Joost Kappelhof
Location: C401, Floor 4, Iscte's Building 2 / Edifício 2

Session Abstract

For more than 20 years, one of the main aims of the European Social Survey (ESS) has been to provide researchers, policy makers and the wider community with high-quality data measuring change (and stability) over time in Europe. A significant proportion of the quality measures in contemporary cross-national surveys such as the ESS is implemented at the early stages of the survey life cycle (the design phase), be it related to questionnaire design, sample design, fieldwork plans, translation or more technical aspects of the data specifications.

While emphasizing the importance of these contributions to overall data quality, this session aims to highlight data quality management in the ESS at the data collection and post-data collection stage. This includes well-known survey disciplines such as fieldwork monitoring and data processing, but also user contributions to data quality through what may be dubbed “the life cycle feedback loop”, in which exploration and analyses by data users result in new and improved versions of data and metadata. This feedback loop has traditionally been an important, if less communicated, part of the validation and quality enhancement of the ESS and of surveys in general.

Data processing, from interim data quality checks to post-collection data cleaning, may often appear as a black box to researchers and other end users of the data. However, the ESS prides itself on its transparency via thorough documentation and communication. Hence, the proposed session aims to present our data processing and quality control procedures in detail, clarifying which procedures are used, and to examine their impact on data quality. The session focuses primarily on interim and post-collection data management at the central level, but we also invite papers dealing with related issues at the decentralized (country) level. Furthermore, we would welcome papers highlighting data user contributions to data quality in the ESS. Papers addressing these issues in other surveys would also be relevant to this session.


Presentations

Data quality management in the ESS at the data and post-data collection stage

Joost Kappelhof1, Paulette Flore1, May Doušak2, Roberto Briceno-Rosas3

1SCP - The Netherlands Institute for Social Research; 2University of Ljubljana; 3GESIS - Leibniz Institute for the Social Sciences

For more than 20 years, one of the main aims of the European Social Survey (ESS) has been to provide researchers, policy makers and the wider community with high-quality data measuring change (and stability) over time in Europe. A significant proportion of the quality measures in contemporary cross-national surveys such as the ESS is implemented at the early stages of the survey life cycle (the design phase), be it related to questionnaire design, sample design, fieldwork plans, translation or more technical aspects of the data specifications.

While emphasizing the importance of these contributions to overall data quality, this paper aims to highlight data quality management in the ESS at the data collection and post-data collection stage. This includes well-known survey disciplines such as fieldwork monitoring and data processing, but also user contributions to data quality through what may be dubbed “the life cycle feedback loop”, in which exploration and analyses by data users result in new and improved versions of data and metadata. This feedback loop has traditionally been an important, if less communicated, part of the validation and quality enhancement of the ESS and of surveys in general.

Data processing, from interim data quality checks to post-collection data cleaning, may often appear as a black box to researchers and other end users of the data. However, the ESS prides itself on its transparency via thorough documentation and communication. Hence, the paper aims to present our data processing and quality control procedures in detail, clarifying which procedures are used, and to examine their impact on data quality.



Providing excellent data: the importance of transparency in data processing and dissemination

Gyrid Bergseth

Sikt, Norway

The ESS prides itself on implementing the highest quality standards in all elements of the survey lifecycle. Much of the quality enhancement work on questionnaire development, survey design and translation, among other areas, is well documented and much discussed. However, when it comes to post-fieldwork quality control, there is much less documentation and debate.

The article “Processing, Archiving and Dissemination of ESS data. The Work of the Norwegian Social Science Data Services” (Kolsrud et al., 2010) gave a thorough picture of the processing procedures implemented in the early rounds of the ESS. The principles on which we base our data processing remain much the same today, but in the last five years we have changed both our architecture and the software we use for data processing. Traditional processing with statistical packages on on-premises servers has been replaced with metadata-driven (DDI) cloud processing. In addition, the architecture behind the data dissemination platform, the ESS Data Portal, has been renewed, and we have implemented a stronger link between data and documentation. Moving to the new cloud platform has given us greater flexibility to utilise the data documentation from the various stages of the data lifecycle: the data specifications of the survey items, the processing programs, and the survey documentation. Further, having the metadata platform as a single source of truth enables publishing and updating of data in more flexible ways than before.
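As an illustration of what metadata-driven processing can look like, the minimal Python sketch below lets a DDI-style variable description drive validation and recoding. The variable names, codes and structure are invented for illustration and do not reflect Sikt's actual pipeline or metadata model.

```python
# Minimal sketch of metadata-driven processing (hypothetical, not Sikt's pipeline).
# A DDI-style variable description drives validation and recoding, so the
# processing program contains no hard-coded, variable-specific logic.

# DDI-inspired variable metadata: one entry per survey item (illustrative values).
VARIABLE_METADATA = {
    "happy": {"label": "How happy are you", "valid": range(0, 11), "missing": {77, 88, 99}},
    "agea":  {"label": "Age of respondent", "valid": range(15, 121), "missing": {999}},
}

MISSING = None  # single representation of missing values in the processed file


def process_record(record: dict) -> dict:
    """Validate and clean one respondent record using the metadata alone."""
    cleaned = {}
    for var, meta in VARIABLE_METADATA.items():
        value = record.get(var)
        if value in meta["missing"]:      # declared missing code -> set to missing
            cleaned[var] = MISSING
        elif value in meta["valid"]:      # inside the documented valid range -> keep
            cleaned[var] = value
        else:                             # undocumented value -> flag and set to missing
            print(f"Out-of-range value {value!r} for {var} ({meta['label']})")
            cleaned[var] = MISSING
    return cleaned


if __name__ == "__main__":
    raw = [{"happy": 8, "agea": 42}, {"happy": 77, "agea": 130}]
    print([process_record(r) for r in raw])
```

Because every check is read from the metadata, updating a variable's specification automatically updates the processing, which is one way a single source of truth can keep data and documentation in step.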

Transparency in the final phase of the survey lifecycle is important to give our data users a better understanding of the possibilities and limitations of the ESS data. Further, it lays the groundwork for constructive feedback on how data and documentation are presented to the research community. We must make sure that the information allowing our users to understand and use the data correctly is available. The amount of documentation available to our users in this project is large, and an important part of our task is to differentiate the information and give precedence to the most important elements. Too much information can clutter the scene, hiding the important documentation in a mass of general information. Feedback from the research community in this respect is vital.

While the reasons behind the changes in our data processing, storage and dissemination infrastructure are diverse, one of the advantages we seek is to shorten the time between when the data are collected and when they are made available to the user community. In a world where the amount of data available to researchers is increasing, the tension between the need for fresh research data and the need for good data quality and documentation is highly relevant. An open discussion of these contrasting needs is important to find the best possible balance between providing data of high quality and usability and bringing our users fresh measures of the attitudes, beliefs, and behaviour patterns of the diverse populations in the ESS participating countries.



Reproduce Me If You Can: Insights from a Meta-Scientific Assessment of Observational Social Science Studies Using ESS Data

Daniel Krähmer, Laura Schächtele

LMU Munich, Germany

For decades, the European Social Survey (ESS) has provided high-quality data to the social science community. With over 9,100 scientific contributions listed in the ESS bibliographic database as of January 2024, the ESS constitutes a prominent and popular data source among social scientists. Yet, it is unclear whether analyses using these data meet meta-scientific standards of openness and reproducibility. The absence of such an assessment is not surprising. As a data provider, the ESS itself lacks the resources to monitor and enforce meta-scientific norms among its users. Entities dedicated to research transparency and reproducibility (i.e., data editors) have had no reason to single out ESS publications. In fact, systematic assessments of transparency, reproducibility, and robustness are generally missing for observational social science research.

This gap is concerning. Spurred by the replication crisis in psychology, studies have demonstrated how social science research may be jeopardized by misuse of analytical flexibility (Simmons, Nelson, and Simonsohn 2011) and cherry-picking of results (Muñoz and Young 2018). Observational social science, in particular, is susceptible to error and misspecification, as the data are typically complex and researcher degrees of freedom are abundant (Auspurg and Brüderl 2022). Even seemingly innocuous statistical procedures, such as imputations, may have a significant impact on results (Troccoli 2023), posing challenges for data users and providers.

We report empirical insights from a large-scale meta-scientific audit among users of ESS data. From June 2022 to January 2023, we contacted 1,206 authors who had published research based on ESS data between 2015 and 2020. In our e-mail, we explained our intention of conducting reproducibility checks and asked original authors to share the statistical code needed to recreate their findings. Of all contacted authors, excluding bounces and overcoverage, 37.5% (n = 385) provided their analysis code, suggesting that openness among ESS authors is comparable to other samples (e.g., Stodden, Seiler, and Ma 2018; Vanpaemel et al. 2015; Wicherts et al. 2006). Using authors’ code and publicly available ESS data, we aimed to assess the reproducibility of results for a random sample of n = 100 articles. Of the 50 articles assessed as of January 2024, we were able to successfully reproduce 44% (n = 22) starting from the scientific use files provided by the ESS. For another 13 articles (26%), we achieved numerical identity but considered the reproduction unsuccessful because the authors only provided pre-processed data that could not be linked unequivocally to the original ESS files. Our audit highlights considerable reproducibility problems related to data management, including missing data citations and opaque datasets emanating from the ESS Data Wizard. We also find evidence that official recommendations on handling ESS data are ignored (in particular, recommendations related to weighting procedures). Taken together, our results demonstrate that ESS research can improve on meta-scientific standards.
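As a minimal sketch of the numerical-identity criterion behind such reproduction attempts, the Python snippet below compares reported estimates against values recomputed from the public ESS files. The function name, coefficient names, numbers and tolerance are our own illustrative assumptions, not the authors' actual audit code.

```python
# Hypothetical sketch of a numerical reproduction check: rerun the authors'
# analysis on the public ESS files and compare each reported estimate
# against the reproduced value within a tolerance.
import math


def is_reproduced(reported: dict, reproduced: dict, rel_tol: float = 1e-4) -> bool:
    """True if every reported coefficient matches its reproduced counterpart."""
    return all(
        math.isclose(reported[name], reproduced.get(name, float("nan")), rel_tol=rel_tol)
        for name in reported  # a coefficient missing from the rerun fails the check
    )


# Example: published coefficients vs. values recomputed from the scientific
# use file (numbers invented for illustration).
published = {"b_trust": 0.231, "b_age": -0.012}
recomputed = {"b_trust": 0.231, "b_age": -0.012}
print(is_reproduced(published, recomputed))  # True -> numerically identical
```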



The US General Social Survey and European Social Survey: A Methodological and Operational Comparison

Jodie Smylie, Ned English, René Bautista

NORC, United States of America

The US General Social Survey (GSS) has been conducted in the United States since 1972 and is considered the first of the flagship cross-sectional surveys measuring national attitudes and behaviors. As such, it has been a model for key studies that have followed. This year, the GSS is embarking upon a methodologically driven collaboration with the European Social Survey (ESS) to leverage lessons from both studies and help advance the field of survey methodology. In recognition of this collaboration, this paper aims to explore the similarities and differences between the two research programs. Areas to be covered include (i) sampling frame and sampling strategy, (ii) within-household respondent selection methods, (iii) weighting and estimation, (iv) data harmonization, and (v) survey operations.

Both the GSS and ESS are undergoing significant methodological and operational shifts. Like many studies across Europe, the GSS has in recent years had to adapt to an environment of higher costs and lower participation rates by transitioning to multi-mode designs. A common concern around these adaptations is the impact that design changes could have on cross-sectional estimates or on the understanding of trends. Consequently, the GSS and ESS have each prioritized maintaining high quality, data integrity, and comparability across time.

As part of the comparison between the ESS and GSS, this paper also describes recent changes in the 2022 GSS survey methodology; namely, it compares two experimental multi-mode strategies featuring both face-to-face (f2f) and mail push-to-web (web push) modes of data collection. In 2022, GSS cases were randomized into two experimental groups: in Condition 1, addresses were first approached in person (the standard GSS f2f protocol), with non-respondents invited to participate via web; in Condition 2, households were first invited to participate via web, with a subsample of non-respondents visited in person using the standard GSS f2f protocol. We present the results of the GSS experiment implemented during the 2022 round and how it has influenced our design for 2024. The paper aims to serve as a helpful reference for survey designers and practitioners in cross-sectional and cross-cultural research.
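As a rough illustration of the experimental split described above, the Python sketch below randomizes a list of sampled addresses into the two conditions. The seed, identifiers and even split are hypothetical assumptions for illustration, not NORC's actual sampling procedure.

```python
# Illustrative sketch (not NORC's procedure) of randomizing sampled addresses
# into the two 2022 GSS experimental mode-protocol conditions.
import random


def assign_conditions(addresses: list[str], seed: int = 2022) -> dict[str, str]:
    """Randomly split addresses between the f2f-first and web-first protocols."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = addresses[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2  # even split assumed for illustration
    return {
        addr: ("condition_1_f2f_first" if i < half else "condition_2_web_first")
        for i, addr in enumerate(shuffled)
    }


if __name__ == "__main__":
    sample = [f"address_{i:04d}" for i in range(10)]
    print(assign_conditions(sample))
```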