Unit 3 - Dealing with (empirical) research data
Data research
How do I find reliable data?
The information presented here comes from the "Data Literacy Jena" certificate program, which covers these questions in more depth and in a more structured way.
The research cycle
The aim of data research is to find existing data that can be used for your own research question (i.e. secondary data). First, we would like to introduce you to the research cycle.
The content of the interactive exercise on the research cycle is largely based on material from the Data EDUcation project at the University of Duisburg-Essen (UDE).
Identifying data sources
Selecting suitable data sources is a crucial step in data research. But how do you actually find suitable data sources? We have summarized various research approaches in the following brief overview.
Repositories seem to play an important role when it comes to data. But what exactly are repositories?
In research, a repository is a long-term storage location for digital data that potential users can access (usually via a web portal). Repositories can have different thematic focuses (e.g. geodata, historical documents, genetic data) or hold data from specific institutions (e.g. Digital Library of Thuringia).
But there are also web portals outside the scientific community that provide access to datasets, either for a fee or free of charge. These can be provided by official bodies, companies or market and opinion research institutes, for example. As part of the open data strategy, data from public administration and publicly funded research projects is also to be made more accessible, provided that it does not need to be protected (e.g. personal data).
Assessing data sources
But how do I know whether the data I have found or the data source is usable and trustworthy?
Some data sources are easier to assess in terms of their trustworthiness than others. For example, data from public institutions such as state statistical offices or cities is generally reliable. The assessment can be more difficult with other data providers, although there are also established providers in the private sector that are often used as a source of trustworthy data (e.g. Statista). It becomes particularly difficult when the exact origin of the data is unclear (e.g. unknown sources, data in social media posts) or when the source itself has a specific interest in what the data appears to show (e.g. lobby groups).
The usability of data can be assessed from different perspectives. Aspects of data quality are often used (you can find more information on this in the "Data Literacy Jena" certificate program). However, since this requires a closer examination of the data and you may have identified several data sources, it is practical to make an initial selection based on a few criteria. The so-called CRAP test (sometimes also CRAAP) can be used for this. It was originally used for the evaluation of information sources, but its criteria can also be applied to data sources and partly overlap with the requirements for data quality. Let's take a look at the elements of the CRAP assessment and the associated questions.
C - Currency
When was the data generated and to which period does it refer?
Does the time period shown match my question?
Is more recent data required for my question?
Has the data been updated?
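The currency questions above can be checked mechanically once the relevant dates are known. The following is only a minimal sketch: the function names, the dataset metadata and all dates are hypothetical and chosen for illustration, not taken from any real data source.

```python
from datetime import date


def covers_period(data_start, data_end, needed_start, needed_end):
    """Return True if the data period fully covers the period the question needs."""
    return data_start <= needed_start and data_end >= needed_end


def years_since_update(last_updated, today):
    """Return the number of whole years elapsed since the last update."""
    years = today.year - last_updated.year
    # Subtract one year if the anniversary of the update has not been reached yet.
    if (today.month, today.day) < (last_updated.month, last_updated.day):
        years -= 1
    return years


# Hypothetical metadata of a dataset found during data research.
dataset = {
    "period_start": date(2010, 1, 1),
    "period_end": date(2020, 12, 31),
    "last_updated": date(2021, 6, 30),
}

# Hypothetical research question that needs data for 2015 to 2019.
matches = covers_period(
    dataset["period_start"], dataset["period_end"],
    date(2015, 1, 1), date(2019, 12, 31),
)
age = years_since_update(dataset["last_updated"], date(2024, 1, 15))
print(matches, age)  # prints: True 2
```

Whether an age of two years is acceptable depends on the research question; the sketch only makes the dates comparable and does not replace that judgment.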
R - Reliability/Relevance
Is the data representative?
Does it contradict other available data?
Is the data suitable for answering my question?
Does the data meet the requirements for my research question (e.g. in terms of spatial resolution, scope, etc.)?
A - Authority/Accuracy
Who is/are the author(s) or data collector(s)?
Are these persons qualified to collect the relevant data correctly?
Are there control mechanisms for checking the data (e.g. publication in a scientific journal/scientific repository)?
P - Purpose
For what purpose was the data collected?
Could the purposes of the data use influence its trustworthiness?
Is there evidence that the data could be biased or influenced (e.g. funding by political or economic entities)?
Is the data transferable from the original purpose to the new research question?
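When several candidate sources are compared, the answers to the questions above can be recorded per criterion and used for an initial selection. The sketch below is one possible way to do this, not an established scoring scheme: the criterion keys (including "purpose" for the questions on why the data was collected), the rule "drop any source with a failed criterion" and the example sources are all assumptions made for illustration.

```python
# Criterion keys are hypothetical labels for the groups of questions above.
CRITERIA = ("currency", "reliability_relevance", "authority_accuracy", "purpose")


def summarize(answers):
    """Split the criteria into passed, failed and not yet checked."""
    return {
        "passed": [c for c in CRITERIA if answers.get(c) is True],
        "failed": [c for c in CRITERIA if answers.get(c) is False],
        "unchecked": [c for c in CRITERIA if answers.get(c) is None],
    }


def shortlist(candidates):
    """Keep sources without a failed criterion, most passed criteria first."""
    kept = []
    for name, answers in candidates.items():
        summary = summarize(answers)
        if not summary["failed"]:
            kept.append((name, len(summary["passed"])))
    # sorted() is stable, so sources with equal counts keep their input order.
    kept = sorted(kept, key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical assessment results; True = no concerns, False = concerns,
# a missing key = not yet checked.
candidates = {
    "statistical_office_table": {
        "currency": True,
        "reliability_relevance": True,
        "authority_accuracy": True,
        "purpose": True,
    },
    "social_media_chart": {"currency": True, "authority_accuracy": False},
    "company_report": {"currency": True, "reliability_relevance": True},
}

print(shortlist(candidates))
# prints: ['statistical_office_table', 'company_report']
```

Unchecked criteria are deliberately not treated as failures here, so a source stays on the shortlist until a concern is actually found; a stricter rule would be just as defensible.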
Further aspects of assessing the quality of data you have found or collected yourself are addressed in the "Data Literacy Jena" certificate program.