15-11-2024

Unifying Knowledge. Open Data in the Context of the Humanities

Eder Ávila Barrientos
Introduction
Over the last decade, open research data repositories (ORDR) have emerged as an essential component of the global research landscape, transforming the way data are shared, reused, and preserved. This trend has not only affected the “hard” sciences but has also deeply changed the humanities, a field traditionally associated with qualitative methods, text analysis, and sociocultural and historical studies.

As digitalization continues to develop and open data access policies are consolidated in many countries, ORDR in the humanities are acquiring unprecedented international relevance, as the phenomenon of datafication has motivated the constant creation of new data in all areas of human activity.

On this basis, this article explores how these repositories are facilitating transnational collaboration, promoting cultural and linguistic diversity, and addressing the challenges that arise from handling sensitive and culturally important data. In addition, we will discuss how globalization and new technologies are redefining research practices in the humanities, prompting an ongoing dialogue about equal access to knowledge and the preservation of cultural heritage on a global scale.


Datafication in Humanities
Datafication is a global phenomenon with multiple aspects, and it has motivated a series of thoughts and considerations related to the generation, use, and exploitation of data in various fields of human activity. In the context of humanities, data play a crucial role in explaining general principles, laws, and patterns that shape culture and human knowledge to the present day.

Datafication is the process in which information is captured in a quantified format so that it can be tabulated and analysed (Mayer-Schönberger, Cukier, & Iriarte, 2013). This view of data and their link to facts and real events has driven the proliferation of huge amounts of information worldwide, which has also been influenced by the widespread use of mobile devices and digital technologies.

Big data, then, refers to a specific type of data with three main characteristics: volume, speed, and variety (Caballero & Martín, 2015). This means that large-scale data have a volume too large to be processed by a conventional computer; that the speed at which the information is processed and shared is crucial for the discovery of new patterns and knowledge; and that the data are diverse, varying in structure and typology.

Big data represents a confluence of trends which have been maturing over the last decade: social networks, mobility, applications, falling broadband costs, interconnection of objects through the Internet (the Internet of Things) and cloud computing.

In this way, the conversation between the humanities and big data is not limited to the adoption of algorithms to quantitatively study large collections of texts and images (Rojas Castro, 2017). This interaction has also allowed the creation of various projects and developments using holistic approaches that seek to analyse data from a critical point of view.

From cultural data management to the analysis of historical texts, the engagement of the humanities with datafication has made it possible to propose new perspectives of study, define methodologies, and design strategies for a better understanding of human knowledge, all based on processing large amounts of data.

Initiatives such as the Humanities Digital Workshop (some of its working communities can be found at: https://hdw.wustl.edu/ and https://digitalhumanities.duke.edu/doing-dh/workshop) have provided a glimpse of natural language processing techniques that analyse large corpora of historical texts. These techniques have been useful in identifying patterns, themes, and behaviours that would be difficult to detect with the naked eye.
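
As a minimal illustration of the kind of pattern-finding mentioned above, the following Python sketch counts the most frequent words across a small set of texts. The sample sentences, the stop-word list, and the function name `top_terms` are invented for illustration and are not taken from the workshops cited; real projects work with far larger corpora and dedicated language-processing libraries.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; a real study would load thousands of documents.
CORPUS = [
    "The parish records list births, marriages and deaths.",
    "Court records describe disputes over land and water.",
    "Letters and diaries describe daily life in the parish.",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "in", "over", "of", "a"}


def top_terms(texts, n=3):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    for term, freq in top_terms(CORPUS):
        print(term, freq)
```

Even in this toy case, the recurring terms hint at the themes shared by the documents; at scale, the same idea helps surface themes that no single reader could track by hand.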

In summary, the intersection between the humanities and big data represents an emerging field of study, where tools and techniques from data science and information studies are applied to the challenges and questions of the humanistic field. This union offers new perspectives and possibilities for understanding the past, the present, and the future of studies in the humanities.

The humanities have traditionally focused on the qualitative analysis of texts, artifacts, and historical events. However, the rise of digitalization and the development of computational tools have opened new possibilities for the quantitative analysis of large volumes of data, making it possible to identify patterns and behaviours hidden in the data themselves.

The ORDR
The enormous amount of data available today reflects a constant flow of manifestations and expressions of information generated by different media and sources. Just as a tsunami floods the shore with tremendous force, data tsunamis inundate companies, organizations, and societies with an avalanche of information that can be difficult to manage and process. The wide diversity of data and their usefulness have revealed a series of principles concerning how data are generated, shared, and used in different disciplinary contexts.

For example, open research data are datasets that bring together information from scientific and scholarly research and are characterized by their latent reusability and their accessibility through the Internet. “Open research data support results of scientific research and have unrestricted access, so anyone is allowed to consult them” (European Commission, n. d.).

Search for and access to open research data (ORD) must be organized systematically. In this context, ORDR are platforms where it is possible to store, organize, search, retrieve, access, and share research data openly, that is, without economic, legal, or technical restrictions.

ORDR may cover very different subjects, since research today has a strong multi-, trans-, and interdisciplinary character. On the Internet we can find repositories built with a holistic vision that considers their use by different types of communities.

The Registry of Research Data Repositories (Re3data) is the most complete registry for searching and identifying data repositories. It is a digital platform that facilitates sharing and access, and brings greater visibility to research data, fostering collaboration and innovation among the scientific and humanistic communities (Naheem & Mir, 2024).

The growing production and reuse of research data in the humanities have generated a correspondingly growing demand for easy-to-use discovery services that help identify deposit services and research data repositories (Buddenbohm et al., 2021).

According to Re3data, 1,229 repositories currently registered in this directory address, in a general way, topics related to the humanities and social sciences (see Figure 1). More specifically, this source can be used to consult repositories that offer open access to research data in disciplines such as history, philosophy, theology, linguistics, and many more.

Figure 1. Number of ORDR by subject matter

Source: Registry of Research Data Repositories, https://www.re3data.org/metrics/subjects

Figure 2 shows a hierarchy of topics related to the humanities in Re3data. Open research data on a particular topic can be accessed through this source, since Re3data records the attributes of each ORDR to make searching and retrieving information easier, so that all user communities can learn about the types of data collected in a repository, its licenses, and its dates of creation.

Figure 2. Hierarchy of topics related to the humanities of the data repositories registered in Re3data
Source: Registry of Research Data Repositories, https://www.re3data.org/browse/by-subject/

Research data in the humanities are more diverse than in any other scientific discipline because any data on human activity can be considered research data, including newspapers, photographs, diaries, church records, court records, etc. (Poljak Bilic & Posavec, 2024). The wide typological and thematic diversity of research data in the humanities has inspired their analysis, making it possible to understand part of their behaviour and to generate new knowledge, theories, and perspectives that enrich our understanding of the world.

International Perspective on the Interaction between ORDR and Humanities
The linkage between ORDR and humanities, from a holistic point of view, hints at the formation of knowledge networks related to the international participation of various stakeholders which form today’s research data ecosystem.

Digitalization processes have influenced the formation of a global network of humanistic knowledge, linking researchers around the world and allowing a wider vision of culture and society. In this way, researchers from different parts of the world can collaborate in real time, sharing and reusing cultural, historical, and linguistic data that enrich perspectives on common subjects and foster the use of data in research communities.

The interaction between data, digital technology, and humanities has led to the creation of categories such as digital humanities: a new area of study arising from the relationship between humanists and digital tools that support the development of their research (Rahman, Ahmad, & Zakaria, 2023).

Based on this premise, projects such as DARIAH (Digital Research Infrastructure for the Arts and Humanities) in Europe, or HathiTrust (https://www.hathitrust.org/) in the US, show how research data are being used to create large, internationally accessible data repositories, promoting a transnational dialogue in the humanities.

More concrete examples, such as the CORA (https://dataverse.csuc.cat/) and e-CienciaDatos (https://edatos.consorciomadrono.es/) projects from Spain and Catalonia, show the broad multi-, trans-, and interdisciplinary richness of research data generation. CORA, for instance, provides access to 187 datasets related to arts and humanities, 16% of which have been shared with another digital source online. For its part, e-CienciaDatos contains 458 datasets related to arts and humanities, 30% of which have been shared with another data source or service available in the digital environment. In this way, the reuse of research data in the humanities may be a starting point for understanding how new projects related to the use and management of this type of data are generated.
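
The percentages quoted above can be turned into approximate dataset counts. The short Python sketch below does this arithmetic; because the published percentages are presumably rounded, the resulting counts are only estimates, and the function name is illustrative.

```python
def estimated_shared(total_datasets, shared_percent):
    """Estimate how many datasets a rounded percentage corresponds to."""
    return round(total_datasets * shared_percent / 100)


# Figures quoted in the text for arts and humanities datasets:
# (total datasets, percentage shared with another source).
repositories = {
    "CORA": (187, 16),
    "e-CienciaDatos": (458, 30),
}

for name, (total, percent) in repositories.items():
    print(name, estimated_shared(total, percent))
```

Under these assumptions, the shares correspond to roughly 30 datasets in CORA and roughly 137 in e-CienciaDatos.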

Reusing research data means accessing previously collected datasets to perform new analyses, answer additional research questions, or validate previous results. These data may have been generated by other researchers or by the same research team in a previous project.

In the humanities, it is essential to understand and promote the reuse of data, since this contributes to the advancement of knowledge and to the efficient use of the resources available for academic research. New initiatives have emerged to make collaboration and data sharing easier. These initiatives, such as the FAIR principles (Findable, Accessible, Interoperable, and Reusable), seek to standardize data formats and collection methodologies, which, on the one hand, makes data easier to share and reuse internationally while establishing links that identify their origin and their connections with other sources.
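
To make the four FAIR properties more concrete, the following Python sketch checks a dataset description for one piece of evidence per property. The field names, the sample record, and the checks themselves are simplifying assumptions chosen for illustration; they are not an official FAIR assessment and do not reproduce any repository's metadata schema.

```python
# Hypothetical mapping from each FAIR property to one metadata field
# that serves as minimal evidence for it (an illustrative assumption).
FAIR_EVIDENCE = {
    "findable": "identifier",      # e.g. a persistent identifier
    "accessible": "access_url",    # where the data can be retrieved
    "interoperable": "format",     # an open, documented file format
    "reusable": "license",         # terms under which reuse is allowed
}


def fair_report(record):
    """Return, per FAIR property, whether its evidence field is filled in."""
    return {
        prop: bool(record.get(field))
        for prop, field in FAIR_EVIDENCE.items()
    }


# Invented example record; all values are placeholders.
sample = {
    "title": "Transcribed parish registers",
    "identifier": "doi:10.0000/example",
    "access_url": "https://repository.example.org/dataset/1",
    "format": "text/csv",
    "license": "",  # missing license: reuse conditions are unclear
}

print(fair_report(sample))
```

In this invented record, the empty license field is what would prevent the dataset from counting as reusable, which shows how a small gap in description can limit international reuse.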

On the other hand, the standardization of data in humanities presents challenges in terms of the representation of cultural and linguistic diversity. It is crucial to recognize and to respect the particularities of each culture and language to avoid homogenization of data and, therefore, the loss of unique contexts, since the correct description and representation of data are based on the properties of their context.

Furthermore, national and international policies on access to data play a central role in the consultation and reuse of these data. In some countries, research data are available thanks to open science policies implemented at the governmental level, while in others restrictions may limit international collaboration by imposing economic, technical, and legal limitations on research data.

Because of this, it is necessary to consider that the availability of financial resources for the digitalization and preservation of data in the humanities realm varies considerably between countries and can generate disparities in the quality and quantity of data available for international research.

In this sense, the collection and use of data in the humanities also raise ethical issues that must be discussed, especially when working with vulnerable communities or culturally sensitive data, for example when storing information from studies of ethnic communities in which people are the main actors in data collection.

In this way, the international perspective on data management reveals ethical concerns related to human rights, intellectual property, and informed consent, in which data generators and managers are the main actors in the data ecosystem as we know it today.

The question of who owns and who controls data is crucial, especially in international frontier research projects, where data can be made globally visible using digital technologies. This has generated a series of debates about data sovereignty, and equality in its use and access for society.

Considering emerging technologies such as artificial intelligence and machine learning, the humanities have begun to explore new ways of interacting with data. Internationally, these technologies can facilitate large-scale data analysis and may cross linguistic and cultural boundaries, opening new opportunities for inter-, multi-, and transdisciplinary data-driven research.

The humanities have an undeniable role to play in addressing global challenges such as climate change, migration, and social inequity, where research data can be used to generate a deeper understanding of these issues and collaborative solutions informed by a humanistic vision and supported by emerging technologies.

Final Considerations
Datafication has radically changed our understanding of the world, including humanistic approaches. The humanities were traditionally focused on the study of culture, thought, and human creativity from qualitative perspectives. Today, digital tools and data analytics offer new ways to explore these fields from a holistic point of view.

Datafication has allowed humanistic studies to expand into areas which were previously unexplored or difficult to analyse through quantitative methodologies. Cultural, literary, and artistic phenomena can now be analysed through the exploration of large volumes of data, opening the door to new approaches of research and discovery.

The increasing internationalization of research in humanities, pushed forward by datafication, has transformed the dynamics of knowledge in ways which a few decades ago were unthinkable. In an increasingly interconnected world, the use of research data in humanities has not only expanded geographically, but has also enabled a deep and diverse dialogue between different cultures, traditions, and languages.

The international perspective on data use in the humanities offers opportunities to enrich the global understanding of culture, philosophy, history, human thought, and all the disciplinary insights of the humanistic approach. The key to maximizing these benefits is encouraging fair collaborations, methodological diversity, and cultural sensitivity in the development of future research projects.

Eder Ávila Barrientos holds a PhD in Library Science and Information Studies from UNAM. He is a full-time associate researcher at the Research Institute of Library Sciences, a tutor for the bachelor’s degree in Library Science and Information Studies, and a professor in the graduate program in Library Sciences and Information Studies.

Some Useful Tools for Digital Humanities Research 


UNAM Internacional

Innovation in digital technologies is fast; it could be said that new tools appear every day. Many of them are available under free or open access policies, and they all represent important opportunities to expand traditional research (not only in the humanities) by providing access to ever-growing massive datasets (big data). Generative artificial intelligence is just one of an overwhelming number of examples.

Duke University in the US maintains an intensive program in digital humanities. Its tools page (https://digitalhumanities.duke.edu/doing-dh/dh-tools) lists the following:

Agisoft Photoscan: “software which produces 2-D elevation models and geometrically corrected aerial photos of a landscape; 3-D models of an object by performing the photogrammetric process on many digital images.” Proprietary license, https://www.agisoft.com/

ArcGIS: “a foundational piece for GIS professionals to create, analyse, manage, and share geographic information so decision-makers can make intelligent, informed decisions. It allows you to create maps, perform spatial analysis, and manage data.” Proprietary license, https://desktop.arcgis.com/es/desktop/index.html

CARTO: “a leading Location Intelligence platform, enabling organizations to use spatial data.” It is basically helpful for mapping and GIS. Open source, https://carto.com/

DH Press: a toolbox which works like a plugin for WordPress to access digital public humanities. Open source, https://dh.sites.gettysburg.edu/toolkit/

DRUPAL: versatile system for content creation. Open source, https://drupal.org/

Gephi: network analysis software which also produces data visualizations. One of its applications was able to visualize the global connectivity of the New York Times. Free software, https://gephi.org/

Mapbox: geographic management platform supported by location-centric artificial intelligence. Subscription service with a free trial option, https://www.mapbox.com/

Neatline: map storytelling tool. Free software, https://www.neatline.org/

Omeka: Neatline’s parent project; a free and flexible platform for digital publishing, display, and curation of visual collections and exhibitions. It is ideal for material culture projects. Free software, https://omeka.org/

Scalar: digital publishing platform designed to create large-scope projects. It allows the organization of content from multiple media. Open source, https://scalar.me/anvc/scalar/

SketchUp: 3-D modelling design program. Free software (there are paid professional versions), http://www.sketchup.com/es

Social Feed Manager: this tool harvests X posts (formerly tweets) and organizes them systematically according to topics or other criteria, as an aid to research processes. Free software, https://social-feed-manager.readthedocs.io/en/m5_004/index.html#


References
Buddenbohm, Stefan; De Jong, Maaike; Minel, Jean-Luc, & Moranville, Yoann. (2021). “Find Research Data Repositories for the Humanities - the Data Deposit Recommendation Service”. International Journal of Digital Humanities, 1(3). https://doi.org/10.1007/s42803-021-00030-7

Caballero, Rafael, & Martín, Enrique. (2022). Las bases de big data y de la inteligencia artificial. Madrid: Catarata. 

European Commission (n. d.). “Facts and Figures for Open Research Data”. https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/open-science/open-science-monitor/facts-and-figures-open-research-data_en

Mayer-Schönberger, Viktor; Cukier, Kenneth, & Iriarte, Antonio. (2013). Big data: la revolución de los datos masivos. Madrid: Turner.

Naheem, K. T., & Mir, Aasif Ahmad. (2024). “Analyzing research data repositories (RDR) from BRICS nations: a comprehensive study”. Library Management, 45(6/7). https://bmi.arizona.edu/sites/bmi.arizona.edu/files/BMI-The-Funnel-Effect-2006.pdf

Poljak Bilic, Ljiljana, & Posavec, Kristina. (2024). “FAIRness of Research Data in the European Humanities Landscape”. Publications, 12(1). https://www.mdpi.com/2304-6775/12/1/6

Rahman, Md. Habibur; Ahmad, Azree, & Zakaria, Sohaimi. (2023). “Digital humanities practice in university libraries of Bangladesh”. Digital Library Perspectives, 39(3). https://doi.org/10.1108/DLP-11-2022-0085

Rojas Castro, Antonio. (2017). “Big Data en las humanidades digitales: nuevas conversaciones en el contexto académico global”. Acción Cultural Española, Anuario AC/E 2017 de cultura digital. Cultura inteligente: Análisis de tendencias digitales. España. https://www.accioncultural.es/media/Default%20Files/activ/2017/ebook/anuario/4BigDataHumamidades_AntonioRojas.pdf