The Wikimedia Movement in Canada: 4. Wikidata in Canada and the Mariposa Folk Festival Linked Data Project

Chapter 4. Wikidata in Canada and the Mariposa Folk Festival Linked Data Project

Stacy Allison-Cassin

Creating opportunities for data availability, reuse, and transparency and ensuring the accessibility of publicly funded data are considered essential hallmarks of an equitable civil society. Researchers, organizations, and institutions beyond government are encouraged to consider making data more available. In Canada, numerous levels of government have, for many years, been concerned with increasing the availability of open data and raising awareness of the relationship between open data and an innovative, knowledgeable, engaged, and just society. The Government of Canada defines open data as “structured data that is machine-readable, freely shared, used and built on without restrictions” and cites as benefits innovation, informed decision-making by consumers, the leveraging of public sector information, and increased government accountability (Treasury Board of Canada Secretariat, n.d.). The open data definition includes several key criteria: structured, machine readable, and freely available without restrictions. These elements are not typical of proprietary commercial data, which is usually not shared for reuse outside the bounds of an organization.

Libraries, archives, museums, galleries, cultural centres, and other organizations related to culture, such as music and historical projects, also have mandates to positively impact society within Canada. Because many of these organizations typically have needs related to the management of information and data, particularly around the management of material culture collections, they are also highly involved in creating and maintaining structured data. In the case of libraries, archives, and sometimes other organizations, this data is machine readable and created, stored, and exchanged according to local, national, and international standards. These organizations also typically have an interest in advancing the subjects they serve, whether by making collections accessible in the case of archives or by increasing the impact of research in the case of a university.

Sharing, enhancing, or making available structured data is an excellent way to meet these organizational and broader societal goals. As with the case of open data from the government, sharing data openly also aids in making society more just and equitable and increases the impact of culture in Canada and beyond. A growing focus on artificial intelligence (AI) and new modes of search and discovery point to the need for large-scale datasets of highly structured data to support continued machine learning and processing innovation. Despite the benefits of creating and using structured data and making data available openly, many libraries, archives, and cultural organizations are frequently challenged by a lack of technical resources to make their data openly available (Allison-Cassin & Scott, 2018). Further, many systems used to manage collections within organizations are not available on the open web, restricting the ability to make information related to collections and culture creators visible and reusable. Cultural data is frequently not open data (Zhu et al., 2023).

My journey into Wikipedia and Wikidata began out of necessity. In 2015, I was appointed as the W. P. Scott Chair in E-Librarianship and began a research project focused on using linked data technologies to better describe music materials (Chair for Research, n.d.). The project’s focus was materials related to the Mariposa Folk Festival (Allison-Cassin et al., 2015; Proffitt, 2018). The Mariposa Folk Festival is one of the longest-running folk festivals in North America, and the festival’s archival holdings are kept at York University’s Clara Thomas Archives and Special Collections (St. Onge, 2013). My original research plan was to mobilize the information in the festival programs, such as performers, venues, event dates, and festival organizers, as linked data, with the aim of testing enriched descriptions and exposing relationships within the festival. Standard methods of library description for music do not handle nonclassical music well, and focusing on folk music in Canada seemed to be an ideal case study to test concepts related to moving beyond a focus on musical works to the network of relationships within a cultural scene (Allison-Cassin, 2012).

I quickly ran into several challenges within the project. The first was that creating and publishing linked data without adequate access to resources is very difficult. The second was that my plan to integrate content from Wikipedia into my project, such as biographies of musicians, fell short when I discovered that very few of the performers and venues connected to the Mariposa Folk Festival and the revival scene in Toronto were covered in Wikipedia. I began to write Wikipedia articles on figures and venues connected to the folk revival scene in Toronto and Canada, such as the Riverboat Coffee House (“Riverboat Coffee House,” 2024). Attempting to fill these gaps led me to become more involved in supporting and using the platforms.

In this chapter, I discuss several projects, initiatives, and research involving Wikidata in Canada within the domain of libraries, archives, and other similar cultural data applications. Using these examples, I discuss opportunities and critical issues for the continued use and development of Wikidata as an openly available structured data knowledge base in Canada. As an active user and member of the Wikidata community, I will take a personal approach to this discussion, focusing on several themes and examples. For more systematic treatments of Wikidata, see the research articles by Zhao (2023) and Tharani (2021). I will open the discussion with my own start in Wikidata through the Music in Canada @ 150 Project, and subsequent sections will move through groupings of kinds of activities.

Ultimately, while Wikidata can be used to support activities related to cultural data and as a mechanism for making structured data openly available, there are critical issues within the Wikidata platform and its relationship with the wider ecosystem that need to be considered when committing resources in this area. However, at the same time, there is growing recognition of the need for ethical practice within the structured data landscape to support a wide swath of concerns such as Indigenous data, privacy, and discrimination. Wikidata has provided an avenue for many libraries, archives, and other organizations to engage in projects to make information on topics related to Canada more visible, but there are critical issues such as lack of support for Indigenous data, weakness in the availability of data and information in Canada, and issues with the fundamental structure of the project itself that make it unlikely to be a dependable source for open data. However, given the role of structured data within the current and potential future AI landscape, Wikidata may be an important player in the space.

Background

Wikidata, a Wikimedia project launched in 2012, is rapidly evolving into a significant source of online, free, openly licensed structured data. As its name suggests, Wikidata is a project that creates and supports the development of openly accessible, semantically structured data that anyone can contribute to. This collaborative effort, akin to Wikipedia, has a profound impact on the information and knowledge space, underscoring the significance of each contribution. The freely available, openly licensed, multilingual platform of Wikidata offers individuals and organizations the opportunity to shape structured data. In Canada, Wikidata has found a place within the cultural data space, aligning with organizational goals relating to structured open data (Allison-Cassin & Scott, 2018).

Wikidata is “a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and to anyone in the world” (“Wikidata: Introduction,” 2022). Wikidata is a knowledge base, a repository of information structured to be read and processed by machines. Structured data is information (data) that is highly organized and easily used by machine processing, for example, addresses or birth dates that are regularized across a database. Each element within the dataset is defined and used the same way. Unstructured data is much harder for machine applications to use and is difficult to reuse. Wikidata is already integrated into applications and uses beyond the Wikimedia projects. Unlike Wikipedia, Wikidata is valued and important for these alternate “beyond the projects” uses, making it unique among the Wikimedia projects.

Wikidata was initially developed to solve a technical problem on Wikipedia. Wikipedia has numerous unique language versions, and “the first goal of Wikidata is to support Wikipedia with centralized language links and infobox data, thus reducing the workload for Wikipedia editors and at the same time increasing the content’s consistency and quality” (Vrandečić, 2013, p. 90). Before the creation of Wikidata, there was no central connection between Wikipedias in different languages: each article had to maintain its own links to its counterparts, making it difficult, for example, to move reliably from the article on musician and poet Leonard Cohen in French to the article on Leonard Cohen in English or German. Wikidata performs an intermediary role by acting as the connector, ensuring all the articles on Leonard Cohen, no matter the language, are linked. With Wikidata, instead “of the articles linking to each other, Wikidata maintains lists of all articles about a certain topic in the different language editions. Whenever an article is rendered, the software queries Wikidata for that list and displays it. The Wikipedias are completely relieved of maintaining these lists” (Vrandečić, 2013, p. 91). The creation of these connections greatly improved the functionality of Wikipedia.
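The connector role described above can be illustrated with a minimal sketch. It assumes a simplified in-memory record rather than the actual Wikidata software; the `Item` class, the `language_links` function, and the identifier `Q_EXAMPLE` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A central record for one topic, holding one article title per language."""
    item_id: str  # placeholder identifier, not a real Wikidata ID
    sitelinks: dict[str, str] = field(default_factory=dict)


def language_links(item: Item, current_lang: str) -> dict[str, str]:
    """Return the links shown beside an article: every edition except the current one."""
    return {
        lang: title
        for lang, title in item.sitelinks.items()
        if lang != current_lang
    }


cohen = Item(
    item_id="Q_EXAMPLE",
    sitelinks={"en": "Leonard Cohen", "fr": "Leonard Cohen", "de": "Leonard Cohen"},
)

# Rendering the French article asks the central record for the other editions,
# so no individual Wikipedia has to maintain the list itself.
print(language_links(cohen, "fr"))
```

Because the list lives in one place, adding a new language edition means updating a single record rather than every existing article.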

While its original role was supporting important technical structures of Wikipedia, the platform quickly began to support other kinds of structured data work. Wikipedia is text based and focused on the narrative form, making it much more usable from a human perspective; Wikidata is friendly to machine processing. Like Wikipedia, “Wikidata is also based on a community-editing model; it harnesses the distributed efforts of a worldwide community of contributors, including domain experts and bot developers. Anyone can add new statements, ranging from individual facts to large-scale data imports” (Waagmeester et al., 2020). The ability to edit Wikidata at a larger scale, by uploading large datasets, is tightly connected to its utility within technical projects. This utility is further amplified by its licence structure. Unlike many commercial proprietary databases, “the data in Wikidata is published under the Creative Commons CC-0 license, allowing anyone to reuse and republish the data in any way. The data is free of charge and without any conditions and requirements” (Vrandečić, 2013, p. 90). These factors make the enormous datastore attractive to consume within other applications. The size and licence are critical to its success, including within the cultural heritage and education sectors.

Wikidata and the GLAM Community

Cultural heritage and broader culture organizations, often referred to by the shorthand GLAM (galleries, libraries, archives, and museums), have been active in Wikidata from an early period (Tharani, 2021). When Wikidata was launched, many GLAM organizations were already active in Wikipedia, Wikimedia Commons, and other Wikimedia projects, and they leveraged campaigns such as Art + Feminism as collaborative outreach initiatives (Proffitt, 2018). GLAM organizations in Canada have also been engaged in the use of Wikipedia for outreach activities. For example, the Art Gallery of Ontario hosted edit-a-thons as far back as 2015 (Art Gallery of Ontario, 2015). Because many GLAM organizations have strong connections to metadata through collection management needs and staff with high levels of knowledge and experience with metadata practices, the barrier to entry for Wikidata was lower than for other communities with less technological aptitude and fewer resources. Factors such as responsive vocabulary development, an easy-to-use interface, and the potential to integrate Wikidata into library catalogues are cited as some of the positive aspects of Wikidata (Allison-Cassin & Scott, 2018; ARL, 2019).

Wikidata has emerged as a tool for creating linked data and enhancing access to collections within the GLAM sector (Allison-Cassin & Scott, 2018; Ansovini et al., 2022; Tharani, 2021), but questions remain regarding its viability as a reliable part of the sector’s ecosystem because of issues such as data quality, lack of cohesion in the data model, and underlying problems related to data and information availability in Canada. GLAM and other cultural sector domains within Canada use Wikidata as part of their efforts to manage and make their collections visible. Wikidata is a knowledge base used by large machine-processing entities such as Google, and by a growing number of AI tools, to enable contextual search and to connect related pieces of information, which can aid in visibility (Gertner et al., 2023). While creating links to authority files or other data sources serves a purpose in supporting data and information needs and processes on Wikipedia and across the Wikimedia projects, Wikidata also supports external-use cases in the GLAM sector such as authority control (Bianchini et al., 2021; van Veen, 2019), extending the impact of research and researcher profiles (Nielsen et al., 2017; Odell et al., 2022), and enriching collections information (Ansovini et al., 2022; Colla et al., 2021; Hawkins, 2022).

Wikidata plays a crucial role in connecting Canadian-focused content, thereby improving the visibility of cultural resources both within Canada and beyond. For instance, Wikidata connects the item for author, poet, and actress Pauline Johnson to her records in library catalogues and other information resources, driving the creation of a Google knowledge card. Both Wikipedia and Wikidata feed Google’s knowledge graph, but the structured, machine-readable data in Wikidata makes this process smoother and also allows Wikidata editors the opportunity to more effectively enhance the connections in the knowledge graph with links to other information sources. As a metadata platform, Wikidata offers opportunities to move information back and forth between collection databases, to amplify collections and materials, and to provide strong interlinking between data sites. These capabilities make it an effective tool for the movement of data, thereby strengthening the connection to our cultural heritage.

Wikidata and the Mariposa Folk Festival Data

As stated in the opening of this chapter, I began my work with Wikidata while working on a project related to my position as the W. P. Scott Chair in E-Librarianship. The Mariposa Folk Festival linked data project aimed to model a network of relationships among the festival performers based on a dataset developed from the programs. The process for the creation of the dataset began with the development of an ideal data model based on the categories of elements of the programs and then consideration of likely valuable attributes for each category. The initial data model included categories such as musical performers, artisans, dancers, venues, dates, and the festival administrative staff. A set of attributes was developed for each of these different categories of data; for performers, this included properties such as name, music genre, medium the artist worked in, and the date they played at the festival. Google Sheets was used to record data in a separate spreadsheet for each grouping. For example, the sheet for musicians was separate from the sheet for bands. Because this was a linked data project, and because of a desire to enable interlinking between the Mariposa dataset and open data available on the web, sources of persistent identifiers (PIDs) were included.
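As a rough sketch of how one row of the performer sheet might be represented, the following assumes field names that mirror the attributes listed above. The `Performer` class, the `missing_pids` helper, the example names, and the identifier values are all hypothetical; they are not taken from the project's actual spreadsheets.

```python
from dataclasses import dataclass, field


@dataclass
class Performer:
    """One row of a hypothetical performer sheet."""
    name: str
    genre: str
    medium: str  # e.g. voice or an instrument
    festival_years: list[int] = field(default_factory=list)
    # Persistent identifiers keyed by source name (keys here are illustrative).
    pids: dict[str, str] = field(default_factory=dict)


def missing_pids(performers: list[Performer]) -> list[str]:
    """Names of performers with no external identifier, and so nothing to interlink with."""
    return [p.name for p in performers if not p.pids]


rows = [
    Performer("Example Singer", "folk", "voice", [1965], {"viaf": "PLACEHOLDER-1"}),
    Performer("Example Fiddler", "folk", "fiddle", [1966, 1967]),
]

print(missing_pids(rows))
```

A check of this kind makes visible how many records cannot yet be linked to open data elsewhere on the web.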

Sources of PIDs included the Virtual International Authority File (VIAF), a source of linked information related to library authority data, and music-specific identifier sources such as MusicBrainz and Discogs. Through their Linked Digital Futures initiative, the Canadian Association for the Performing Arts (CAPACOA) has focused on creating structured data to support the visibility of artists, venues, and performing arts events in Canada. They have pointed to the necessity of increasing the availability of linked data and, by extension, uniform resource identifiers (URIs) as part of an urgent need for sound digital strategies for Canadian performing arts. A lack of a strong, open, stable metadata culture within Canada weakens the overall health of sectors like culture and the performing arts, as “performing arts metadata has emerged as one of the most pressing issues for the performing arts sectors in Canada” (CAPACOA, n.d.).

Along with developing the data related to the Mariposa Folk Festival, I investigated available methods to transform the spreadsheet data into linked data and make the data available on the web. Lacking accessible software, tools, and resources, including access to software developers to create linked data, I turned to Wikidata as a method to create and publish data related to the festival. The excellent integration between software tools such as Google Sheets, OpenRefine, and Wikidata made uploading data to Wikidata relatively simple. Furthermore, I could do all the work myself because Wikidata is free to use, openly available, and supported by an active community. Publishing the data using Wikidata provided a helpful solution and surfaced several issues of interest to using Wikidata in the Canadian context.

Lack of Coverage of Canadian Music

The Mariposa Folk Festival is one of Canada’s primary folk music festivals, and performing at the festival indicates recognition of notability. However, finding information to fill out the data on individual performers, bands, and other figures proved to be challenging. While there are Wikipedia articles, encyclopedias, and other notable sources available for well-known performers such as Gordon Lightfoot and Joni Mitchell, the same could not be said for the bulk of the performers, particularly as Mariposa moved away from more prominent mainstream performers. Biographical information in the programs was insufficient to fill in the desired data. Typical reference sources such as the Encyclopedia of Music in Canada, which was folded into The Canadian Encyclopedia in 2003, continue to be weighted toward classical music and established jazz musicians. While there has been an attempt to expand coverage to popular music, the encyclopedia has much catching up to do to cast off the prejudices held by the original editors of the music-specific publication over what constituted noteworthy and important music.

National and local newspapers with a regular music review column or section can often be a good source for coverage of performers. However, searching through newspaper sources did not yield good results. For example, although the Toronto-based band the Dirty Shames appears in the Mariposa programs and in photographs in the Toronto Telegram (McFadden et al., 1966), the scarcity of information about the band leaves the Wikidata item bare (“The Dirty Shames,” n.d.). Some information on musicians in the Mariposa dataset was found in digital newspaper archives; however, a subscription is required for newspapers such as the Toronto Star or The Globe and Mail. The “locked down” nature of these resources means they are unavailable to the public, and the data is not open to the web. Other information was not available in digital form or could not be found. Smaller newspapers and publications can be critical for coverage of local music scenes, but the archives of these publications may not be available, and many of these publications are no longer in operation, with their websites no longer available.

Local media, such as newspapers and local newscasts, is rapidly shrinking in Canada, with CBC News reporting the closure of 70 community newspapers across Ontario in 2023. The lack of community coverage and smaller local media doesn’t just mean that local people will not be informed of issues; it means a critical loss of the Canadian cultural record and serious consequences on the ability of those working on the Wikimedia projects to document Canadian culture (CBC, 2023). At the same time, digital independent community newspapers, such as Toronto’s West End Phoenix, are reconfiguring news coverage (About West End Phoenix, n.d.).

The lack of information about Canadian music in digital form not only is a challenge when researching individual items but also makes it challenging to ensure that data is reliable and trustworthy. In the early years of Wikidata, references for statements were considered good practice but were not strictly required for inclusion in the knowledge base. The Mariposa Folk Festival data added to Wikidata as part of the project outlined here should nonetheless have treated the inclusion of references as an essential aspect of the work. Since this original project, including references has become essential to demonstrating notability. Provenance metadata is a crucial aspect of the Wikidata platform. It acts like a citation: “References are used to point to specific sources that back up the data provided in a statement” (“Wikidata: Introduction,” 2022). A statement that Céline Dion’s birthplace is Charlemagne is supported by references to reliable sources where the information can be verified. Preferred sources, as with Wikipedia, are secondary and tertiary “high-quality” and reliable sources. Including references in Wikidata statements increases the quality of linked data for many use cases and can also assist with providing higher-quality data for AI applications, and a lack of references can have implications for the usefulness of the data (Beghaeiraveri et al., 2024).
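The relationship between a statement and its references can be sketched as a small data structure. This is a simplified illustration rather than Wikidata's actual data model; the `Statement` and `Reference` classes, the `unreferenced` helper, and the source descriptions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Points to a source where the claim can be verified."""
    source: str


@dataclass
class Statement:
    subject: str
    prop: str
    value: str
    references: list[Reference] = field(default_factory=list)


def unreferenced(statements: list[Statement]) -> list[Statement]:
    """Statements that carry no provenance and so cannot be checked against a source."""
    return [s for s in statements if not s.references]


statements = [
    Statement(
        "Céline Dion",
        "place of birth",
        "Charlemagne",
        [Reference("a reliable secondary source (placeholder)")],
    ),
    # A made-up item with no available sources, like many in a festival dataset.
    Statement("Example Band", "genre", "folk music"),
]

for s in unreferenced(statements):
    print(f"{s.subject}: '{s.prop}' has no reference")
```

Running a check like this over a dataset gives a quick count of how many statements would be difficult to defend if their notability were questioned.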

The lack of readily available references for statements about musicians in the Mariposa Folk Festival has a compounding impact and points to a broader problem with the availability of information on music and the arts more generally. In subsequent projects I have undertaken using Wikidata, such as documenting literature, built heritage, and film, the inability to provide references and provenance for Canadian data on Wikidata presents a serious challenge. As has been noted for Wikipedia, reliable and established sources are required to prove notability, and lack of documentation is a known issue (McCracken, 2018). This can make it difficult not only to write articles but also to ensure that such articles are not deleted. The connection between the need to reference statements and notability makes it harder to create data related to Canadian items, and items that are created may be at risk of deletion. Broadly, the lack of documentation related to music activities in Canada makes it difficult to build comprehensive, well-referenced datasets related to music within Wikidata.

Wikidata and the Music in Canada @ 150 Project

In response to the lack of information in Wikipedia and Wikidata on Canadian music, I began the Music in Canada @ 150 Wikimedia campaign, an effort to increase content on Canadian music and to raise recognition of the lack of such information within a wider community. The campaign was developed to focus efforts across Canada on creating content on music in Canada (Allison-Cassin & Scott, 2018). Because I am a member of the Canadian Association of Music Libraries, Archives and Documentation Centres, it made sense to tap into the music library community in Canada, as there was a ready network of individuals interested in and knowledgeable about both music and information.

The effort was initially conceived to tie into activities related to the 150th anniversary of the Confederation of Canada and to align with the conference theme of the annual meeting of the Canadian Traditional Music Society, the Canadian Association of Music Libraries, Archives and Documentation Centres, and the International Association for the Study of Popular Music, Canada Branch. It received funding from Wikimedia and a special event fund at York University. The campaign featured an in-person workshop at the University of Toronto, two virtual workshops, and numerous distributed events over a year. We used the Wikimedia events dashboard, and the effort included 11 programs and 124 editors and featured the creation of 24 new articles and 187 article edits. Events took place at the University of Prince Edward Island, Charlottetown; Memorial University, Newfoundland; Western University, London; Hamilton Public Library, Hamilton; York University, Toronto; University of Toronto, Toronto; Laurentian University, Sudbury; University of Manitoba, Winnipeg; University of Saskatchewan, Saskatoon; and MacEwan University, Edmonton. Organizers included Dan Scott (Laurentian), Caroline Doi (University of Saskatchewan), and Monica Fazekas (Western). A logo was used throughout the campaign.

The campaign’s kick-off was a full-day workshop at the University of Toronto. Because the workshop was colocated with the annual meeting of several music-related professional associations, it was attended by music librarians and music researchers. The workshop was structured as a “train the trainers” session, with instruction on editing both Wikipedia and Wikidata and discussions on how to run an edit-a-thon. It was intended to assist those holding their own edit-a-thons during the year of the campaign. The event featured a panel session with local individuals such as Amy Furness, Rosamond Ivey Special Collections Archivist and head of library and archives at the Art Gallery of Ontario, and John Dupuis, science librarian at York University, who had experience holding campaign-associated edit-a-thon events such as Art + Feminism and Ada Lovelace Day. The workshop was a critical mechanism for participants to gain skills and experience and to engage in discussions on some of the topics raised earlier in this chapter regarding the availability of information resources on Canadian music. A lively topic of conversation was the question of “why Wikipedia” versus an institutional website or other publication, which led to discussions on open access and open licensing.

During these events in 2016 and through 2017, Wikidata was still relatively unknown in Canadian GLAM organizations, and there were few available workshop materials. Dan Scott, a librarian at Laurentian University, was instrumental in creating and leading Wikidata workshops and creating instructional materials (2017a). Building on this work and in connection with the Mariposa Folk Festival project, Scott and I worked on modelling a means of capturing music festival data using Wikidata. Scott focused on the Northern Lights Festival Boréal, which raised bilingual considerations (2017b). The collaborative work on Wikidata, starting with the Music in Canada @ 150 Project, provided an early engagement with the collaborative creation of music-related data and became a basis for future collaborative efforts between Scott and me, including a workshop at the Ontario Library Association conference on adding libraries to the Wikimedia projects and a workshop at the Semantic Web in Libraries conference on Wikibase. The campaign also spawned the creation of the GLAM-Wiki Toronto conference held in 2019, which attracted over 100 attendees from GLAM organizations across the Toronto area.

The Music in Canada @ 150 Project was formed out of gaps I perceived while working on the Mariposa Folk Festival data project. The campaign largely succeeded in mobilizing individuals in music libraries interested in critical issues related to music information. The work within the music library communities demonstrated a means of creating opportunities for Wikidata development within an already existing community of individuals interested in finding mechanisms for supporting the visibility of music in Canada. Scott succinctly summed up the efforts:

Our central argument was that, rather than focusing on directly enhancing our own local data repository silos (for example, library catalogues, digital exhibits), libraries and archives should invest their limited resources in enriching Wikidata, a centralized data repository, to maximize the visibility of those entities and the reusability of that data in the world at large . . . and then pull that data back into our local repositories to enrich our displays and integration with the broader world of data. (Scott, 2017b)

However, assessing the long-term impacts of the campaigns on the usage of Wikipedia and Wikidata in relation to music-related content is challenging, and we have not followed up with participants. Certainly, running editing campaigns will not solve the issue of the lack of secondary and tertiary sources on Canadian music.

As with Wikipedia, Wikidata is a volunteer-organized project, and the properties available to be used are user-generated by consensus. While this can be a great advantage in providing flexibility to data creators, it does mean that data is less standardized than in typical systems using ontologies such as the CIDOC Conceptual Reference Model (CRM) or in the standardized descriptive systems found in libraries. Wikidata has no strict hierarchical structure; for example, broader and narrower concepts are not strictly observed. An additional known problem is related to the multilingual nature of Wikidata. Translation between languages is never one-to-one, and concepts are complex to capture. Increasing the availability of online information on Canadian topics isn’t a problem to be solved strictly by the Wikidata community. Rather, it falls to government, libraries, archives, and music organizations to give more significant consideration to digitization and web archiving projects.

Wikidata and Events

The Mariposa Folk Festival data project provided an opportunity to explore aspects of cultural data description that are not possible within traditional descriptive practices of libraries and archives. A major aspect of this work was the creation of Wikidata items related to events, that is, items based on things that take place in time. Typically, materials in libraries and archives are related to collection objects, for example, books, manuscripts, or recordings. The focus on, or need for, a collection-based item creates limitations in how something can be described and limits access to and understanding of time-based artworks like music. Wikidata opens up the possibility of describing events and adding further dimensions to the data. The ability to describe events was part of the appeal of using Wikidata for the Mariposa Folk Festival project.

Wikidata allows for the representation of items through time, and timeline generation tools have become a popular method of visualizing data. A number of tools have been developed to take advantage of Wikidata items with date-related data. However, the modelling of Mariposa Folk Festival events was handled in a more detailed way through the linking of participants to an event. Wikidata projects provide an effective avenue for learning the best way to model and create data in relation to specific topics and can often provide important help in understanding a topic. The Wikidata projects Cultural Venues and Culture are helpful in reviewing critical issues related to describing cultural events. For the Mariposa Folk Festival data, it became clear that creating event data manually was a painstaking effort, as each individual performer needed their own Wikidata item, a large outlay of time and effort. Thus, while granular data represents the festival itself more inclusively, creating it proved labour-intensive. And though several years have passed since the inception of the Mariposa Folk Festival data project, because of the ongoing challenges of structuring the data, some question whether its perceived positive impacts justify the investment of labour. The increasing availability of tools focused on natural language processing may reduce the need for manual data entry, but there remains a dependence on machine-readable data.
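The idea of linking participants to an event can be sketched as a small data structure. The following Python sketch is illustrative only: the item identifiers are placeholders rather than real Wikidata items, and the choice of the Wikidata properties P710 ("participant") and P1344 ("participant in") is an assumption, since the chapter does not state which properties the project used.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Wikidata property IDs. Using these two for the festival data is an
# assumption made for illustration, not the project's documented choice.
PARTICIPANT = "P710"      # "participant"
PARTICIPANT_IN = "P1344"  # "participant in"


@dataclass
class FestivalEvent:
    qid: str    # placeholder identifier, not a real Wikidata item
    label: str
    performer_qids: List[str] = field(default_factory=list)


def participation_statements(event: FestivalEvent) -> List[Tuple[str, str, str]]:
    """Return (subject, property, value) triples linking an event and
    each of its performers in both directions."""
    triples = []
    for performer in event.performer_qids:
        triples.append((event.qid, PARTICIPANT, performer))
        triples.append((performer, PARTICIPANT_IN, event.qid))
    return triples


# Hypothetical festival edition with two hypothetical performers.
event = FestivalEvent(
    qid="Q_EVENT",
    label="Festival edition (hypothetical)",
    performer_qids=["Q_PERFORMER_A", "Q_PERFORMER_B"],
)
statements = participation_statements(event)
```

Each performer contributes two statements, which gives a sense of why manual entry scales poorly: every performer must first exist as an item before either statement can be made.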

Representing Archival Collections in Canada

Creating data related to archival collections and holdings on Wikidata can aid in giving a clearer picture of collections at Canadian institutions and help make primary source documents more findable. The Mariposa Folk Festival data project focused on an archival collection held at the Clara Thomas Archives and Special Collections at York University. One of the ways to connect unique collections held by Canadian institutions and Wikidata is to use the property “archives at.” A property on Wikidata “describes the data value of a statement and can be thought of as a category of data, for example ‘color’ for the data value ‘blue.’ Properties, when paired with values, form a statement in Wikidata” (“Help: Properties,” 2023). Properties in large part define the structure of Wikidata and must be approved by a consensus community process. Wikidata is made up of statements, each pairing a property with a value. A value can be an internal link to an item within Wikidata or a link to a unique and permanent address on the web called a URI. As of June 2024, Wikidata lists 140 properties related to archives and people or organizations (Wikidata Property Explorer, n.d.).
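The statement structure described here, a property paired with a value that is either an item link or a URI, can be illustrated with a minimal Python sketch. The class names, the placeholder identifiers, and the example.org address are hypothetical and chosen only for illustration; this is not Wikidata's internal data model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemValue:
    """A value that links to another item inside Wikidata."""
    qid: str


@dataclass(frozen=True)
class UriValue:
    """A value that points to a stable address on the web."""
    uri: str


@dataclass(frozen=True)
class Statement:
    subject_qid: str
    property_label: str
    value: Union[ItemValue, UriValue]


def describe(statement: Statement) -> str:
    """Summarize a statement and say which kind of value it carries."""
    kind = "item" if isinstance(statement.value, ItemValue) else "URI"
    return f"{statement.subject_qid} | {statement.property_label} | {kind}"


# Placeholder identifiers and address, not real Wikidata data.
holding = Statement("Q_FESTIVAL", "archives at", ItemValue("Q_ARCHIVE"))
finding_aid = Statement(
    "Q_FESTIVAL", "described at URL", UriValue("https://example.org/finding-aid")
)
```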

Most of the properties listed are for external identifiers, but a small number are used to describe collections. Property P485, “archives at,” links the Wikidata item being described to the institutions holding the archives for that item. York University archivist Katrina Cohen-Palacios (2019) has presented and provided workshop materials on methods of adding archival holdings to Wikidata, for example, adding a link from “The Toronto Evening Telegram” newspaper Wikidata item to the Clara Thomas Archives and Special Collections. Cohen-Palacios (2019) suggested that creating information about finding aids in Wikidata can save archivists time by automating some tasks that might otherwise need to be maintained manually.

Following the examples laid out by Cohen-Palacios (2019), Ansovini et al. (2022) began an initiative to add select archival holdings information from the University of Toronto to Wikidata. The addition of archival information can help individuals (and machines) from around the world find and enhance collections and materials related to Canadian culture. Furthermore, the addition of archival holdings information links institutions into the Wikidata network, allowing for queries and data visualizations to support alternate views of a myriad of connections. Demonstrating their work on the Canadian author Margaret Atwood, Ansovini et al. (2022) found that

the simple addition of one “archives at” triple, linking the records to an item in Wikidata, allows for the kinds of connections that may have been made in the reading room, where archivists use their contextual knowledge to suggest to researchers records of related people, important dates, and notable publications. Wikidata can provide machine-actionable, community-generated access points that allow users to perform exploratory searches to visualize, search, and explore relationships to other entities. (p. 12)

The Mariposa Folk Festival data project also allowed for some experimentation in adding archival holdings information to Wikidata, and the Clara Thomas Archives and Special Collections added archival holdings information to Wikidata in relation to the Mariposa Folk Festival. Figure 4.1, generated with a SPARQL query, makes these relationships visible in a network form. The Mariposa Folk Festival is now connected via a graph to the Clara Thomas Archives and Special Collections at York University. As an activity, adding a statement about archival holdings to a Wikidata item is easier to accomplish than something like creating entries for festival events, because it is constrained in scope and does not require a great deal of technical knowledge. While adding holdings could be impactful by creating structured data relating to archives in the wider internet, archival listings on Wikidata within Canada are uneven. Cohen-Palacios (2019) demonstrated this unevenness through a query. The larger bubbles in the resulting visualization represent between 300 and 550 holdings listed in Wikidata, with the bulk of institutions located in Ontario and Québec.

Figure 4.1. Network representation of Wikidata properties related to the archival collection held at the Clara Thomas Archives and Special Collections, York University. The graph shows the network of Wikidata properties connected to items in the collection, illustrating relationships and links between different data entities.
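A query of the general kind that could sit behind such a network view might be sketched as follows. This is not the query used for Figure 4.1, which the chapter does not reproduce. The sketch assumes the standard prefixes of the Wikidata Query Service (wd:, wdt:, wikibase:, bd:), uses P485 (“archives at”) as named above, and takes a placeholder item identifier rather than the real identifier of the Clara Thomas Archives and Special Collections.

```python
import re


def archives_at_query(institution_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query listing items whose "archives at" (P485)
    statement points to the given institution."""
    # Guard against malformed identifiers before interpolating them.
    if not re.fullmatch(r"Q\d+", institution_qid):
        raise ValueError(f"not a Wikidata item identifier: {institution_qid!r}")
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P485 wd:{institution_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


# "Q0" is a placeholder, not the identifier of any real institution.
query = archives_at_query("Q0", limit=10)
```

The resulting text can be pasted into the Wikidata Query Service once the placeholder is replaced with an institution's actual item identifier.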

The creation of archival holdings entries could benefit from outreach and support to ensure greater equity in coverage; otherwise, the impact of these listings may remain relatively small. Other cultural heritage organizations are also using Wikidata as a tool within their workflows. For example, the Canadian Heritage Information Network (CHIN) is using Wikidata as part of its ongoing Nomenclature project. The project has created a linked data vocabulary for use within the museum sector, and CHIN has been focused on the description of artworks at this stage of the project. While CHIN is aligning with CIDOC CRM as the content and schema standard, it is looking to Wikidata as a means of linking collections and making its content more visible.

Wikidata and Indigenous Data in Canada

Another aspect of the Mariposa Folk Festival data project was considering the festival’s “Native People’s Area.” The Native People’s Area began in 1968 and was a physical location within the festival grounds. It included musicians, dancers, and storytellers and was included in its own section of the program. Alanis Obomsawin, an Abenaki filmmaker and performer, curated the Native People’s Area from 1970 to 1976, and the influence of the rise in Indigenous rights movements within Canada and the United States is evident in the programming with the inclusion of groups such as the North American Indian Travelling College and activist poet Duke Redbird (Mariposa Folk Festival, 2014). The Native People’s Area was handled as an additional dataset within the production of the Mariposa Folk Festival data. As with the other areas of the project, working with the festival data and Wikidata surfaced issues that are useful in a larger discussion of Wikidata within the Canadian context.

As stated in an earlier section of this chapter, finding information about individuals named in the program was frequently a challenge. It was even more so with the Native People’s Area. While some individuals, such as Shingoose, were named, many others only appeared in the program under a general name, such as “Six Nations Reserve Dancers” or “Metis Group.” Research in the Mariposa Folk Festival archival holdings yielded no further information. The lack of specificity made it impossible to add these participants to Wikidata. The sparse information about the Native People’s Area could reflect a lack of care or knowledge on the part of the festival organizers, and the general scarcity of information on these groups points to a gap within the wider Canadian information landscape. While it could be appropriate to include participants with vague or unclear names in an internal dataset, it is not appropriate to add such information to a global, open-knowledge base, as the meaning is too imprecise. The use of Wikidata versus an internally controlled and hosted repository or knowledge base should be carefully considered.

Additionally, a vital area of concern when looking at Wikidata within Canada is Wikidata’s appropriateness for Indigenous data. Indigenous data can be defined as “data, information, and knowledge, in any format, that impact Indigenous Peoples, nations, and communities at the collective and individual levels; data about their resources and environments, data about them as Individuals, and data about them as collectives” (Carroll et al., 2021). This wide-ranging definition is in keeping with understandings of Indigenous peoples expressed in rights frameworks such as the United Nations Declaration on the Rights of Indigenous Peoples. Given this understanding, much of the data in Wikidata in relation to territories, environments, and Indigenous nations in the country now known as Canada is Indigenous data.

The Canadian federal government passed Bill C-15, an act respecting the United Nations Declaration on the Rights of Indigenous Peoples, on June 21, 2021. The enactment “provides that the Government of Canada must take all measures necessary to ensure that the laws of Canada are consistent with the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP), and must prepare and implement an action plan to achieve the objectives of the Declaration” (Government of Canada, 2021). While both UNDRIP and Bill C-15 have been criticized by some Indigenous peoples, they provide a framework and means of implementing necessary measures to move to action regarding the recognition of Indigenous rights, including knowledge, cultures, data, and information. UNDRIP and Bill C-15 also provide impetus for provincial and municipal governments, along with organizations, to consider measures to align policies and practices.

The growing global movement of Indigenous data sovereignty calls for recognition, governance, and systems that ensure Indigenous peoples have control over their data, including data related to territories and cultures. While Wikidata, as an open-source and open-access project, adheres to the FAIR (findability, accessibility, interoperability, reusability) principles, making for positive impacts on the open-knowledge community (Odell et al., 2022), the FAIR principles ignore “power differentials and historical contexts” (Global Indigenous Data Alliance, or GIDA) and potentially uphold practices that are extractive and continue to harm Indigenous peoples (Hudson, 2020). Unnuanced and unmitigated approaches to open access are widely acknowledged to be problematic (Anderson & Christen, 2019). As of this writing, Wikidata has no mechanisms for identifying, safeguarding, or ensuring that Indigenous data is appropriately handled, including mechanisms to prevent data from being added in an unethical manner. Ideally, the Wikidata community and the Wikimedia Foundation would work toward implementing the CARE principles for Indigenous data: collective benefit, authority to control, responsibility, and ethics (Hudson, 2020). The GIDA calls on organizations to #BeFAIRandCARE. Because Wikidata lacks mechanisms for the ethical care and handling of Indigenous data, it is difficult to see how Wikidata can be freely used for Indigenous data.

The data model also presents a problem for Indigenous data and respectfully representing Indigenous peoples, territories, and cultures on Wikidata. A key example is the lack of means to adequately identify Indigenous peoples. Indigenous peoples may identify themselves in a variety of ways, and their identity can be tied to their community and nation. Wikidata currently does not have the means to represent Indigenous nations as nations; rather, many Indigenous nations have been categorized as ethnicities. Indigenous identity is not an ethnicity, and recent works by Allison-Cassin (2023) and Johnston, Julien, and Singh (2022) suggest that the structural problems with the Wikidata data model for Indigenous identity are a serious barrier to considering Wikidata as an appropriate platform for information about Indigenous peoples. A further problem related to the identification of community is the lack of connection between Indigenous nations and Wikidata items.

In his Elements of Indigenous Style, Greg Younging (2018) points a number of times to the importance of recognizing Indigenous nationhood in referring to Indigenous peoples. He suggests the term nation “has become widely accepted by Indigenous Peoples to describe separate Indigenous groups as political entities.” Furthermore, Younging (2018) states that specific Indigenous peoples identify with a formal nation, such as the Métis Nation of Alberta. However, Wikidata does not connect Indigenous peoples with nations, making the use of Wikidata inappropriate or questionable, especially in relation to understandings drawn from UNDRIP and various initiatives and practices related to self-determination.

Other structural problems with Wikidata in relation to Indigenous data are caused and amplified by structural problems with Wikipedia. While these issues might be unintentional, they represent an additional barrier to adequately supporting the appropriate handling of Indigenous data, making for a particular problem within the Canadian Wikidata landscape. A key example of this problem is related to the ways geographic areas are structured within Wikidata and is particularly striking in relation to Indigenous territories. Because of the Indian Act, many First Nations people were removed from the whole of their traditional territories to a system of smaller, restricted reserves of land. These reserves are geographic areas typically occupied by members of a single First Nation. Each reserve also has its own system of governance imposed by the federal government of Canada, which is known as a band government. A problem within Wikidata is confusion between “kinds” of entities in relation to First Nations communities. For example, a First Nation may be an “instance of” a First Nations band, a geographic location, or potentially another entity. The confusion in data structure often results from the automated creation of a Wikidata item from a Wikipedia article where, in narrative form, there is little confusion. Brown (2022, p. 3) states, “Description logics are hostile to ambiguity,” and the ambiguity around what is meant by a First Nation is a structural problem that seriously affects the data. See Allison-Cassin (2023) for a full discussion of this issue.
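The “instance of” confusion described above can be made concrete with a small check over item classifications. The following Python sketch is a hedged illustration, not an established Wikidata tool: the class and item identifiers are hypothetical placeholders, and the grouping of classes into governing bodies and places is an assumption made for the example.

```python
from typing import Dict, List, Set

# Hypothetical class identifiers standing in for real Wikidata classes.
GOVERNING_BODY_CLASSES = {"Q_BAND_GOVERNMENT"}
PLACE_CLASSES = {"Q_RESERVE", "Q_GEOGRAPHIC_LOCATION"}


def mixed_kinds(instance_of: Dict[str, Set[str]]) -> List[str]:
    """Return items whose "instance of" values mix a governing body
    with a geographic place, the ambiguity discussed in the text."""
    flagged = []
    for qid, classes in sorted(instance_of.items()):
        is_governing_body = bool(classes & GOVERNING_BODY_CLASSES)
        is_place = bool(classes & PLACE_CLASSES)
        if is_governing_body and is_place:
            flagged.append(qid)
    return flagged


# Hypothetical sample: one item typed two ways, one typed consistently.
sample = {
    "Q_ITEM_A": {"Q_BAND_GOVERNMENT", "Q_RESERVE"},
    "Q_ITEM_B": {"Q_RESERVE"},
}
ambiguous = mixed_kinds(sample)
```

A check like this only surfaces candidates for review. Deciding how a given First Nation should be represented is a matter for the community concerned, not something a script can settle.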

For the Mariposa Folk Festival data, attempting to document an individual’s identity while recognizing the previous considerations was a challenge. Although the consultation group in CAPACOA’s report, Indigenous Artists and Wikidata: Explorations and Consultations Report (Johnston et al., 2022), was relatively small, the overarching concerns participants expressed about the appropriateness of Wikidata for their personal profile data are troubling. Wikidata, despite the appearance of being open and available for everyone without bias, is structured and modelled around Western understandings of the world. While the data model can be changed and modified through community consensus, the community likely does not include many Indigenous peoples, and some of the issues are of such a large scale that a significant intervention may be required. Additionally, different knowledge systems, and the issues they raise within intellectual property regimes, are not widely understood.

Conclusion

The Mariposa Folk Festival data project was a way to experiment with linked data network production using Wikidata. The challenges of working with Wikidata from a Canadian perspective point to problems beyond Wikidata for cultural data in Canada. For example, significant issues remain about the place of Indigenous data within Wikidata that require timely and vital intervention. The continuing closure of community newspapers and local media will harm the ability of those working on the Wikimedia platforms to document Canadian-focused topics. As the Wikidata platform grows and becomes a notable hub for structured data, people in Canada will need to strategize on ways to advocate and provide pathways for adding more data to the platform—particularly in relation to communities not well represented in mainstream data. Wikidata may never play an obvious role in people’s everyday lives. However, it plays a significant role within the open metadata movement in Canada, as evidenced by the number of different ways it is being used within Canada.

For institutions and individuals to participate more effectively, greater attention must be paid to metadata and digital practices at individual organizations to improve the availability of stable identifiers, open data, and sources for referencing Wikidata items. Recognizing the need to support open data initiatives that will serve all people in Canada, several cultural agencies within the Canadian federal government have used Wikidata: “Such sharing reduces the work involved in many facets of digital collection management by drawing on expertise and updates provided by other teams with established authority” (Government of Canada, 2023).

It is worth remembering that “as the largest repository of knowledge ever collected, Wikipedia remains both an astounding human achievement as well as endless opportunity for improvement, both in content and in community” (McDowell & Vetter, 2021). Wikidata will likely continue to grow within Canada by building and supporting the community. Canadian digital humanities researcher Susan Brown suggests, “Community building and cross-sectoral collaboration may be most important to address diversity effectively, given the substantial resources and infrastructure required to work with LOD” (Brown, 2022). A strong and diverse future for Wikidata in Canada will strengthen Canada’s digital presence.

References

  1. About West End Phoenix. (n.d.). West End Phoenix. Retrieved July 15, 2024, from https://www.westendphoenix.com/about-us
  2. Allison-Cassin, S. (2012). The possibility of the infinite library: Exploring the conceptual boundaries of works and texts of bibliographic description. Journal of Library Metadata, 12(2–3), 294–309. https://doi.org/10.1080/19386389.2012.700606
  3. Allison-Cassin, S. (2023). Métis nationhood: An examination of Indigenous nationhood, sovereignty and linked data through a Wikidata case study. In K. Burlingame, A. Provo, & B. M. Watson (Eds.), Ethics in linked data. Litwin Press. https://litwinbooks.com/books/ethics-in-linked-data/
  4. Allison-Cassin, S., Ruest, N., St. Onge, A., & Suhonos, M. (2015). Sounding it out: The Mariposa Folk Festival and a linked open data digital library. Digital Humanities, Sydney, Australia. https://dh-abstracts.library.virginia.edu/works/2357
  5. Allison-Cassin, S., & Scott, D. (2018). Wikidata: A platform for your library’s linked open data. code{4}Lib Journal, 40. https://journal.code4lib.org/articles/13424
  6. Ansovini, D., Babcock, K., Franco, T., Jung, J. A., Suurtamm, K., & Wong, A. (2022). Knowledge lost, knowledge gained: The implications of migrating to online archival descriptive systems. KULA: Knowledge Creation, Dissemination, and Preservation Studies, 6(3), Article 3. https://doi.org/10.18357/kula.234
  7. ARL Task Force on Wikimedia and Linked Open Data. (2019). ARL white paper on Wikidata: Opportunities and recommendations [Report]. Association of Research Libraries. https://apo.org.au/node/254221
  8. Art Gallery of Ontario. (2015). Art+feminism Wikipedia edit-a-thon at AGO, Toronto 15. In Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Art%2BFeminism_Wikipedia_Edit-A-Thon_at_AGO,_Toronto_15.jpg
  9. Beghaeiraveri, S. A. H., Gray, A. J. G., & McNeill, F. (2024). RQSS: Referencing quality scoring system for Wikidata. Semantic Web Journal. https://www.semantic-web-journal.net/content/rqss-referencing-quality-scoring-system-wikidata-1
  10. Bianchini, C., Bargioni, S., & Pellizzari di San Girolamo, C. C. (2021). Beyond VIAF: Wikidata as a complementary tool for authority control in libraries. Information Technology and Libraries, 40(2), Article 2. https://doi.org/10.6017/ital.v40i2.12959
  11. Brown, S. (2022). Same difference: Identity and diversity in linked open cultural data. International Journal of Humanities and Arts Computing, 16(1), 1–16. https://doi.org/10.3366/ijhac.2022.0273
  12. Canadian Arts Presenting Association. (n.d.). Linked digital future. LDFI. Retrieved May 21, 2021, from https://linkeddigitalfuture.ca/
  13. CAPACOA. (n.d.). Linked digital future. LDFI. Retrieved March 27, 2023, from https://linkeddigitalfuture.ca/
  14. Chair for research in e-librarianship 2015–2017. (n.d.). York University Libraries. Retrieved July 1, 2024, from https://www.library.yorku.ca/web/about-us/wpscott-chair-e-librarianship/chair-for-research-in-e-librarianship-2015-2017/
  15. Cohen-Palacios, K. (2019, October 24). Wikidata and archivists. Archives Association of Ontario (AAO) Institutional Issues Forum, Toronto. https://yorkspace.library.yorku.ca/xmlui/handle/10315/36898
  16. Colla, D., Goy, A., Leontino, M., & Magro, D. (2021). Wikidata support in the creation of rich semantic metadata for historical archives. Applied Sciences, 11(10), Article 10. https://doi.org/10.3390/app11104378
  17. The Dirty Shames. (n.d.). In Wikidata. Retrieved July 15, 2024, from https://www.wikidata.org/wiki/Q84939500
  18. Gertner, J., Hurst, A., Lozano, M., Woo, J., Ramirez, D., & Vancura, A. (2023, September 10). The Sunday read: “Wikipedia’s moment of truth.” New York Times. https://www.nytimes.com/2023/09/10/podcasts/the-daily/wikipedia-ai.html
  19. Government of Canada. (2023). Linked open data overview. https://www.canada.ca/en/services/culture/history-heritage/museology-conservation/collections-management/linked-open-data/linked-open-data-overview.html
  20. Government of Canada, Department of Justice. (2021, April 12). Backgrounder: United Nations declaration on the Rights of Indigenous Peoples Act. https://www.justice.gc.ca/eng/declaration/about-apropos.html
  21. Hawkins, A. (2022). Archives, linked data and the digital humanities: Increasing access to digitised and born-digital archives via the semantic web. Archival Science, 22(3), 319–44. https://doi.org/10.1007/s10502-021-09381-0
  22. Help: Properties. (2023, January 22). In Wikidata. https://www.wikidata.org/w/index.php?title=Help:Properties&oldid=1817117774
  23. Help: Sources. (2023, March 6). In Wikidata. https://www.wikidata.org/w/index.php?title=Help:Sources&oldid=1846065968
  24. Hudson, M. (2020). Indigenous data sovereignty: Towards an equitable and inclusive digital future. A Digital New Deal: Visions of Justice in a Post-Covid World. https://projects.itforchange.net/digital-new-deal/2020/11/01/indigenous-data-sovereignty-towards-an-equitable-and-inclusive-digital-future/
  25. Johnston, B., Julien, F., & Singh, A. (2022). Indigenous artists and Wikidata: Explorations and consultations report. CAPACOA. https://linkeddigitalfuture.ca/research/indigenous-artists-and-wikidata/
  26. McCracken, K. (2018, September 6). Doing the work: Editing Wikipedia as an act of reconciliation. Medium. https://medium.com/on-archivy/doing-the-work-editing-wikipedia-d82e927adb9f
  27. McDowell, Z. J., & Vetter, M. A. (2021). Wikipedia and the representation of reality. Routledge.
  28. McFadden, F., Raymond, J., & Toronto Telegram. (1966). The Dirty Shames, in performance at The Riverboat [Carol Robinson and Roy Michaels sharing one microphone in foreground, Chick Roberts and Amos Garrett sharing another microphone in background]. York University Digital Library. https://digital.library.yorku.ca/node/1078262
  29. Music in Canada @ 150: A Wikipedia and Wikidata project. (2019, March 14). In Wikimedia. https://meta.wikimedia.org/w/index.php?title=Grants:Project/Smallison/Music_in_Canada_@_150:_A_Wikipedia_and_Wikidata_Project&oldid=18933864
  30. Nielsen, F. Å., Mietchen, D., & Willighagen, E. (2017). Scholia, scientometrics and Wikidata. In E. Blomqvist, K. Hose, H. Paulheim, A. Ławrynowicz, F. Ciravegna, & O. Hartig (Eds.), The Semantic Web: ESWC 2017 satellite events (pp. 237–59). Springer International Publishing. https://doi.org/10.1007/978-3-319-70407-4_36
  31. Odell, J., Lemus-Rojas, M., & Brys, L. (2022). Selected tools for using Wikidata. https://doi.org/10.7912/5mtv-h307
  32. Proffitt, M. (2018). Leveraging Wikipedia: Connecting communities of knowledge (1st ed.). American Library Association.
  33. Riverboat Coffee House. (2024, April 23). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Riverboat_Coffee_House&oldid=1220389506
  34. Scott, D. (2017, May 28). Wikidata workshop for librarians. Coffee|Code: Dan Scott’s Blog. https://coffeecode.net/wikidata-workshop-for-librarians.html
  35. Scott, D. (2017, June 2). Wikidata, Canada 150, and music festival data. Coffee|Code: Dan Scott’s Blog. https://coffeecode.net/wikidata-canada-150-and-music-festival-data.html
  36. St. Onge, A. (2013). Mariposa folk foundation fonds. York University Libraries Clara Thomas Archives & Special Collections. https://atom.library.yorku.ca/index.php/mariposa-folk-foundation-fonds
  37. Tharani, K. (2021). Much more than a mere technology: A systematic review of Wikidata in libraries. Journal of Academic Librarianship, 47(2), 102326. https://doi.org/10.1016/j.acalib.2021.102326
  38. Treasury Board of Canada Secretariat. (n.d.). Open data 101. Retrieved March 19, 2023, from http://open.canada.ca/en/open-data-principles
  39. van Veen, T. (2019). Wikidata: From “an” identifier to “the” identifier. Information Technology and Libraries, 38(2), Article 2. https://doi.org/10.6017/ital.v38i2.10886
  40. Vrandečić, D. (2013). The rise of Wikidata. IEEE Intelligent Systems, 28(4), 90–95. https://doi.org/10.1109/MIS.2013.119
  41. Waagmeester, A. et al. (2020). Wikidata as a knowledge graph for the life sciences. eLife, 9, e52614. https://doi.org/10.7554/eLife.52614
  42. Wikidata Property Explorer. (n.d.). Retrieved March 26, 2023, from https://prop-explorer.toolforge.org/
  43. Wikidata: Introduction. (2022, July 10). In Wikidata. https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&oldid=1674555622
  44. Younging, G. (2018). Elements of Indigenous style: A guide for writing by and about Indigenous peoples. Brush Education.
  45. Zhao, F. (2023). A systematic review of Wikidata in digital humanities projects. Digital Scholarship in the Humanities, 38(2), 852–74. https://doi.org/10.1093/llc/fqac083
  46. Zhu, L., Xu, A., Deng, S., Heng, G., & Li, X. (2023). Entity management using Wikidata for cultural heritage information. Cataloging & Classification Quarterly, 61(1), 20–46. https://doi.org/10.1080/01639374.2023.2188338

This work is licensed under a Creative Commons License (CC BY-NC-ND 4.0). It may be reproduced for non-commercial purposes, provided that the original author is credited.