Chapter 3. Using Wikidata to Quantify the Gender Gap in Biographical Resources
Marie D. Martel and Simon Villeneuve
As a source of information, Wikipedia belongs to the category of encyclopedic reference works. Encyclopedism aims to “bring together as much knowledge as possible, and connect it, transmit it, share it and submit it for discussion” (Melançon, 2018). The Wikipedia encyclopedia operates under a free licence. Since 2001, it has helped set up the Wikimedia project and movement, which in turn have spawned an ecosystem of free online knowledge. Today, this content makes it possible to share, quantify, and discuss information that was previously unpublished, confidential, or disparate.
Encyclopedias are shaped by their cultural roots, ideologies, and values, as well as by the representation of reality and the knowledge they mobilize or legitimize (McDowell & Vetter, 2021; Rey, 2007). In this respect, Wikipedia is known for perpetuating sexist biases in terms of both editorial participation and biographical coverage, like so many other encyclopedias, and Wikipedia’s biases have not failed to attract media attention (Cohen, 2011; Nadeau, 2017).
Writing in The New York Times in 2011, Noam Cohen did not articulate a definition; instead, he suggested grounding the concept of the “gender gap” in the case of Wikipedia, which tends to discourage female contributors. More specifically, the gender gap, in this context, refers to the disparities in participation, content, and representation between genders on the various Wikimedia platforms. Meta-Wiki serves as a discussion platform for the various Wikimedia projects and describes two types of gender gap: “(a) a content gender gap (meaning that more men than women are covered in the main space content of our wikis), and (b) a participation gender gap, meaning that more men participate in the peer production communities of Wikimedia” (“Gender Gap,” 2025).
The proportion of female editors on Wikipedia across the different language versions is currently estimated at 13% on average, while the proportion of editors of genders other than male and female (cisgender) is estimated at 4% according to the Community Insights 2023 Report (Wikimedia Foundation, 2023). It also appears that between 2019 and 2022, the proportion of active editors of genders other than male and female (cisgender) doubled. Meanwhile, the proportions of male and female editors have remained broadly the same over this period, although there has been a significant increase in the number of female editors since 2019. According to several studies focusing on this issue, the gender gap in participation is reflected in biographical articles and their content.
In this chapter, we examine the gender gap as it affects content. We aim to present a method for documenting descriptions of gender from a sample of 80 biographical resources that are strongly linked on Wikidata, the knowledge base that powers Wikipedia. This exercise will allow us to produce new data on the percentage of biographies devoted to women in these resources, including notable Canadian reference works, and in Wikimedia projects such as the French-language and English-language versions of Wikipedia and Wikidata. Analysis of this data will bring to light some evidence-based findings about biographical coverage in these resources while enabling us to update the interpretation of the gender gap in reference works. This quantification of biographical articles is likely to contribute to a better understanding of the gender gap and a greater awareness of gender inequality and systemic sexism in the documentary world in particular and society in general.
The Question of Gender Diversity
Concerns were raised early on within the Wikipedia community about social diversity in the Wikipedia project, which manifests itself in an imbalance in the coverage of specific topics. Participants in the WikiProject Countering Systemic Bias, launched in 2004, explicitly addressed the ways in which the interests and sociodemographic profile of editors could influence content and lead to bias. Notably, the introductory text written at the beginning of the project remains practically unchanged to this day:
The Wikipedia project contains several types of WP:NPOV violations that arise from systemic bias in the demographics of the editor community. Encyclopedic coverage is imbalanced and often omits points of view from under-represented demographic groups. Systemic bias on Wikipedia may take the form of gender, geographical, racial, ideological, and other forms of bias. (“Wikipedia: WikiProject Countering Systemic Bias,” 2025)
One of the subgroups of the WikiProject Countering Systemic Bias, the Gender Gap Task Force, has been active since 2013, whereas the WikiProject Gender Studies subgroup has been active since 2005 and the WikiProject Women in Red since 2015. Through various forms of engagement, these projects aim to eliminate systemic gender bias. At least two of these projects have corresponding groups in the French-language version of Wikipedia.1
In an article published in 2020, Pierre-Yves Beaudouin, then administrator of Wikimédia France, reported that the French-language version of Wikipedia had 591,491 biographies, 18% of which were devoted to women (Beaudouin, 2020; see figure 3.1, which reproduces the one in the text). He pointed out that information on several dictionaries and encyclopedias whose contents were harmonized with Wikipedia and Wikidata (the database that feeds Wikipedia) makes it possible to present statistics on the proportion of biographies dedicated to women. Some examples show that Wikipedia’s performance in terms of gender bias compares favourably in some cases with that of other biographical resources, but the author explained that all these resources, including Wikipedia, are shaped by the inequalities that exist in society. He maintained that one of the strengths of Wikipedia, and of Wikidata in particular, lies in the technological opportunity to quantify the biographies of several resources by producing statistics on gender gaps in biographical content.
The figures for the English-language version of Wikipedia are comparable, although slightly higher: In June 2024, 398,298 biographies were devoted to women, representing 19.8% of the 2,006,964 biographies in this encyclopedia at that time on Humaniki, which calculates the percentage of genders among all the biographies in the various projects (Humaniki, n.d.).
Figure 3.1.
Statistics on the proportion of biographies dedicated to women in different resources
Source: Beaudouin, P.-Y. (2020, March). Wikimédia France. https://www.wikimedia.fr/biais-de-genre-wikipedia-aussi-imparfaite-que-la-societe/. CC BY-SA.
Based on a sample of 80 resources, dictionaries, and encyclopedias, including Canadian works, we wish to show how Wikipedia and Wikidata can enrich our understanding of social issues such as gender inequality, despite the limitations of these knowledge resources in terms of gender bias. This quantitative exploration will be conducted by investigating the following questions:
- 1. How could Wikidata be used to quantify the content of biographical sources such as dictionaries and encyclopedias?
- 2. What new data can be produced on the proportion of biographies devoted to women within Wikipedia and other biographical resources, thanks to Wikidata?
- 3. What interpretations and comparisons can be made of these biographical resources by considering, more specifically, data relating to Canadian publications, and what does this quantification tell us about the gender gap in these reference works?
Our research follows on from work on Wikipedia content and focuses specifically on the content gap related to gender asymmetry in biographical coverage of reference works. To our knowledge, no rigorous study on the comparative evaluation of biographical coverage of reference works has been undertaken since the work of Reagle and Rhue (2011). Furthermore, we propose a method that differs from the approach used by Reagle and Rhue, which relies primarily on the checklist method.2 This leads researchers to compare the coverage in absolute terms of the different biographical sources—that is, the ratio and the respective total number of biographies of men and women. It is also used to compare the behaviour of Wikipedia and the Encyclopædia Britannica in relation to various preestablished lists of notable people.3
Our approach, following recent research (Klein & Konieczny, 2015; Konieczny & Klein, 2018), builds on Wikidata’s ability to measure the gender gap in biographical sources that are harmonized with it. As we describe in the next section, this method allows us to assess the gender gap in a more extensive set of encyclopedias and dictionaries than Reagle and Rhue did. These works come from different countries, including Canada. This method quantifies the gap relatively more precisely by comparing these biographical sources with one another. Finally, we would like to emphasize that there is a scarcity of research in the French language in this sector, to which we are contributing here.
A Methodology Based on Wikidata
How can Wikidata be used to quantify the content of biographical sources (dictionaries or encyclopedias)?
The proposed methodology provides an initial response to the questions raised in this research. To carry out this research, we focused on the gender associated with the subjects of biographical publications in dictionaries and encyclopedias with unique identifiers (UID)4 that are attached to specific properties in the Wikidata free knowledge base.5 As the content of Wikidata is published under Creative Commons Zero (CC0), it is easy to browse, sort, reuse, modify, and republish based on specific criteria using the SPARQL query language in particular.6
In June 2024, an initial query enabled us to identify 465 distinct properties linked to the UIDs of publications containing biographical articles, based on their classification using the Wikidata property element relating to encyclopedias (Q55452870) or its subcategories.7 First we eliminated obvious false positives, duplicates, and gender-focused publications, then we selected publications with a minimum sample size of 1,000 entries, with a view to ensuring worldwide geographical coverage. This quantitative criterion seems to us to represent a certain threshold of documentary impact on the web as well as a statistically significant sample.
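As a rough illustration of what an initial query of this kind might look like, the sketch below builds a SPARQL query listing the Wikidata properties classified under Q55452870 or one of its subclasses, and extracts the property identifiers from a result in the standard SPARQL JSON format. It is not the query used for this chapter. The function names are hypothetical, and the reliance on P31 (instance of) followed by P279 (subclass of) is an assumption about how the classification is expressed.

```python
import json


def build_property_query(class_qid: str = "Q55452870") -> str:
    """Return a SPARQL query listing properties classified under class_qid.

    Assumption: a property is attached to the class through P31 (instance of),
    optionally followed by any number of P279 (subclass of) steps.
    """
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
        "SELECT DISTINCT ?property WHERE {\n"
        "  ?property a wikibase:Property ;\n"
        f"            wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


def extract_property_ids(sparql_json: str) -> list[str]:
    """Return the distinct property IDs (such as 'P1417') found in a JSON result."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    # Each binding holds a full entity URI; keep only the trailing identifier.
    ids = {row["property"]["value"].rsplit("/", 1)[-1] for row in bindings}
    return sorted(ids, key=lambda pid: int(pid[1:]))
```

The list returned by such a query would still need the manual screening described above (false positives, duplicates, gender-focused publications, and the 1,000-entry threshold).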
A second SPARQL query on Wikidata was applied to measure the biographical gender gap. Our final selection includes 80 publications from all continents, for which we have carried out a quantitative evaluation of the gender of the biographies.
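To make the idea of this second query more concrete, here is a minimal sketch, under stated assumptions, of how the gender distribution of the items linked to one identifier property could be counted. It assumes that biographies are items that are instances of human (Q5) and that gender is read from P21 (sex or gender), with Q6581072 for female and Q6581097 for male. The property P1417 (Encyclopædia Britannica Online ID) in the usage note is only an example, and the function names are hypothetical. The actual query used in the study may differ.

```python
import json

FEMALE_QID = "Q6581072"  # Wikidata item "female"
MALE_QID = "Q6581097"    # Wikidata item "male"


def build_gender_query(property_id: str) -> str:
    """Return a SPARQL query counting humans with a property_id identifier, by gender."""
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?gender (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:{property_id} ?identifier .\n"
        "  OPTIONAL { ?person wdt:P21 ?gender . }\n"
        "}\n"
        "GROUP BY ?gender"
    )


def gender_shares(sparql_json: str) -> dict[str, float]:
    """Convert grouped counts into percentages for F, M, and all other cases.

    Items with no P21 statement, or with a value other than female or male,
    are grouped under 'other_or_unknown'. An item carrying several P21 values
    is counted once per value, which this sketch does not correct for.
    """
    counts = {"F": 0, "M": 0, "other_or_unknown": 0}
    for row in json.loads(sparql_json)["results"]["bindings"]:
        qid = row.get("gender", {}).get("value", "").rsplit("/", 1)[-1]
        key = {FEMALE_QID: "F", MALE_QID: "M"}.get(qid, "other_or_unknown")
        counts[key] += int(row["count"]["value"])
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100 * value / total for key, value in counts.items()}
```

Calling `build_gender_query("P1417")` yields a query string that can be submitted to a SPARQL endpoint, and `gender_shares` can then be applied to the JSON response.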
It should be noted that for the majority of the resources concerned, the sample associated with Wikidata that brings them together does not allow us to quantify the biographical gender gap accurately. However, monitoring the evolution of around 50 cases between March 2019 and April 2021 reveals a strong trend with no significant change (“Wikipédia: RAW/2019 03 01,” 2025; “Wikipédia: RAW/2021 04 01,” 2025). Lastly, we assume that the samples analyzed were not associated specifically based on a gender criterion.
The total number of biographical articles in each publication generally corresponds to the number given on the publication’s official website. It should be added that while this information is usually available for publications specializing in biographies, this is not the case for general publications. For most of these, data were deduced by exploring the internal search engine of the sites concerned. Some other data were provided by people representing the publications after they were contacted by email.
The uncertainty interval takes into account only the associated sample and represents a maximum value with a confidence level of 95%. Thus, for samples corresponding to 100% of the content of a given publication, there is no uncertainty in the percentages. The results are presented in the following section.
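The chapter does not spell out the formula behind this interval. One standard calculation that matches both properties described above (a maximum value at the 95% level, and no uncertainty when the sample covers the whole publication) is the worst-case margin of error for a proportion, taken at p = 0.5, with a finite population correction. The sketch below illustrates that calculation. It is an assumption about the method, not a statement of the one actually used, and the function name is hypothetical.

```python
import math


def max_margin_of_error(sample_size: int, population_size: int, z: float = 1.96) -> float:
    """Worst-case margin of error, in percentage points, for a proportion.

    Uses p = 0.5, the value that maximises the margin, a normal approximation
    (z = 1.96 for a 95% confidence level), and a finite population correction,
    so the margin falls to zero when the sample is the whole publication.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if population_size == 1:
        return 0.0
    standard_error = math.sqrt(0.25 / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return 100 * z * standard_error * correction
```

Under these assumptions, a publication of 10,000 biographies with 1,000 of them linked on Wikidata would carry a margin of a little under 3 percentage points, while a fully linked publication would carry none.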
The full content of the three wikis studied is available for analysis using other SPARQL queries and the Humaniki site. The figures given are accurate, but they are still provisional given that new biographical content is added every hour. As a result, the figures given here for the French- and English-language versions of Wikipedia are taken from the content from June 2024, while those for Wikidata come from the Qlever site using a version (or dump)8 of Wikidata undertaken on June 6, 2024.9
More Precise Data on Women’s Biographies
What new data can be produced using Wikidata on the proportion of biographies devoted to women within Wikipedia and other biographical resources?
The empirical objective of our study is to evaluate the biographical gender gap by simultaneously considering the number of articles associated with individuals identified as female (F) and male (M) in the sample of 80 biographical resources, encyclopedias, and dictionaries selected. Table 3.1 presents the data for these resources, which total more than two million biographical articles (2,014,479 articles).
These initial results present statistics on the distribution of genders in these various biographical resources, highlighting the percentage of articles associated with individuals identified as F and M. On average, the sample contains 14% of biographical articles on F personalities and 85% on M personalities.
The two largest biographical resources, Deutsche Biographie (622,800 entries) and China Biographical Database Project (417,931 entries), bring together 1,040,732 articles, or close to 50% of the articles in the sample. However, the gender divide is relatively less pronounced in these resources than in the others, especially in the case of Deutsche Biographie, which presents 20.8% F biographies. If we exclude these two publications in order to have a more representative idea of all the publications analyzed, we obtain an even wider gender divide, with 11% of biographies of F personalities and 88% of biographies of M personalities.
What interpretations and comparisons can be made of these different biographical resources by looking more specifically at data relating to Canadian publications, and what does this quantification tell us about the gender gap in reference works?
First, a few remarks about the data presented in the full table of reference works in the sample, which can be consulted on the online platform that accompanies the book. The encyclopedic data on biographies in Wikidata varies from one resource to another, as well as according to a development history ranging from manual data entry to the batch import of databases that are becoming increasingly widespread. For the Canadian Encyclopedia (CE) and the Dictionary of Canadian Biography (DCB), we conducted an initial content analysis in autumn 2018 (Villeneuve, 2025). The figures have changed little since then, as the Wikidata-associated content of these publications has grown only slightly. The observed gender gap has thus varied by around 1% for both of these publications.
It should also be noted that we have been analyzing the biographical content of the Encyclopædia Britannica (EB) and the Encyclopædia Universalis (EU) associated with Wikidata since December 2017 (Villeneuve, 2025). The proportions given for these two publications are exact to the nearest percentage point. Note that the number of elements associated with EB identifiers is much greater than the number of biographies revealed by the people representing this publication. This is explained by the association on Wikidata of the EB URL redirections.10
Some databases show a greater disparity of genders, such as the Biographical Directory of the United States Congress, with only 3% F biographies, compared with 97% M biographies. This is also the case for the Dizionario biografico degli Italiani, which has a low proportion of F biographies, equivalent to 4%. Others offer a slightly more balanced representation, such as the Dictionary of New Zealand Biography, with 27% F biographies and 73% M biographies. Another example is the American National Biography, which has 16% articles on F personalities and 83% on M personalities, which is above the average for F biographies contained in the resources in the sample.
More than half of these publications (44 out of 80, or 55%) contain 10% or fewer F biographies. The Dictionary of Canadian Biography, with 6%, falls into this category. Just under a third of biographical resources (25 out of 80) contain between 11% and 19.9% F biographies.
No resource has 50% F biographical articles, and only 10 have 20% or more (see table 3.1). BabelNet, TDV İslam Ansiklopedisi, and the Dictionary of New Zealand Biography have 45%, 37%, and 27% F biographies, respectively; these resources show biographical coverage that appears relatively the least unbalanced of the whole. The second Canadian resource in the sample, the CE, falls into this category, with 21% F biographies.
What of the three traditional biographical resources commonly appearing in lists of French and English institutional reference works? These resources are the Encyclopædia Britannica, the Encyclopædia Universalis, and the Encyclopédie Larousse. The Encyclopædia Britannica (45,296 associated items) shows an underrepresentation of entries on F personalities (11%) compared with M personalities (89%). The Encyclopædia Universalis (17,589 items) also shows a strong gender disparity, with a low proportion of articles on F personalities (9%) and 91% on M personalities. The Encyclopédie Larousse (3,164 items) shows a similar proportion to the Encyclopædia Universalis, with 9% of biographical articles on F personalities and 91% on M personalities, highlighting a significant gender imbalance. More generally, these three publications have below-average results (14%) in terms of the percentage of F biographies.
| Biographical resource | F biography | M biography |
|---|---|---|
| Biographical resources (total) | 14.4% | 85.2% |
| BabelNet | 44.7% | 55.1% |
| Deutsche Biographie | 20.8% | 79.3% |
| Dictionary of New Zealand Biography | 26.7% | 73.0% |
| Encyclopaedia Itaú Cultural | 22.6% | 75.9% |
| Encyclopedia of Fantasy | 21.2% | 78.0% |
| Canadian Encyclopedia | 21.4% | 78.5% |
| La Documentation permanente | 25.1% | 73.8% |
| Notable Name Database | 20.1% | 79.8% |
| SIKART | 19.9% | 73.2% |
| TDV İslam Ansiklopedisi | 36.7% | 63.2% |

Note: In this case, we have rounded off SIKART’s figure, which is actually 19.9% F biographical entries, in order to include it in the list of the 10 resources holding 20% or more F biographical articles.
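The effect of the rounding described in the note can be checked with a few lines of code. The sketch below uses the F percentages listed in the table and counts how many resources reach the 20% threshold, with and without rounding to the nearest whole percentage. The function name is hypothetical, and the rounding rule is an assumption based on the wording of the note.

```python
# Percentages of F biographies as listed in the table above.
F_SHARE = {
    "BabelNet": 44.7,
    "Deutsche Biographie": 20.8,
    "Dictionary of New Zealand Biography": 26.7,
    "Encyclopaedia Itaú Cultural": 22.6,
    "Encyclopedia of Fantasy": 21.2,
    "Canadian Encyclopedia": 21.4,
    "La Documentation permanente": 25.1,
    "Notable Name Database": 20.1,
    "SIKART": 19.9,
    "TDV İslam Ansiklopedisi": 36.7,
}


def at_or_above(shares: dict[str, float], threshold: float = 20.0,
                round_first: bool = False) -> list[str]:
    """Return the resources whose F share reaches the threshold.

    With round_first=True, each share is rounded to the nearest whole
    percentage before the comparison, as the note does for SIKART.
    """
    return sorted(
        name
        for name, pct in shares.items()
        if (round(pct) if round_first else pct) >= threshold
    )
```

With rounding, all 10 listed resources meet the threshold. Without it, SIKART falls just below and 9 remain.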
Furthermore, the percentage indicating the gender gap in the two subclasses of the sample corresponding to French- and English-language biographical resources is relatively similar: 14% of biographies of F personalities for the former and 13% for the latter.
Compared with the encyclopedic works in the sample, the three Wikimedia Foundation-hosted wikis analyzed are above the average for the percentage of F biographical articles (14%). Wikidata comes out on top of the three with 21% F biographical content compared with Wikipedia in French and English (20%); this is lower, however, than BabelNet (45%), TDV İslam Ansiklopedisi (37%), and the Dictionary of New Zealand Biography (27%). It should also be noted that in absolute numbers, the biographical coverage of the French-language version of Wikipedia (712,906) is far greater than the coverage of each of the French biographical resources, including the most important of them, Who’s Who in France (24,513); likewise, the coverage of the English-language version of Wikipedia (2,006,964) is far greater than that of English resources, far exceeding Britannica, which has 45,296 entries.
Finally, we note that the average percentage of F biographies in the publications analyzed (14%) is close to the percentage of female participants in Wikimedia projects, which fluctuates between 11% and 15% (see table 3.2). In other words, it seems that the biographical gender gap in the publications analyzed is of the same order of magnitude as the gender gap for male and female contributors to Wikimedia projects.
Some Food for Thought and Discussion
Much research on gender bias has focused on participation, some on content, and some on both. In our study, we explored the gender gap in the content of a set of encyclopedic biographical resources. Unlike previous work on the biographical coverage of these dictionaries and encyclopedias, the gender gap was quantified using Wikidata, the online database that supports Wikipedia. Thanks to this method based on massive data, we were able to update the gender situation of 80 biographical resources, totalling more than two million articles, in a rigorous, efficient, and robust way. These are results that no previous method had been able to produce. As Reagle and Rhue (2011) acknowledged at the time, there had been no large-scale comparisons of biographical coverage and gender in reference works, including Wikipedia. This statement remains valid, and this is precisely the gap that our research aims to fill. It should be noted that the work of Reagle and Rhue did, however, compare Wikipedia and Britannica with six other biographical resources covering a few thousand articles.
| Resource | F biography | M biography | Other genders |
|---|---|---|---|
| Biographical resources (total) | 14.4% | 85.2% | Not available |
| en.wikipedia | 19.8% | 80.0% | 0.13% |
| fr.wikipedia | 20.0% | 79.9% | 0.143% |
| Wikidata | 20.9% | 60.3% | 0.09% |
From a methodological point of view, the selection criterion we used for choosing the biographical resources—a minimum threshold of 1,000 entries—helped significantly reduce the sample to resources with the greatest documentary impact. It also ensured greater precision in the results, since the alignment of these resources with Wikidata was very pronounced in most cases. The number of biographical resources and articles involved in the sample thus made it possible to detect and better establish a trend that would be difficult to deny from now on regarding the gender gap and the presence of systemic gender bias at work in encyclopedic-type reference works. The main quantitative results that have highlighted these social gender inequalities in the field of encyclopedic knowledge fall into four categories:
- 1. The gender gap in encyclopedic reference works. Biographical coverage of gender shows a considerable underrepresentation of F personalities worldwide, with an average of 14% of biographical articles concerning female personalities. In addition, more than half of the resources have 10% or fewer F biographies. Finally, we note that almost 88% (87.5%) of encyclopedic resources have fewer than 20% F biographies.
- 2. The gender gap in English- and French-language encyclopedic reference works. On the whole, English- and French-language encyclopedic resources are comparable to the average. Among these resources, those that are most generally accessible—Encyclopædia Britannica, Encyclopædia Universalis, and Larousse—are below average, and two of them are below the 10% mark.
- 3. The gender gap in Canadian encyclopedic reference works. Among the 10 most favourably positioned resources, three are in English, one is in French alone, and three others are multilingual or bilingual, such as the Canadian Encyclopedia. The latter stands out from the rest of the sample, with 21.4% F biographies. The Dictionary of Canadian Biography, which is also bilingual, is in the category of publications with the widest gender gap, with 6.1% F biographies.
- 4. The Wikimedian gender gap. The comparison between the biographical coverage of these various resources and the three Wikimedia Foundation wikis is to the advantage of the latter. The results confirm the consistency and extent of the gender gap in the encyclopedic content available to internet users.
The gender gap in encyclopedia resources is not a recent phenomenon. A number of authors have already written about gender bias in encyclopedias. The renowned archivist and feminist Mary Ritter Beard produced a study in 1942 entitled “A Study of the Encyclopaedia Britannica in Relation to Its Treatment of Women,” in which she and her colleagues strongly emphasized that, in terms of participation in the production of reference works, women were a negligible quantity, unlike the biases and gaps identified in terms of content, which proved to be substantial. Another study undertaken by Gillian Thomas (1992) showed that the biographical treatment of women placed them in the position of extras in a history of human knowledge. Even Marie Curie is generally described as the wife of Pierre Curie. It is fair to say, particularly in the light of the results of this research, that the situation and practices in the field of encyclopedic works have changed little since the days of Beard or Thomas. They appear essentially unchanged to this day, until proven otherwise.
We may acknowledge that Wikipedia continues the tradition of the great encyclopedias that began with Diderot and D’Alembert in the 18th century and that it has not abandoned its ambitious aims and the desire to cover human knowledge exhaustively in an almost utopian project. But it should be noted that this online encyclopedia perpetuates certain gender biases that already existed in these previous works of knowledge. At first glance, these biases exist despite the principle of neutrality that is one of the pillars of the Wikipedia project.
The principle of neutrality is a relevant consideration that is regularly called into question in discussions about the gender gap (McDowell & Vetter, 2021). From the outset, it is important to make clear that the principle of neutrality does not refer to a principle of balance (for example, if there are 50% men and 50% women, there should be 50% biographies for each). The role played by the principle of neutrality in the gender divide is linked to its epistemic function. Like Wikipedia’s five founding principles, which are “fundamental . . . and non-negotiable,” the principle of neutrality constitutes an “intangible” foundation that applies to all articles (“Wikipédia: Neutralité de point de vue,” 2025). According to the principle of neutrality, generally referred to by contributors as “NPOV” (neutral point of view),
Articles should not promote any particular point of view. Sometimes this means mentioning several points of view and representing each of them as faithfully as possible, taking into account their respective importance in the field of knowledge. It also means providing the context needed to understand these points of view according to the sources that support them, and not representing any one point of view as being the truth or the best point of view. These conditions make it possible to verify information by citing sources that are authoritative on the subject (particularly in the case of controversial subjects). (“Wikipédia: Principes fondateurs,” 2025)
This principle aims to avoid promoting subjectivity and is fundamentally concerned with avoiding bias. To do this, the principle of neutrality relies on two further rules: verifiability and a ban on publishing unpublished works. These conditions jointly determine the encyclopedic content: “what can or cannot be published in Wikipedia.” The first of these rules, verifiability, requires that for information to be used, it must be verifiable by readers using a “quality source or reference.” The second, which prohibits unpublished work or personal research, specifically excludes “research that has never been published outside Wikipedia or that represents a ‘revolution’ not yet known or debated in the field, an opinion that is ‘excessively’ in the minority or that can only be associated with sources considered confidential and/or unreliable or, more simply, the personal interpretations, deductions or intuitions of the editor of the article” (“Wikipédia: Travaux inédits,” 2025).
These three rules are also considered to be codependent: They “must be interpreted in relation to each other” in order to prescribe admissible content and therefore knowledge or what counts as such within Wikipedia (“Wikipédia: Vérifiabilité,” 2025). However, these three rules, forming a barrier against subjectivity and bias, also tend to significantly disadvantage the admissibility of biographical articles about women for whom, in some cases, verifiable secondary sources are not available or, in others, are insufficient; this further compromises opportunities for narrowing the gender gap. The paradox, which could be called the paradox of Wikipedian bias, stems from the fact that the mechanisms (NPOV, verifiability, no unpublished work) put in place to counter bias in individual contributory practices expose the encyclopedia and make it vulnerable to systemic bias, which cannot be countered even when participants are aware of it and intentionally wish to remedy it. From this point of view, this principle of neutrality seems to be more of an obstacle to the objective of reducing the gender gap.
Moreover, as Tkacz (2014) points out, this conception of neutrality also aims to protect the encyclopedia from struggles in the name of truth: “Neutrality . . . attempts to distance itself from the truth-battles of the outside world, that is, contests of truth that take place outside the Wikipedia formation.” In this respect, Wikipedia differs from historical encyclopedias, which are not immune to the controversies of their times and whose authors assume an authoritative point of view that more often than not reflects a “relatively homogenous” view of the world (Reagle, 2013, quoted in Tkacz, 2014). This epistemic choice in favour of viewpoint neutrality was initially claimed to better support collaboration and build collaborative consensus:
The whole concept of neutral point of view, as I originally envisioned it, was this idea of a social concept, for helping people get along: to avoid or sidestep a lot of philosophical debates. Someone who believes that truth is socially constructed, and somebody who believes that truth is a correspondence to the facts in reality, they can still work together. (Wales, quoted in Reagle, 2010, p. 53)
In this respect, as Tkacz suggests, it is not just particular battles over truth that will have been abandoned but the notion of truth itself.
Although Wikipedia has abandoned the truth of the outside world, the project has not been immune from battles and controversies that go far beyond the debates and deliberations on the construction of knowledge. Within the confines of its internal truth, battles are waged on a daily basis, and they have proven every bit as fierce as the battle from which it likely intended to escape in the first place.11 Many of the outbursts concern biographies of women whose verifiable secondary sources are contested. These controversies are not always resolved in favour of deliberation leading to a collaborative consensus, following from the idealized image suggested by Wales, to prescribe admissible content and therefore knowledge or what counts as such within Wikipedia (Ford, 2015).
Heather Ford (2015) has studied the dynamics at work in these cases. She has shown that the decision-making process operates in a rhizome, in the peripheries where the power of what will be represented in Wikipedia is held. These Wikipedians do not fit the media image of amateurs; they have specialized knowledge and skills. And those who “understand how to perform and speak according to Wikipedia’s complex technical, symbolic and policy vocabulary tend to prevail” (Ford, 2015, p. 3). Several examples illustrate how these practices are not an advantage when it comes to controversies about women’s biographies. Moreover, as Doutreix (2017) points out, the study of these controversies indicates that they work to the detriment of statements made by experts or activists.
From an intersectional perspective, it can also be maintained that the situation is aggravated when it comes to articles or information affecting women who face several combined grounds of oppression, such as race, colour, or ethnic origin. This situation could also affect articles by women of French Québec or French Canadian origin whose content is questioned on the discussion pages by Wikipedians from the majority on en.wikipedia. This majority favours certain ideologies on gender and certain sources, to the detriment of other sources that are associated with “peripheral” francophone cultures or that do not have recognized status in the hierarchy of sources according to the standards the majority accepts. This hypothesis ties in with Heather Ford’s (2017) findings that these Wikipedians also reaffirm “the already legitimated power of scientists and academic professionals on Wikipedia, but it also introduces new, unequal power relations among editors who do not share the same levels of expertise and access to what are considered credible knowledge sources.”
In other words, the epistemic framework linked to neutrality prevents the production of articles about people identified as women for whom verifiable secondary sources cannot be provided, but it also appears prejudicial in discussions about articles whose verifiable secondary sources could be considered sufficient but that are rejected by Wikipedians who hold power over representations and, consequently, over what is held to be true in Wikipedia.
Before concluding this discussion, we need to make a few more remarks about the limitations of the present study. The declaration of the Wikidata property element relating to encyclopedias (Q55452870) is voluntary, which is perhaps a limitation of the tool: it makes it possible to categorize resources whose status as encyclopedic publications or reference works could be called into question. This is the case, for example, with the BabelNet aggregator or the Notable Name Database. However, these are the only cases in the sample whose status appears to be open to question.
Furthermore, the synchronic approach used could constitute another limitation of the strategy developed. The basic idea behind our analysis is to highlight the gender gap in the content that the various publications present to internet users, and therefore the gap in the content that an effort has been made to digitize and make accessible online. It would be very interesting to analyze the data diachronically according to the historical periods of the personalities concerned, especially considering the hypothesis that the gender gap has narrowed in recent decades. The Humaniki tool, for example, makes this differentiation possible.
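As a rough illustration of what such a diachronic breakdown might look like, the following Python sketch groups biographical records by century of birth and computes the share of entries about women in each period. The record layout (a birth year and a gender label per person), the function name `female_share_by_century`, and the sample data are all hypothetical; they are not drawn from our dataset or from Humaniki.

```python
from collections import defaultdict


def female_share_by_century(records):
    """Return {century: share of records labelled 'female'}.

    `records` is an iterable of (birth_year, gender) pairs. This layout is
    an assumption made for illustration only; birth years are assumed to be
    positive (Common Era) integers.
    """
    totals = defaultdict(int)
    women = defaultdict(int)
    for birth_year, gender in records:
        century = (birth_year - 1) // 100 + 1  # e.g. 1879 -> 19, 1901 -> 20
        totals[century] += 1  # every record counts, whatever its gender label
        if gender == "female":
            women[century] += 1
    return {c: women[c] / totals[c] for c in sorted(totals)}


# Invented sample data, not results from the study.
sample = [
    (1850, "male"), (1879, "male"), (1880, "female"), (1899, "male"),
    (1920, "female"), (1950, "male"), (1975, "female"), (1990, "male"),
]
shares = female_share_by_century(sample)
```

Because every record is counted in the denominator regardless of its gender label, the same function could be applied to data that includes nonbinary identities, although it would then report only the share labelled female.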
However, one of the most worrying limitations of our study concerns its bias in favour of a binary conception of gender. By treating the gender divide exclusively as a binary category (man/male and woman/female), we have neglected the status of nonbinary identities. This approach contributes to the systemic bias, discrimination, and invisibilization experienced by people who identify as nonbinary or who do not conform to a binary conception (trans, intersex, agender, Two-Spirit, genderqueer, genderfluid, etc.). For this reason, the very definition of the gender gap needs to be reevaluated and extended beyond the approach based on the Western binary conception of gender (Metilli & Paolini, 2021).
What Do We Do Now?
Because of the epistemic obstacles arising from the NPOV, we are unable to counteract systemic biases directly in a substantial and radical way. What, then, are the possible options for mitigating the gender gap, at least in relative terms? As Beaudouin (2020) suggests, the intensive production of women’s biographies that meet the verifiability criterion, which requires all information to be verifiable by a quality source or reference, is one way of reducing the gender gap. Supporting and taking part in initiatives such as Les sans pagEs and the Art + Feminism workshops, held across Québec and Canada, encourages the production of this content. Tens of thousands of articles that meet these conditions remain to be created.
In addition, it may be thought that an editorial project should be set up for quality biographical articles about women, which also present a gender asymmetry on Wikipedia.12 However, for many female personalities, the sources needed to satisfy the verifiability criterion are simply not available. Another option is to produce knowledge upstream, as suggested by Wikimedia France: “Award prizes, draw up portraits, write entries for bibliographic dictionaries, or publish research articles and monographs on little-known women. Wikipedia can then deal with the subject by synthesizing and disseminating knowledge” (Beaudouin, 2020).
From a regional perspective, another action is to increase the creation of links between Canadian, Québec, and French bibliographic resources and Wikidata. For example, the recent addition of 329 entries from the Japanese Canadian Artists Directory to Wikidata shows that 51.4% of the entries are about female personalities.13
We hope that a movement can be launched to link the contents of these reference works—as has already happened in the case of encyclopedias such as Britannica—so that we can better assess the gender gap that exists in the latter and also draw on it for material to contribute to Wikipedia. We are thinking particularly of a mobilization effort around the following Québec encyclopedias: Biographies canadiennes-françaises, the Dictionnaire des auteurs de langue française en Amérique du Nord, the Encyclopédie du patrimoine culturel de l’Amérique française, and those included in our research—that is, the Dictionnaire biographique du Canada and the Encyclopédie canadienne.
Our study leads us to the following conclusion: In the Wikipedian environment and beyond, the problem is not fundamentally related to the size of the gender gap, whether in terms of biographical coverage or editorial participation, but rather to the obvious imbalance linked to systemic sexism. As Jemielniak (2016) suggests, Wikipedia reflects the prejudices and biases present in society; sexism, accentuated by technoculture, proves to be one of the most powerful levers. Research findings such as those presented here can help make the gender gap and systemic sexism more explicit and better understood. However, as other studies have pointed out, there is now an urgent need to look consciously at gender bias in relation to other social biases, using a nonbinary and intersectional approach, as part of future research aided by new tools, and also to look at inclusive Wikimedia policies, in order to reduce epistemic injustices from a broader and more fundamental perspective.
Acknowledgements
We would like to thank Michèle Lefebvre, librarian at Bibliothèque et Archives nationales du Québec (BAnQ), for her help and insights into Canadian reference works. We would also like to thank Wikimedians TomT0m and Nicolas Vigneron, who developed several tools for analyzing wiki elements and articles, and mathematician Maxime André for his help in processing certain data. Finally, we would like to thank the Wikipedian Cantons-de-l’Est and Pierre-Yves Beaudouin for reprinting and disseminating previous work.
References
- Beard, M., Edinger, D., Selig, J. A., & White, M. (1977). A study of the Encyclopaedia Britannica in relation to its treatment of women. In A. J. Lane (Ed.), Mary Ritter Beard: A sourcebook (pp. 215–24). Schocken Books.
- Beaudouin, P.-Y. (2020). Biais de genre: Wikipédia aussi imparfaite que la société. In Wikimédia France. https://www.wikimedia.fr/biais-de-genre-wikipedia-aussi-imparfaite-que-la-societe/
- Cohen, N. (2011, January 30). Wikipedia ponders its gender-skewed contributions. New York Times. https://www.nytimes.com/2011/01/31/business/media/31link.html
- Doutreix, M.-N. (2017). Quel dialogisme dans l’écriture encyclopédique de Wikipédia? In M.-D. Popelard (Ed.), La reprise en actes (pp. 149–59). Presses universitaires de Rennes. https://doi.org/10.4000/books.pur.183897
- Ford, H. (2015). Fact factories: Wikipedia and the power to represent. Kellogg College, University of Oxford. https://doi.org/10.13140/RG.2.1.4068.9361
- Gender gap. (2025, November 18). In Meta-Wikimedia. https://meta.wikimedia.org/w/index.php?title=Gender_gap&oldid=29668667
- Humaniki. (n.d.). Humaniki: Wikimedia diversity data tool. Retrieved January 30, 2026, from https://humaniki.wmcloud.org/
- Jemielniak, D. (2016). Breaking the glass ceiling on Wikipedia. Feminist Review, 113(1), 103–8. https://doi.org/10.1057/fr.2016.9
- Johnson, P. (2018). Fundamentals of collection development and management (4th ed.). ALA Editions.
- Klein, M., & Konieczny, P. (2015). Gender gap through time and space: A journey through Wikipedia biographies and the “WIGI” index. arXiv. https://doi.org/10.48550/arXiv.1502.03086
- Konieczny, P., & Klein, M. (2018). Gender gap through time and space: A journey through Wikipedia biographies via the Wikidata human gender indicator. New Media & Society, 20(12), 4608–33. https://doi.org/10.1177/1461444818779080
- McDowell, Z. J., & Vetter, M. A. (2021). Wikipedia and the representation of reality. Routledge.
- Melançon, B. (2018). Les vertus utopiques de l’encyclopédisme. Sens public. https://doi.org/10.7202/1059033ar
- Metilli, D., & Paolini, C. (2021). Non-binary gender identities in Wikidata [Video]. YouTube. WikidataCon 2021. https://www.youtube.com/watch?v=lIsmFUuGvCw
- Nadeau, J.-F. (2017, March 4). L’histoire invisible des femmes. Le Devoir. https://www.ledevoir.com/societe/493173/les-grandes-oubliees-l-histoire-invisible-des-femmes
- Reagle, J. M. (2010). Good faith collaboration: The culture of Wikipedia. MIT Press.
- Reagle, J. (2013). “Free as in sexist?” Open culture and the gender gap. First Monday, 18(1). https://doi.org/10.5210/fm.v18i1.4291
- Reagle, J., & Rhue, L. (2011). Gender bias in Wikipedia and Britannica. International Journal of Communication, 5, 21. https://ijoc.org/index.php/ijoc/article/view/777
- Rey, A. (2007). Miroirs du monde: Une histoire de l’encyclopédisme. Fayard.
- Thomas, G. (1992). A position to command respect: Women and the eleventh Britannica. Scarecrow Press.
- Tkacz, N. (2014). Wikipedia and the politics of openness. University of Chicago Press.
- Villeneuve, S. (2025, November 30). PoV. In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Utilisateur:Simon_Villeneuve/PoV&oldid=231048848
- Wikimedia Foundation. (2023). Community insights 2023 report. Wikimedia Foundation. https://meta.wikimedia.org/wiki/Community_Insights/Community_Insights_2023_Report
- Wikipédia: Neutralité de point de vue. (2025, October 11). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Neutralit%C3%A9_de_point_de_vue&oldid=229682203
- Wikipédia: Principes fondateurs. (2025, October 14). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Principes_fondateurs&oldid=229755908
- Wikipédia: RAW/2019 03 01. Regards sur l’actualité du mouvement Wikimedia. (2025, March 5). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:RAW/2019-03-01&oldid=223624218
- Wikipédia: RAW/2021 04 01. Regards sur l’actualité du mouvement Wikimedia. (2025, March 6). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:RAW/2021-04-01&oldid=223624300
- Wikipédia: Travaux inédits. (2025, October 16). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Travaux_in%C3%A9dits&oldid=229815221
- Wikipédia: Vérifiabilité. (2026, January 10). In Wikipédia. https://fr.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:V%C3%A9rifiabilit%C3%A9&oldid=232315309
- Wikipedia: WikiProject Countering systemic bias. (2025, November 25). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Wikipedia:WikiProject_Countering_systemic_bias&oldid=1324167268
1 The Les sans pagEs project is the French-language version of WikiProject Women in Red; the Femmes project corresponds to the WikiProject Gender Studies. The projects WikiProject Countering Systemic Bias / Gender Gap Task Force and WikiProject Countering Systemic Bias have no equivalent in French.
2 The checklist is a mixed approach: Although classified as qualitative, it is a method whose data is primarily collected quantitatively and processed through descriptive and inferential statistical analysis (Johnson, 2018).
3 For example, Gale Biographical Resource Center, Wilson’s Current Biography Illustrated, American National Biography Online, Chambers Biographical Dictionary, and the top 100 most influential people according to The Atlantic and Time magazines (Reagle & Rhue, 2011, pp. 1144–45).
4 For example, each entry in the Canadian Encyclopedia (CE) is associated with a unique identifier in the form of a word string. This identifier must be added to the end of the base URL—https://www.thecanadianencyclopedia.ca/en/article/ (English) or https://www.thecanadianencyclopedia.ca/fr/article/ (French)—to access a specific entry. For instance, the French-language article on Émile Nelligan can be accessed by adding the identifier emile-nelligan, resulting in the URL: https://www.thecanadianencyclopedia.ca/fr/article/emile-nelligan.
5 Wikidata’s 125 million elements (Q), including more than 10 million dedicated to humans, are linked by one or more of its 10,000 properties (P) in the form of RDF triples. Of these properties, more than half are dedicated to UIDs of external publications. To return to the previous example, property P5395 on Wikidata is dedicated to the unique identifiers of CE entries. This property therefore associates each notion in the CE with the equivalent element in the free knowledge base. For example, the CE entry Émile Nelligan is associated with Wikidata element Q2392492 (element dedicated to the poet) using the property P5395.
6 See, for example, the tens of thousands of items concerning female personalities who do not have a French-language Wikipedia article. These entries were identified during Women’s History Month 2019 (see https://fr.wikipedia.org/wiki/Utilisateur:Simon_Villeneuve/femmes).
7 List of properties: https://w.wiki/37pf.
8 A dump of Wikidata is a version exported from the Wikidata database at a given time and used for research or analysis.
9 See https://qlever.dev/wikidata/ehCg0u for the query. At that time, around 19% of Wikidata’s human elements had no registered gender. This therefore leads to uncertainty about the actual proportion of each gender on this site, although we have no reason to believe that the current sample is not representative of the whole. It should also be noted that over 40 gender identities are used on Wikidata.
10 For example, Gilles Vigneault has a unique identifier on Britannica, but this links to an article on French Canadian literature and not to a biographical article devoted to Vigneault.
11 By “internal truth,” we refer, following Tkacz, to the explicitness of point-of-view neutrality, which serves as a justification framework (“Wikipédia: Neutralité,” 2025).
12 In the French edition of Wikipedia, there are currently 294 quality biographical articles about men and 53 about women; https://w.wiki/3pYE.
13 The statistics may be accessed by making the following query: https://w.wiki/AQvk.
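The identifier scheme described in notes 4 and 5 can be sketched in Python as follows. The base URL, the identifier emile-nelligan, and property P5395 come from those notes; the function names are hypothetical, and the query text is an illustrative assumption rather than the query used in this study. It relies on the standard Wikidata identifiers P31 (instance of), Q5 (human), and P21 (sex or gender).

```python
CE_BASE_URL = "https://www.thecanadianencyclopedia.ca/{lang}/article/{uid}"


def ce_article_url(uid, lang="fr"):
    """Build the URL of a Canadian Encyclopedia entry from its identifier."""
    if lang not in ("en", "fr"):
        raise ValueError("lang must be 'en' or 'fr'")
    return CE_BASE_URL.format(lang=lang, uid=uid)


def gender_count_query(property_id):
    """Return SPARQL text counting, per gender value, the humans that carry
    an identifier for the given external-publication property.

    Illustrative only: this is not the query used in the study.
    """
    return (
        "SELECT ?gender (COUNT(DISTINCT ?person) AS ?n) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f"          wdt:{property_id} ?uid ;\n"
        "          wdt:P21 ?gender .\n"
        "}\n"
        "GROUP BY ?gender"
    )


url = ce_article_url("emile-nelligan")
query = gender_count_query("P5395")
```

Here `url` reproduces the address given in note 4. The query text assumes an endpoint that predefines the `wd:` and `wdt:` prefixes; elsewhere they would need to be declared. People with no recorded gender are not matched by this pattern, which relates to the uncertainty mentioned in note 9.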