Afterword
The Value of Verified Knowledge in the Age of Generative AI
Pierre Lévy
Most people, myself included, who use Wikipedia on a nearly daily basis are generally unaware of the importance, both theoretical and practical, of the debates taking place internally among the contributors and touching on essential social, cultural, and epistemological issues. After reading this book on open culture and the Wikimedia movement in Canada, the reader will have come across some fascinating analyses of the complex issues facing Wikipedia’s authors and editors. The reader will also have gained a glimpse of the growing mine of information that is Wikidata, a semantic metadata repository that today serves as the skeleton for numerous symbolic artificial intelligence (AI) systems.1
I would like to start this afterword with a quote from Denny Vrandečić, one of the initiators of Wikidata, who worked on Google’s knowledge graph as an ontologist and is now the leader of the Abstract Wikipedia project, which aims to make Wikipedia’s content language independent (i.e., translatable into all languages). Speaking at the Knowledge Graph Conference in May 2023, Denny said, “In a world of infinite content, knowledge becomes valuable” (Vrandečić, 2023). Clearly, this world of potentially infinite content results from the now massive use of generative artificial intelligence. This new situation poses a number of problems; let me mention two that are particularly significant from the point of view of access to knowledge. First, despite the uses made of generative models to obtain direct answers quickly, it should be remembered that today’s statistical AI (also known as neural AI), unlike the classic symbolic AI of the twentieth century, offers no guarantee of truth. GPT-4 and other similar models do not constitute knowledge bases, and statistical AI leads to many errors of both fact and reasoning. One need only be a specialist in a single field to notice these weaknesses, much as we do when we read an article touching on our own area of expertise written by a hurried journalist who is content to restate the common beliefs in which they are immersed. ChatGPT’s probabilistic answers are merely plausible. Second, since generative AI models are trained on web data, and this data is increasingly written and illustrated by the models in question, we find ourselves in a dangerous vicious circle, all the more so because the low-paid workers responsible for aligning the models and correcting their biases or errors are themselves using generative AI to accomplish their task!
This presents us with epistemological quicksand, and the best way to extricate ourselves is to invest more than ever in building reliable sources of information. In other words, the explosion in the use of generative AI does not excuse us from building up and using Wikipedia, Wikidata, and other verified knowledge bases; on the contrary, it makes it all the more necessary that we contribute to them and take pleasure in consulting them! That said, we have seen throughout this collective work that neural AI is nonetheless destined to play a positive role in the sharing of knowledge so dear to Wikipedians.
I am not a researcher specializing in Wikimedia studies. Instead, I am a philosopher committed to thinking about the digital world. I will confine myself to a few thoughts by way of coda on the triangular relationship between collective intelligence, artificial intelligence, and the uplifting goal of making knowledge available to all.
Collective Intelligence
Narrowly defined, collective intelligence processes are means of solving problems, and they take many forms, the most widely studied of which are statistical, deliberative, and stigmergic (Baltzersen, 2022; Lévy, 1994).
At the beginning of the twentieth century, the English scientist Francis Galton visited an agricultural fair where a competition had been organized with 800 participants, most of them farmers, who were asked to guess the weight of an ox. None of them guessed the exact weight, but when Galton took the average of all the estimates, he found that it was much closer to the real weight than any of the individual estimates. The “wisdom” of the crowd was thus superior to each of the individual intelligences (Galton, 1907). This form of statistical—or accounting—collective intelligence assumes that individuals do not communicate with each other and do not coordinate in any way. It works all the better when the distribution of choices or predictions is spread over a wide spectrum, so that individual errors and biases compensate for one another. Paradoxically, this type of collective intelligence presupposes mutual ignorance. It is expressed in opinion polls and elections, where it is forbidden to communicate partial results before everyone has voted. This approach to statistical collective intelligence, with no connection between its members, was popularized by James Surowiecki in his book The Wisdom of Crowds.
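Galton’s observation is easy to reproduce numerically. The sketch below is purely illustrative: the noise distribution and its spread are assumptions rather than Galton’s data (only the ox’s dressed weight of 1,198 pounds comes from his article). It simulates 800 independent guesses and compares the crowd’s error with the average individual error.

```python
import random

random.seed(42)

TRUE_WEIGHT = 1198  # pounds; the dressed weight reported by Galton (1907)

# 800 independent guesses: unbiased but noisy (an illustrative assumption,
# standing in for the farmers' varied expertise and errors)
guesses = [random.gauss(TRUE_WEIGHT, 75) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)
mean_individual_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)

print(f"crowd error: {crowd_error:.1f} lb")
print(f"mean individual error: {mean_individual_error:.1f} lb")
# With independent, unbiased errors, the error of the average shrinks
# roughly as 1/sqrt(N): averaging 800 guesses beats a typical individual.
```

The key condition is the one Galton’s setting happened to satisfy: the guesses must be independent, so that errors cancel rather than correlate.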
A second form of collective intelligence is deliberative intelligence, and it is based on direct communication between the members of a community. Deliberative intelligence results from the exchange of arguments and points of view. Faced with a common problem, deliberative intelligence can converge on a consensus, or it can be divided between a number of solutions whose pros and cons have been weighed up together. Provided there is a willingness to listen, everyone brings their own point of view and their own particular expertise, and this in turn enriches the general debate (Mulgan, 2017; Zara, 2021). This type of collective intelligence is apparently ideal because it is open and reflexive, yet it is all the more difficult to implement when the community is large. Forms of hierarchy and delegation then need to be established, and while these are certainly essential, they also disrupt the transparency of the collective intelligence itself.
I would now like to introduce a lesser-known form of collective intelligence that is nonetheless at work in many animal societies and reaches its most complete expression in humanity: stigmergic communication. The Greek etymology explains the meaning of the word stigmergy quite well: Marks (stigma) are left in the environment by the action or work (ergon) of members of a community, and these marks in turn—and recursively—guide their actions (Heylighen, 2016). The classic case of stigmergy is when ants leave a trail of pheromones in their wake as they bring food back to the anthill. The smell of the pheromones encourages other ants to follow in their footsteps to discover the booty and bring food back to their underground city, leaving a scented message on the ground in their wake. Language gives humanity a high degree of collective intelligence, superior to that of other mammals and comparable to that of bees or ants. Like other eusocial species, we communicate largely in a stigmergic way. But instead of marking a physical territory with pheromones or other types of visual, auditory, or olfactory signals, we leave symbolic traces. As culture evolves, signifiers accumulate in increasingly sophisticated external memories: standing stones, totem poles, sculpted landscapes, monuments, architecture, written signs, archives, libraries, and databases. It could be argued that any form of writing that is not precisely addressed is a form of stigmergic communication: Traces are deposited for future reading and act as the external memory of a community.
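The pheromone mechanism can be captured in a few lines. The toy simulation below is an illustrative sketch, not a model from the biological literature, and all its parameters are assumptions: agents choose between two paths in proportion to the “pheromone” already deposited, reinforce the path they take, and the traces evaporate over time. No agent communicates directly with any other, yet the colony converges on the more rewarding path.

```python
import random

random.seed(0)

# Two paths to a food source; the short path pays off more per trip,
# so it accumulates pheromone faster (illustrative parameters).
pheromone = {"short": 1.0, "long": 1.0}
deposit = {"short": 2.0, "long": 1.0}

for step in range(500):
    # Each ant chooses a path with probability proportional to its pheromone
    total = pheromone["short"] + pheromone["long"]
    path = "short" if random.random() < pheromone["short"] / total else "long"
    pheromone[path] += deposit[path]  # mark the environment (stigma + ergon)
    for p in pheromone:               # traces evaporate over time
        pheromone[p] *= 0.99

print(f"short: {pheromone['short']:.1f}, long: {pheromone['long']:.1f}")
```

The feedback loop is the whole point: each mark makes the marked path more likely to be chosen, and evaporation lets the collective forget obsolete traces.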
The various processes of collective intelligence that have just been mentioned—statistical (without communication), deliberative (with direct communication), and stigmergic (with indirect communication)—are obviously not mutually exclusive. Indeed, they may well succeed one another or combine. For example, Wikipedians coordinate, deliberate, and vote via shared databases.
On the scale of the species as a whole, human collective intelligence is a continuation of animal collective intelligence, but in humans, this collective intelligence is more sophisticated than in other species because of the language, techniques, and political, economic, legal, and other institutions that characterize us. The main difference between animal and human collective intelligence lies in culture. In a diachronic dimension, our species is driven by a speed of learning greater than that of biological evolution. Our know-how is accumulated and passed on from one generation to the next by means of our external memories, sign systems, social conventions, and tools. No individual would be “intelligent” if they did not inherit the knowledge created by their ancestors. In a synchronic dimension, we participate in a coordinated collective intelligence in which the conceptual architecture of our shared memories and the social organization of our communities resonate with and reinforce each other. The reciprocal definition of identities and the recognition of problems are decided at this metalevel of culture. As a result, beyond the useful procedures (stigmergic, statistical, and deliberative) for solving problems, there is a more holistic collective intelligence that circumscribes the cognitive capacities of a society.
Cultural evolution has already crossed several thresholds of collective intelligence. The inventions of writing, printing, and electronic media (music recording, telephone, radio, television) have already irreversibly increased our capacity for memory and social communication. Probably the greatest social change we have experienced in the last 25 years is the emergence of global communication via digital memory. This new form of distributed read-write communication in a collective digital memory represents a far-reaching anthropological transformation. We are immersed in the new digital environment, and we interact through the oceanic mass of data that brings us together. Wikipedia’s encyclopedists and GitHub’s programmers collaborate via the same database. Unbeknown to us, every link we create, every tag or keyword we affix to a piece of information, every act of evaluation or of approval, every “like,” every request, every purchase, every comment, every share we post—all these operations subtly modify the shared memory—in other words, the inextricable magma—of relationships between data. Our online behaviour emits a continuous flow of messages and cues that transform the structure of memory and help direct the attention and activity of our contemporaries. We deposit electronic pheromones in the virtual environment that determine the actions of other internet users in a loop and that train the formal neurons of AIs as well.
Artificial Intelligence as an Enhancement of Collective Intelligence
Let’s now turn to artificial intelligence, but from the angle—which may strike some readers as unusual—of collective intelligence. Journalists and the general public tend to classify applications that are considered advanced at the time they appear as “artificial intelligence.” But a few years after being introduced, these same applications will have become commonplace, everyday occurrences and will more often than not be reinterpreted as belonging to ordinary computing. Since the mid-twentieth century, despite the apocalyptic headlines and images of young women with chromium-plated brains that are supposed to embody artificial intelligence, we have been witnessing a process of formal commodification and externalization of cognitive functions. Increased power and lower hardware costs are distributing these objectified cognitive functions throughout society. Interconnected machines record and retrieve information, perform arithmetic or algebraic calculations, simulate complex phenomena, reason logically, conform to syntax and systems of rules, and extract shapes from entangled statistical distributions. Computers automate and socialize our ability to communicate, our capacity for memory, perception, learning, analysis, and synthesis.
Artificial intelligence, by virtue of its very name, naturally conjures up the idea of an autonomous machine intelligence that sits opposite human intelligence, simulating or surpassing it. But if we look at how AI devices are actually used, we have to admit that, most of the time, they augment, assist, or accompany the operations of human intelligence. Back in the days of expert systems—in the 1980s and 1990s—I observed that once the critical knowledge of specialists within an organization was codified in the form of rules to drive knowledge bases, it could then be made available to the members who needed it most, responding precisely to current situations while remaining continuously available. Rather than supposedly autonomous artificial intelligences, these were media for disseminating practical know-how, the main effect of which was to increase the collective intelligence of the user communities.
In the current phase of AI development, the role of the expert is played by the crowds who produce the data, and the role of the cognitive engineer who codifies the knowledge is played by neural networks. Instead of asking linguists how to translate something or authors how to produce a text, statistical models tacitly interrogate the multitudes of anonymous writers on the web to automatically extract standard structures that no human programmer could have worked out. Algorithms are conditioned by their training and can then recognize and reproduce data corresponding to the forms they have learned. But because they have abstracted structures instead of recording everything, they are now capable of correctly conceptualizing forms (of images, text, music, code, etc.) that they have never encountered and of producing an infinite number of new symbolic arrangements. This is why we speak of generative artificial intelligence. Neural AI synthesizes and mobilizes shared memory. Far from being autonomous, it extends and amplifies collective intelligence. Millions of users contribute to perfecting the models by asking them questions and commenting on the answers they receive. Take Midjourney, for example, where users exchange instructions (prompts) and constantly improve their AI skills. Midjourney’s Discord server is now among the largest on the planet, with over one million users. A new collective stigmergic intelligence is emerging from the fusion of social media, AI, and creative communities.
Contemporary AI thus serves as the conduit for a feedback loop between the shared digital memory and the individual productions that exploit it and accumulate in turn in data centres. We need to get behind the machine and glimpse the collective intelligence that it reifies and mobilizes.
Sharing Knowledge: Toward Neurosymbolic Collective Intelligence
The collective intelligence currently supported by artificial intelligence is still only partial. In fact, the use of internet data to train models mobilizes collective stigmergic intelligence (the feedback loop between individual behaviour and shared memory) and statistical intelligence (neural learning). In the early 2020s, the connection between and mutual reinforcement of these two forms of collective intelligence by the new AI devices provoked an intellectual shock—and strong emotions—in people who gained a glimpse of how powerful they could be. But a deliberative and reflexive collective intelligence was still missing. At the scale at which we are operating, this deliberative collective intelligence must bear on the organization of data—that is, on the conceptual structure of memory, inevitably coupled with the practices of communities. How can we ensure that the networks of concepts that inform digital memory are subjected to an open, transparent conversation, one attentive to the consequences of the conceptual choices being made? The semantic web and its panoply of standards (XML, RDF, OWL, SPARQL) have certainly established format interoperability, but they do not provide the semantic interoperability—that of concept architectures—that we need. The web giants have their knowledge graphs, but unfortunately, these are private and secret. Wikidata offers an example of an open knowledge graph, but it is still very difficult for the general public to explore and use it on a daily basis. What’s more, it is presented as a stand-alone ontology, that of the Wikipedia encyclopedia, whereas we need to harmonize, and bring into dialogue, the multitude of ontologies emerging from the most diverse practices.
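To make the notion of a knowledge graph concrete, here is a minimal sketch, with hand-written triples invented for this example (they echo Wikidata’s subject–predicate–object style but are not its actual data or API): facts are stored as triples, and a query is simply a pattern with blanks, much like variables in a SPARQL query.

```python
# A knowledge graph reduced to its essence: subject-predicate-object triples
# (hand-written for illustration, in the spirit of Wikidata's statements).
triples = [
    ("Wikidata", "instance_of", "knowledge base"),
    ("Wikidata", "operated_by", "Wikimedia Foundation"),
    ("Wikipedia", "operated_by", "Wikimedia Foundation"),
    ("Abstract Wikipedia", "initiated_by", "Denny Vrandečić"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What does the Wikimedia Foundation operate?"
print([s for s, p, o in query(p="operated_by", o="Wikimedia Foundation")])
# → ['Wikidata', 'Wikipedia']
```

Format interoperability means agreeing on the triple syntax; semantic interoperability, which the paragraph above calls for, would mean agreeing on (or translating between) the concepts such as “operated_by” themselves.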
I invented IEML (Information Economy MetaLanguage) to solve this problem of the emergence of a deliberative (or reflexive) collective intelligence in digital form. IEML is an artificial language with a regular algebraic structure, whose semantics can be calculated and which can express anything and translate any concept network (Lévy, 2023). IEML is an open-source language designed to enrich the knowledge commons, and its development must be subject to decentralized governance. IEML projects ontologies, knowledge graphs, collections of labels, and data models, however heterogeneous or diverse they may be, onto the same semantic coordinate system: a virtually infinite universe of conceptual differences that algorithms can exploit. IEML can serve as a pivot language between natural languages, between humans and machines, and between AI models.
It goes without saying that most of IEML’s users will not have to learn this metalanguage, since the application interfaces, including the editor itself, will be in natural languages or in iconic form. The “code” side of IEML is intended for computers only. We can therefore envisage a multitude of knowledge bases with singular conceptual architectures being able to exchange ontological modules and information thanks to the semantic interoperability provided by this common metadata language.
Let’s return now to what unites all Wikipedians—namely, the authors, editors, and readers of the movement. It is not the “encyclopedia” object, which is ultimately no more than a particular means adapted to the technical and cultural possibilities of an era, but the much broader aim, one that resonates into an as-yet-unimaginable future: making knowledge available to everyone. This aim can be broken down into two tasks: (1) allowing all knowledge to be expressed, accumulated, and communicated and (2) facilitating the exploration and appropriation of knowledge across the wide range of practical situations, learning paths, and cognitive styles. We can see the affinity of the Wikipedia ideal with that of collective intelligence, which is diametrically opposed to “groupthink” and aims to maximize creative freedom and collaborative efficiency simultaneously.
The philosopher in me will be forgiven for evoking a concrete utopia, a vision that is no doubt technically feasible but whose short-term purpose is first and foremost to get people to think. So let’s imagine a system for sharing knowledge that makes the most of contemporary technical possibilities. At the heart of this system is an open ecosystem of knowledge bases categorized in IEML, which emerge from a multitude of communities of research and practice. Between this core of interoperable knowledge bases and living human users lies a “no-code” neural interface (an ecosystem of models) that provides access to data control, supply, exploration, and analysis. Everything happens intuitively and directly, in line with the sensory-motor modalities selected. It is also through this gigaperceptron—an immersive, social, and generative metaverse—that groups exchange and discuss the data models and semantic networks that organize their memories. This new knowledge-sharing system also serves as a knowledge management tool that encourages the recording of creations, supports learning paths, and presents useful information to those engaged in their practices.
On the shared side, each knowledge base, whether personal or collective, displays its universe of discourse, its data, and its statistics, all of which are as transparent to algorithms as they are to human eyes. On the private side, however, our knowledge-sharing system ensures that individuals and groups retain practical and legal sovereignty over the data they produce, which they disclose only to the players of their choosing.
The decisive increase in the deliberative dimension of collective intelligence through the use of a common metadata language has multiplier effects on the statistical and stigmergic collective intelligences already at work in our day. A new neurosymbolic infrastructure will plunge the collective intelligence of the future into the explorable universe emanating from its own cognitive activities. However, we need to distinguish between the collective intelligence that drives living human individuals and communities and the mechanical extensions and media representations that augment it. Let’s not turn artificial intelligence into an idol.
Quoting Ibn Rushd (the Averroes of the Latins), Dante writes in book I, chapter 3 of the Monarchy,
The highest potentiality of mankind is his intellectual potentiality or faculty. And since that potentiality cannot be fully actualized all at once in any one individual or in any one of the particular social groupings enumerated above, there must needs be a vast number of individual people in the human race, through whom the whole of this potentiality can be actualized. (Alighieri, 1996)
Let this vast number of individual people become transparent to itself in the new algorithmic medium, and we will have moved from the anthill to the city.
References
- Alighieri, D. (1996). Monarchy (P. Shaw, Trans.). Cambridge University Press.
- Baltzersen, R. K. (2022). Cultural-historical perspectives on collective intelligence. Cambridge University Press.
- Galton, F. (1907). Vox populi. Nature, 75, 450–451. https://www.romolocapuano.com/wp-content/uploads/2023/10/Francis-Galton_Vox_populi.pdf
- Heylighen, F. (2016). Stigmergy as a universal coordination mechanism II: Varieties and evolution. Cognitive Systems Research, 38, 50–59. https://doi.org/10.1016/j.cogsys.2015.12.007
- Lévy, P. (1994). L’intelligence collective: Pour une anthropologie du cyberspace. La Découverte.
- Lévy, P. (2023). Calculer la sémantique avec le langage IEML. Humanités numériques, 8. https://doi.org/10.4000/revuehn.3836
- Mulgan, G. (2017). Big mind: How collective intelligence can change our world. Princeton University Press.
- Vrandečić, D. (2023, July 19). The future of knowledge graphs in a world of LLMs [Video]. YouTube. https://www.youtube.com/watch?v=ww99npDh4cg
- Zara, O. (2021). Le chef parle toujours en dernier: Manifeste de l’intelligence collective. Axiopole.
1 Symbolic AI uses explicit logical rules and semantic networks (rather than statistical methods).