The Problem With AI Overviews
Problem
Search engine result pages are increasingly co-opted by LLM-generated overviews that summarize results. Researchers have found that when search engine users encounter these LLM-generated summaries, they are consistently less likely to visit the actual pages returned in the results. As individuals' search experiences narrow to the summary alone, which often amplifies and legitimizes the information it aggregates, SPILL seeks to identify the sources from which the model obtains its information, the frames it uses in presenting that information, and how those characteristics may differ across languages and cultures.
Project
Along with collaborators at the University of Tuebingen, Carnegie Mellon University, the University of Exeter, and the University of Pennsylvania, SPILL is analyzing Google's AI search overviews for queries about sensitive topics, assessing the information access, equity, and integrity of their responses.
In Phase One, we are investigating how overviews differ in sourcing and framing information about a politically sensitive topic across languages (English, Spanish, German, and Portuguese) and geographic contexts (North America, South America, and Europe). In Phase Two, we intend to expand to additional languages and, potentially, additional geolocations.
Project Lead: Anna Beers, University of Tuebingen
Wikipedia and LLMs
Problem
With its extraordinary accessibility, Wikipedia has become a primary source of information for AI developers. This raises critical questions regarding information provenance, reliability, and the entrenchment of existing power dynamics or inaccuracies between platforms; specifically, how can absences or inaccuracies in Wikipedia spill over into artificial intelligence systems? Wikipedia has begun reckoning with this question by formally prohibiting specific uses of AI in article editing, although enforcement mechanisms are unclear. AI may therefore be generating content that later becomes part of its own training corpus, exacerbating the threat of "model collapse," in which repeated training on AI-generated text diminishes model performance (Shumailov et al., 2024). As a result, SPILL is interested in exploring how Wikipedia editors engage with genAI in the editorial process, with a special focus on how it shapes considerations of notability.
Project
In collaboration with partners at the University of Exeter (and funded by the UK AHRC's Bridging Responsible AI Divides grant), SPILL is engaging deeply with Wikipedia editors to understand their relationship with, and use of, generative AI.
To further contribute to and collaborate with the Wikipedia community, we are planning to host a public edit-a-thon event at the University of Exeter (2027), centered on discussing the relationship between GenAI and Wikipedia. For instance, how might GenAI use amplify or reduce gender/racial biases as they manifest in Wikipedian "notability"?
Project Lead: Patrick Gildersleve, University of Exeter
Wikipedia and Biographical Visibility
Problem
Wikipedia is also the largest reference website, attracting over 1 billion unique visitors each month. However, in conversation with Wikipedia editors of the WikiProject Women in Red, it became evident that many notable women who appear on Wikipedia do not have images accompanying their biographies; due to the particularities of search engine result pages (SERPs), women's biographies and profiles are consequently less prominent in search engines. This contributes to deeper inequalities in fields such as higher education and indicates a significant need to increase article visibility for greater impact. Simultaneously, existing scholarship recognizes that women and marginalized populations face greater risks when increasing their visibility (Gosse et al., 2021).
Project
Partnering with the Citizens and Technology (CAT) Lab at Cornell University, we surveyed a representative sample of academics who have (or could have) Wikipedia pages about how they conceptualize possible benefits and risks associated with increased visibility. Specifically, do they want a Wikipedia page? Do they want a photo featured?
Our next steps are to collect photos from surveyed academics who would like a Wikipedia photo, assist them with adding those photos to their pages, and test the impact of those photos on the visibility of their biographies.
Project Lead(s): J. Nathan Matias & Sarah Gilbert, CATLab (Cornell University)