A generation of Americans is about to do something unprecedented. They will open a chat window, type a question about an election, and treat the answer as fact. They will ask where to vote, whether a candidate’s statement is true, or what a ballot measure would actually do. The tools they will use, including ChatGPT, Claude, Gemini, and Grok, have never been reliable at these tasks, and published research confirms it. The 2026 US midterms are about 167 days away as of this writing, and the evidence is clear: these systems are not ready for what voters will ask them. Below are five specific signs that current AI chatbots fall short on AI election accuracy.

What the Research Reveals About Chatbot Reliability
In early 2025, researchers at the Tow Center for Digital Journalism at Columbia Journalism School designed a straightforward test. The team chose 200 news articles from 20 different publishers and fed them to eight AI search products, including ChatGPT Search, Perplexity, Gemini, Copilot, and the Grok-2 and Grok-3 search modes. They then asked each tool to identify the article and name its original source.
The results were stark. Across 1,600 total queries (200 articles for each of eight tools), the products gave incorrect answers more than 60 percent of the time. ChatGPT Search, which answered all 200 queries rather than declining any, was completely accurate on only 28 percent of them and completely wrong on 57 percent. Perplexity, often marketed as the research-grade option, posted the lowest failure rate in the group at 37 percent. That is still far too high for an election context.
Those numbers were published over a year ago. They have not meaningfully improved. A Bloomberg study summary released on May 20 confirmed that ChatGPT, Claude, Gemini, and Grok remain unreliable when asked about news, including election news. Nieman Lab’s analysis of the same data found that ChatGPT continues to be the worst of the four at crediting the news outlets it draws from. A separate NewsGuard False Claims monitor tracked the top ten generative-AI chatbots and found they returned false claims to news prompts 35 percent of the time in August 2025, up from 18 percent the year before. The trend is moving in the wrong direction.
Sign One: Chatbots Cannot Reliably Identify or Credit News Sources
The Tow Center experiment laid this bare. When researchers asked AI search tools to name the publication behind a given article, the models frequently guessed wrong. They confused one legitimate outlet with another. They credited syndicated copies instead of original reporting. They invented publication names that do not exist.
This is not a minor glitch. For a voter trying to verify a claim, knowing where information comes from is the first step toward evaluating its trustworthiness. If the chatbot cannot tell the difference between a Reuters wire and a content-farm rewrite, the voter has no way to gauge reliability. The problem becomes acute when the chatbot cites a source that looks real but is not. A voter who receives a confident answer with a plausible-looking citation has little reason to dig further.
The AI election accuracy risk here is obvious. Elections generate a flood of claims, rebuttals, endorsements, and attack ads. A chatbot that misattributes a quote to the wrong candidate or the wrong publication can distort a voter’s understanding before the voter ever reads the original text. The citation is supposed to be the safety net. When the citation is itself fabricated, the net has a hole in it.
Sign Two: Models Cannot Distinguish Legitimate News From Disinformation
NewsGuard’s tracking of Russian disinformation operations found that top generative-AI models mimicked Moscow-seeded fake-news claims roughly a third of the time. When asked about topics targeted by these operations, the chatbots repeated the false claims and cited the seeded disinformation sites as authoritative sources.
The mechanism behind this failure is not a mystery. The training-data pipelines that produce frontier models have ingested the open web at enormous scale. That ingestion includes the New York Times. It also includes the laundered output of disinformation operations. The models do not have a built-in filter that separates one from the other. They learn patterns from both.
Retrieval-augmented generation (RAG) systems sit on top of these models and are meant to ground answers in current, credible sources. But those RAG systems search an index whose top results on many news queries are themselves AI-generated rewrites of AI-generated rewrites. The chain back to original reporting grows longer and weaker with each hop. A ‘data voids’ analysis in Lawfare earlier this year described how propaganda fills the gaps where real stories have thin original-source coverage. The chatbot faithfully follows whatever its retrieval step returns, so it treats the propaganda as the substantive source because nothing better ranks higher.
For a voter in the 2026 midterms, this means asking a chatbot about a contested race or a controversial policy could produce an answer that sounds confident, reads fluently, cites multiple sources, and is entirely wrong. The voter has no way to know that the sources cited are themselves disinformation dressed in syndication wrappers.
Sign Three: False Claims Are Increasing, Not Decreasing
The NewsGuard False Claims monitor provides a year-over-year comparison that should alarm anyone who believes the technology is improving. In August 2024, the top ten generative-AI chatbots returned false claims to news prompts 18 percent of the time. By August 2025, that figure had nearly doubled to 35 percent.
This is the opposite of what responsible deployment should look like. If the labs were actively working on AI election accuracy, the rate of false claims should be dropping as training methods improve, retrieval pipelines are refined, and publisher partnerships take effect. Instead, the rate is climbing. The chatbots are getting worse at answering truthfully, not better.
Several factors likely contribute to this trend. The volume of AI-generated content on the web has exploded. Chatbots trained on newer data ingest more of that synthetic content, creating a feedback loop in which model outputs become inputs for the next generation. The retrieval index fills with rewritten versions of rewritten versions, and the original article becomes harder to find. Disinformation operators have also learned how to game the system. They produce content designed to rank highly in search indexes and to match the linguistic patterns that chatbots favor. The models, optimized to retrieve what looks most relevant, retrieve the planted material.
The timing could not be worse. The 2026 midterms will be the first major US election in which a significant cohort of voters may use a chatbot as their primary information interface. If the false-claim rate continues its upward trajectory between now and November, the consequences for informed voting could be severe.
Sign Four: Publisher Licensing Deals Do Not Solve the Retrieval Problem
OpenAI has signed licensing agreements with the Financial Times, Axel Springer, News Corp, Le Monde, and a roster of other publishers. Google, Anthropic, and Perplexity have built similar partnerships. These deals are often presented as a solution to the accuracy problem: if the chatbot has licensed access to reliable content, it should draw from that content and answer correctly.
The evidence suggests otherwise. The Tow Center experiment included articles from publishers with licensing relationships. ChatGPT Search’s 57 percent complete-failure rate was measured on a corpus that included those very articles. Licensing did not produce accurate retrieval. It produced the appearance of legitimacy around inaccurate retrieval.
The structural reason is that a licensing agreement is a legal and commercial arrangement. It does not change how the retrieval system ranks results. It does not teach the model to prefer a licensed source over a syndicated copy. It does not prevent the chatbot from citing a content-farm rewrite that sits higher in the search index than the original licensed article. The chatbot does not know which sources are licensed. It only knows which sources rank highest at the moment of retrieval.
For a voter, this creates a dangerous illusion. When a chatbot cites a well-known publication like the Financial Times, the voter assumes the information has been vetted. But the chatbot may have retrieved the quote from a third-party site that republished the article with errors, or it may have fabricated the citation entirely while correctly naming a licensed publisher. The voter has no practical way to distinguish a real citation from a plausible-sounding fake.
The AI election accuracy problem is therefore not solved by signing more publisher deals. It is a retrieval and ranking problem. Until the retrieval systems consistently prefer original, authoritative sources over derivative or deceptive ones, the licensing agreements are window dressing.
Sign Five: Chatbot Failure Modes Mirror the Structure of Election Misinformation
This final sign is perhaps the most concerning. The specific ways in which chatbots fail map almost perfectly onto the tactics used by election misinformation campaigns. The alignment is not intentional. It is a structural byproduct of how these systems are built and trained.
Misinformation campaigns rely on misattribution. They take a quote from one person and assign it to someone else. Chatbots misattribute quotes systematically. They invent links that resolve to nothing. They collapse distinct sources into a single generic citation. These are not random hallucinations. They are predictable failure patterns.
Misinformation campaigns exploit data voids. They flood the zone where original reporting is thin. Chatbots, by design, retrieve what is most available. When the available material is a disinformation site, the chatbot retrieves it. The Lawfare analysis described this mechanism clearly: thin original coverage plus heavy propaganda coverage equals a chatbot that treats propaganda as fact.
Misinformation campaigns mimic legitimate formats. They use syndication wrappers that look like Reuters or the Associated Press. Chatbots cannot reliably distinguish between a genuine syndicated wire and a look-alike. NewsGuard’s tracking showed the top ten models repeating Russian disinformation claims roughly a third of the time, citing the imitation sites as if they were real.
A voter who asks a chatbot where their polling place is located will receive a confident answer with a plausible-looking citation. But the answer may be drawn from an outdated dataset, a scraped third-party site, or a planted disinformation page. The voter has no reason to doubt it because the chatbot sounds authoritative. The failure mode reliably produces exactly this outcome: a confident, fluent, wrong answer that looks right.
The 2026 midterms will test whether this matters in practice. The first cohort of American voters who may plausibly use a chatbot as their primary news interface will go to the polls in November. NOTUS reporting on the campaigns has been blunt: ChatGPT and Claude will be a force in this election, and nobody, including the labs that built them, has a defensible plan for what happens when those forces produce confident, eloquent, well-cited answers that are also wrong.
The signs are visible now. The research is published. The data is consistent. The question is whether voters, campaigns, and the platforms themselves will take these warnings seriously before election day arrives.