
Philip Stark with his ballot boxes. (Photo: Cyrus Farivar)
Today is Election Day in the United States, so we’re resurfacing this story about checking election results that originally ran in 2012.
NAPA, CALIFORNIA—Armed with a set of 10-sided dice (we’ll get to that in a minute), an online web tool, and a stack of hundreds of ballots, University of California, Berkeley statistics professor Philip Stark last Friday brought both science and technology to bear on a recent California election. He wanted to answer a very simple question – had the vote count produced the correct result? – and he had developed a statistics-based system to find out.
On June 5, 6,573 citizens went to the polls in Napa County and cast a primary vote for supervisor of the 2nd District, in one of California’s most famous wine-producing regions on the northern edge of the San Francisco Bay Area. The three candidates – Juliana Inman, Mark van Gorder, and Mark Luce – would all have liked to come in first, but they really didn’t want to come in third. That’s because only the top two vote-getters in the primary would advance to the runoff in November; number three was out.
Napa County officials announced the official results a few days later: Luce, the incumbent, received 2,806 votes, van Gorder received 1,911 votes, and Inman received 1,856 votes—a difference between second and third place of just 55 votes. Given the narrow result, even a small number of counting errors could have affected the election.
Vote counting can go wrong in all sorts of ways, and even the auditing processes designed to ensure the integrity of close races can be a mess (did someone say “hanging, dimpled, or pregnant chads”?). Measuring human intent at the ballot box can be tricky. To name just one example: in California, many votes are cast by filling in an arrow, which is then read optically. While voters are instructed to fill in the full thickness of the arrow, in practice some merely draw a thin line. The vote-counting systems used by counties don’t always count those as votes.
So Napa County invited Philip Stark to take a closer look at its results. Stark has been on a four-year mission to encourage more election officials to use statistical tools to ensure that the declared winner did indeed win. He first described his method in 2008, in a paper titled “Conservative statistical post-election audits,” but he generally uses a catchier name for the process: “risk-limiting auditing.”
Napa County had no reason to believe that the results of this particular election were wrong, John Tuteur, the county’s registrar of voters, explained when I visited to observe the audit. But given the close election, Tuteur had asked that Napa County become the latest participant in a state-sponsored pilot project to audit several elections in the Golden State.
While American public policy, particularly since the 2000 Bush v. Gore debacle, has focused on voting technology, not as much attention has been paid to vote audits. If things continue to move forward, Stark could have an outsized effect on how election audits are conducted in California, and perhaps the country, in the years to come.
“What this new audit method does is count enough to have a lot of confidence that [a full recount] wouldn’t change the answer,” Stark explained to me. “You can think of this as an intelligent recount. It stops as soon as it becomes clear that it’s pointless to continue. It provides strong evidence that the outcome is right.”
The process has been endorsed in recent years by numerous academics and election officials, as well as by the American Statistical Association (PDF), the League of Women Voters (PDF), the Brennan Center for Justice (PDF), and many others.
And it starts with those 10-sided dice.

A ballot from the audit; note the use of a thin line to connect the arrow. (Photo: Cyrus Farivar)
Audit day
To kick-start the process, county election officials in the city of Napa rescanned all of the ballots counted in the 2nd District supervisor contest. They sent the scans to a separate computer science team at Berkeley, led by professor David Wagner. Together with a group of graduate students, Wagner has developed software designed to read voters’ intent from ballot scans. His system flags ballots where the arrow has not been filled in as instructed, for example, and it handles stray marks differently from the county’s system. The Wagner team produced a spreadsheet recording each ballot (they also created a numbering system to identify and locate individual ballots) and how each voter cast his or her vote.
One problem emerged early on: a discrepancy between the number of votes cast and the number of ballots scanned. While a total of 6,573 votes were recorded in this particular contest, the Wagner team scanned a total of 6,809 ballots, and Napa County recorded 7,116 ballots cast in the election as a whole. (Not every voter in the election chose to vote in this particular contest.) In short, more than 300 ballots were missing from the scans. While that may seem problematic, the margins remained more or less the same.
“If both systems say ‘Abraham Lincoln won,’ then if the unofficial system is correct, so is the official system – even if their vote totals differ and even if they interpret individual votes differently,” Stark wrote in an e-mail Tuesday. “That’s the transitive idea. A transitive audit really only checks who won, not whether the official voting system counted any given vote correctly. That said, we compare vote totals for the two systems to make sure that they agree (approximately), which they did here.”
He added that, to address the missing ballots when confirming the winner, he treated them as if they were all votes for the second-place candidate – even with roughly 300 extra votes for the runner-up, Luce was still victorious.
“To confirm the runner-up, we couldn’t do that; instead I treated them in two different ways, neither of which was completely rigorous,” he added. “In other audits, I was able to address any discrepancy in the number of ballots with complete rigor, so that the probability of a full hand count if the reported result was wrong remained over 90 percent.”
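The worst-case arithmetic behind confirming the winner is simple enough to check directly. Here is a minimal sketch using the totals reported above; the missing-ballot count is the gap between the 7,116 ballots recorded county-wide and the 6,809 scanned:

```python
# Sanity check of the worst-case treatment: to confirm the winner,
# pretend every missing ballot was a vote for the runner-up.
luce = 2806            # official winner's total
van_gorder = 1911      # runner-up's total
missing = 7116 - 6809  # ballots recorded county-wide but never scanned (307)

# Even if every missing ballot went to the runner-up, the winner's
# lead survives: 1,911 + 307 = 2,218, still short of 2,806.
assert van_gorder + missing < luce
```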
With that out of the way, the first step in the actual audit was to randomly select a seed number that would be used to feed a pseudo-random number generator on a website Stark created. For this, Stark had some high-level help in the form of Ron Rivest, one of America’s foremost experts on cryptography and voting systems, a computer science professor at MIT who co-created the RSA encryption algorithm. Using 20 store-bought ten-sided dice, Rivest and Stark rolled a 20-digit number. (73567556725160627585, for those keeping score at home.)
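The public seed is what makes the sample verifiable: anyone who knows it can regenerate exactly the same list of ballots and confirm that officials pulled the right ones. Here is a minimal sketch of that idea in Python – Stark’s actual web tool uses a cryptographic hash-based generator, so this standard-library version only illustrates the reproducibility, not the real sampling code:

```python
# Seeding a PRNG with the publicly rolled number makes the ballot
# sample reproducible by any observer. (Illustrative only; the real
# tool uses a different, cryptographic generator.)
import random

SEED = 73567556725160627585  # the 20-digit number rolled with 10-sided dice
NUM_BALLOTS = 6809           # ballots scanned for the contest
SAMPLE_SIZE = 559            # ballots the audit checked by hand

rng = random.Random(SEED)
# Draw 559 distinct ballot numbers from 1..6809, without replacement.
sample = sorted(rng.sample(range(1, NUM_BALLOTS + 1), SAMPLE_SIZE))
print(sample[:10])  # anyone re-running this gets the identical list
```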
Risk-limiting auditing relies on a published statistical formula, which takes an accepted risk limit and the margin of victory and determines how many randomly selected ballots must be checked by hand.
“The risk limit is not the chance that the outcome (after auditing) is wrong,” Stark wrote in a paper (PDF) published in March 2012. “A risk-limiting audit amends the outcome only by leading to a full hand count that does not match the original outcome. A risk-limiting audit therefore cannot harm correct outcomes. But if the original outcome is wrong, there is a chance the audit will not correct it. The risk limit is the largest such chance. If the risk limit is 10 percent and the outcome is wrong, there is at most a 10 percent chance (and typically much smaller) that the audit will not correct the outcome – at least a 90 percent chance (and typically much more) that the audit will correct the outcome.”
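To give a feel for the arithmetic, here is a simplified sample-size sketch based on the Kaplan-Markov bound from Stark’s published work – not the exact tool he used. The gamma “inflation” constant, the 10 percent risk limit, and the assumption that the audit will find no discrepancies are all assumptions of the sketch:

```python
# Simplified initial sample size for a ballot-level comparison audit,
# assuming no discrepancies are found (Kaplan-Markov bound).
import math

def initial_sample_size(risk_limit, margin_votes, total_ballots, gamma=1.03905):
    """Ballots to check so that, if every hand reading matches the scan,
    the chance of letting a wrong outcome stand is at most risk_limit."""
    diluted_margin = margin_votes / total_ballots
    return math.ceil(math.log(risk_limit) /
                     math.log(1 - diluted_margin / (2 * gamma)))

# Napa's numbers: a 55-vote margin among 6,809 scanned ballots,
# audited to an assumed 10 percent risk limit.
print(initial_sample_size(0.10, 55, 6809))  # 592 with these assumptions
```

With these assumed parameters the estimate lands near 600 – the same ballpark as the 559 ballots Stark’s own tools called for, with the difference coming down to details the sketch glosses over.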

(Photo: Cyrus Farivar)
To decide how many ballots to pull in the Napa County audit, Stark used his own online tools and calculated that 559 would be needed. With that number in hand, John Tuteur of Napa County oversaw a team of temporary election workers in another room. They sorted stacks of ballots into numbered boxes and affixed a sticky note to each of the individual ballots in question, preserving the order in which all of the ballots were stored.
After finding the individual ballots, the team brought the boxes containing them to Stark, Rivest, and a few observers (including myself). Each marked ballot was then removed from its box and shown to the room. Once everyone agreed whether the ballot showed a vote for a particular candidate, an undervote (no vote at all), or an overvote (an uncountable vote for multiple candidates), the result was tallied. After each batch of ballots, those tallies were compared to what Wagner’s image-scanning team had captured.
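Under the same simplified Kaplan-Markov bound used above (again an illustration, not Stark’s exact computation), each audited ballot whose hand reading matches the scan shrinks the measured risk, while a discrepancy that overstates the margin inflates it:

```python
def km_p_value(taints, margin_votes, total_ballots, gamma=1.03905):
    """Running risk measurement for a ballot-level comparison audit.
    Each entry of 'taints' is 0 for a ballot where the hand reading
    matches the scan, and positive (strictly below 1) for an
    overstatement of the margin."""
    # Total error bound across all ballots, in units of the margin.
    U = 2 * gamma * total_ballots / margin_votes
    p = 1.0
    for t in taints:
        p *= (1 - 1 / U) / (1 - t)  # matches shrink p; overstatements grow it
    return p

# 559 matching ballots with Napa's 55-vote margin over 6,809 ballots:
print(km_p_value([0.0] * 559, 55, 6809))  # ~0.11 under these assumptions
```

The audit stops once this measured risk falls below the risk limit. (With this sketch’s parameters, 559 clean ballots land just above a 10 percent limit; Stark’s actual calculation differed in details, including how it handled the missing ballots.)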
“You want cast as intended, and counted as cast, and verified,” Stark said.