We Found a $20 Million Error in a City Database.
Two official city websites offer troves of campaign finance data for San Francisco elections: SF OpenData (data.sfgov.org) and the Ethics Commission (sfethics.org).
But all that information is useful only if you know how to sort through it, and only if it was entered correctly.
The Public Press spent six months digging and sorting, and many hours talking with staff at the Ethics Commission for clarification on the best ways to find and distill the information on the 2015 elections.
Campaigns file reports electronically via the Ethics Commission’s NetFile system, which is used by several California cities and counties. OpenData, the central clearinghouse for data published by the city and county of San Francisco, scrapes it daily for updates.
The information is constantly revised, making summaries of past reporting periods a moving target. Because campaigns often amend their statements, the data are accurate only as of the day they are viewed. As an election approaches, the reported figures change more frequently.
We also discovered that looking through the data for big-picture trends was not easy. For example, the data are sortable by year (but not by election date), and then only after properly setting a half-dozen search filters.
An even bigger challenge is that OpenData relies on data uploaded by campaign committees, so the information is only as accurate as the human beings entering it. The NetFile system does have some fail-safe measures that align with, and are automatically checked against, state-level standards. If the system catches an error, the filing is rejected. However, these checks do not eliminate the potential for human error.
Several months into our research, we found a huge discrepancy. Data downloaded from NetFile, which is privately held, showed $25.6 million in expenditures in 1998 — the first year the Ethics Commission collected records electronically. But when we checked for the same information at OpenData, the figure was $5.8 million.
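A cross-check of this kind can be sketched in a few lines of Python. This is only an illustration, not the Public Press's actual method: the function names and the sample rows below are made up, and real exports from either site would first have to be reduced to (year, amount) pairs.

```python
from collections import defaultdict

def totals_by_year(rows):
    """Sum expenditure amounts per year from (year, amount) pairs."""
    totals = defaultdict(float)
    for year, amount in rows:
        totals[year] += amount
    return dict(totals)

def find_discrepancies(source_a, source_b, tolerance=0.01):
    """Return {year: (total_a, total_b)} for years where the sources disagree."""
    a = totals_by_year(source_a)
    b = totals_by_year(source_b)
    mismatches = {}
    for year in sorted(set(a) | set(b)):
        total_a = a.get(year, 0.0)
        total_b = b.get(year, 0.0)
        if abs(total_a - total_b) > tolerance:
            mismatches[year] = (total_a, total_b)
    return mismatches

# Hypothetical sample rows, not real filings.
first_source = [(1998, 1200.0), (1998, 90000.0), (1999, 500.0)]
second_source = [(1998, 1200.0), (1999, 500.0)]

mismatches = find_discrepancies(first_source, second_source)
print(mismatches)
```

Any year that shows up in the result is a candidate for a closer look at the underlying filings.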
Informed of the $20 million error ($19,981,008, to be exact), Ethics Commission staff pointed to a single expenditure report that was recorded in NetFile. While it may be impossible to know exactly how the error occurred, it appeared that someone had entered a date (Oct. 8, 1998, or “1998 10 08”) in the expenditure field, where its digits read as the dollar amount 19,981,008.
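One simple screen for this kind of mistake is to flag any whole-dollar amount whose eight digits also form a valid calendar date. The sketch below is an assumption about how such a check might look, not a feature of NetFile or OpenData; the function name and the year range are hypothetical, and a flagged amount still needs a human to review it.

```python
from datetime import date

def looks_like_date(amount, min_year=1990, max_year=2030):
    """Return True if a whole-dollar amount reads as a YYYYMMDD date.

    min_year and max_year are illustrative bounds chosen to cut down
    on false alarms; they are not taken from any official rule.
    """
    if amount != int(amount):
        return False  # amounts with cents cannot be a bare date
    digits = str(int(amount))
    if len(digits) != 8:
        return False
    year, month, day = int(digits[:4]), int(digits[4:6]), int(digits[6:])
    if not min_year <= year <= max_year:
        return False
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

# Sample amounts; only 19981008 comes from the article.
amounts = [250.00, 19981008, 12345678, 5800000]
flagged = [a for a in amounts if looks_like_date(a)]
print(flagged)
```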
It was a glaring confirmation of the potential for errors. Notably, any previous research that examined trends over nearly two decades of data might be inaccurate.
Some contextual information, often called metadata, is also missing. In addition to reporting total money spent, campaigns have to report what it was spent on. There are 27 three-letter codes for various campaign expenditures. But sometimes these codes are filed under different columns, which makes it difficult to calculate the total spent in a given category, such as TV ads.
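To total one category when its code can land in more than one column, a script has to look in every column where the code might appear. The following is a rough sketch under stated assumptions: the column names expn_code and description are hypothetical, the rows are invented, and TEL is assumed here to be the code for TV spending.

```python
def total_for_code(rows, code, code_columns=("expn_code", "description")):
    """Sum amounts for rows where the code appears in any candidate column.

    code_columns lists hypothetical column names; a real export may
    use different ones.
    """
    total = 0.0
    for row in rows:
        # Normalize case and whitespace so "tel " matches "TEL".
        values = {str(row.get(col, "")).strip().upper() for col in code_columns}
        if code in values:
            total += row["amount"]
    return total

# Invented sample rows, not real filings.
rows = [
    {"expn_code": "TEL", "description": "", "amount": 1000.0},
    {"expn_code": "", "description": "tel", "amount": 250.0},
    {"expn_code": "LIT", "description": "", "amount": 400.0},
]

tv_total = total_for_code(rows, "TEL")
print(tv_total)
```

A search limited to the first column alone would have missed the second row and understated the total.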
There are also instances where a single report is tagged with multiple codes, making it impossible to know exactly how many dollars went to each type of expenditure. While fact-checking late in the project, the Public Press also found filings with up to five codes listed on a single expenditure report. Checking all the expense declarations from tens of thousands of individual PDF documents to capture all of this information for more than 15 years’ worth of data would be a massive, labor-intensive project.
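When a report carries several codes, the exact split cannot be recovered from the data alone, but the possible range can be. The sketch below, using invented reports and a hypothetical function name, computes a floor (reports tagged with only the code in question) and a ceiling (every report that mentions it).

```python
def code_total_bounds(reports, code):
    """Return (lower, upper) bounds on spending for one code.

    reports is a list of (set_of_codes, amount) pairs.
    lower counts only reports tagged with this code alone;
    upper counts every report that includes the code.
    """
    lower = 0.0
    upper = 0.0
    for codes, amount in reports:
        if code in codes:
            upper += amount
            if len(codes) == 1:
                lower += amount
    return lower, upper

# Invented sample reports, not real filings.
reports = [
    ({"TEL"}, 1000.0),
    ({"TEL", "LIT"}, 600.0),
    ({"LIT"}, 400.0),
]

bounds = code_total_bounds(reports, "TEL")
print(bounds)
```

The true figure lies somewhere between the two numbers; narrowing it further would require reading the individual filings.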
Another problem: Not all campaigns fill in the “election date” box in electronically filed reports. As a result, sorting by election date would potentially cause data to be excluded, skewing the results.
One work-around is to sort by “reporting period,” which is always filled in. But there is often more than just one election per year. (The task was simplified, though not foolproof, for 2015 because there was only one election.)
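Before filtering on election date, it can help to measure how much money sits in records that leave the field blank. This is a minimal sketch with invented rows; the field names election_date and amount are assumptions, not the actual column names in either database.

```python
def split_by_election_date(rows):
    """Separate rows that have an election date from rows that lack one."""
    with_date = [r for r in rows if r.get("election_date")]
    without_date = [r for r in rows if not r.get("election_date")]
    return with_date, without_date

def excluded_share(rows):
    """Fraction of total dollars that a filter on election date would drop."""
    _, without_date = split_by_election_date(rows)
    total = sum(r["amount"] for r in rows)
    missing = sum(r["amount"] for r in without_date)
    return missing / total if total else 0.0

# Invented sample rows, not real filings.
rows = [
    {"election_date": "2015-11-03", "amount": 300.0},
    {"election_date": "", "amount": 100.0},
]

share = excluded_share(rows)
print(share)
```

A large share is a sign that sorting by election date would skew the results and that another field, such as the reporting period, may be the safer filter.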
Ultimately, we cannot be sure what else might have been missed because of inadvertent input error or even possible deliberate miscoding to obscure contributions.
The Ethics Commission has corrected many of these problems for specific races going back to 2011. The results can be viewed through graphical dashboards on its website. But searching for information the commission did not visualize, or simply fact-checking the information presented online, is still a messy affair.
Before 2008, committees that e-filed used a web form provided by the city, but the system was very limited: It accepted only Form 460 filings, which track expenditures and contributions. Basically, it was a data dump. Unlike NetFile, the web system did not automatically calculate or summarize figures such as cumulative totals.
Implemented in 2008, NetFile has gone through many updates. The commission envisions making it even more robust over the coming two fiscal years, including improving the metadata that are recorded so that searching the filings can be done more efficiently.
For fiscal 2017, which began July 1, the Ethics Commission was slated to receive additional money in its $3.3 million budget to pay for investigators and compliance staff, plus funding for an E-Filing Conversion Project. That may mean the agency will be able to perform deeper audits, a major step toward more transparency and usability for ambitious, DIY voters.
Find the visualized data online at sfpublicpress.org/costofvotes/data