The government doesn’t look good naked.

Another 19-month-old who I won't call ugly. (Courtesy mpisti on flickr, licensed CC-BY-NC-SA)

So 19 months into the Open Government Directive, we seem to have a backlash. The government has spent millions of dollars collecting, organizing, and cataloging its data to make it more available to the public. An unprecedented effort. Some of this data is frivolous and some of it is valuable, but I think we can all agree that more transparency is always, always a good thing.

Not so, says Ellen Miller of the Sunlight Foundation, one of the leading advocates for government transparency. On Tuesday at the Gov 2.0 Summit, she made it clear that transparency wasn’t enough: she also wants accuracy, relevance, and quality in the data. Instead, Sunlight found $1.3 trillion in inaccuracies in federal spending data. She also had some choice words for other Open Government initiatives. The keynote was a remarkable turn: the administration was completely eviscerated by one of its closest allies. Today, I read that Fast Company’s Austin Carr is similarly disillusioned by this week’s announcements. I think it’s safe to say there will be more pieces like this in the next few months.

I love quality data as much as the next person, but this is a perfect example of treating government as a vending machine, and it’s poisonous. In 19 months, citizens have access to more data than they ever had before. In some agencies, it takes an average of 43 months to get a new project off the ground. The fact that the US government is even attempting this is amazing.

As I mentioned last year, this is exactly how to prevent innovation in government. If you want change, you have to tolerate imperfection and risk. If every program manager thinks they’ll end up on the front page of the Washington Post or get dressed down onstage at Gov 2.0, nothing will change.

Now, some of the data is ridiculous, yes. But now we know it’s ridiculous. Before some of this data was made public, nobody knew it existed. The government employees who worked with it probably assumed it was valid. One of the main reasons to release the data is to permit public scrutiny, and that’s exactly what we got. Ironically, Sunlight’s $1.3 trillion discovery is an example of the process working, not a failure. This isn’t a case of Greek-style institutionalized malfeasance. You’re just seeing how hard it is for one of the largest, most complicated organizations on the planet to keep its records straight. I’m not surprised by it at all.

Sunlight has, I think, dangerously conflated transparency with reform. You get transparency first, and that compels reform. That’s the whole point. You don’t ask for perfection right out of the gate; it’s unreasonable. Red Hat’s CEO, Jim Whitehurst, is fond of saying “if everyone walked around naked, we’d all go to the gym more often.” So the government’s naked, and it’s gross. The solution isn’t to tell the government that it’s gross. It knows it’s gross. The solution is a long, difficult, complicated, and unpleasant series of reforms that produce better quality data. That requires patience, diligence, and perseverance. From both sides.

[Update: Tom Lee of Sunlight responded to this, and I’ve responded in turn. We’re all still friends.]


  1. Wow Gunnar – so right on target it isn’t even funny.

    Applying this to the current ‘angry debate’ that everyone assumes you and I are having (clue for everyone – Gunnar and I actually agree on more things than we disagree on!), I think your words here are instructive for the whole ‘government (data/software/you name it) has to be all OPEN – no, government has to be all CLOSED’ debate.

    As someone with an engineering mindset, I’d rather we have everything working exactly the way we want (open source government software/data transparency with accuracy, etc). However, in transitioning from a ‘pure’ engineering mindset to someone who has to deal with the pragmatics and daily reality of life in the government or enterprise, I’ve come to accept that you’ll never get very far in effecting change if you bang the drum too hard for either extreme.

    I know the Sunlight Foundation (and other people of similar passion around other tech change efforts) think that by haranguing the ‘bad’ government they will drive change. The reality is that any government or enterprise is made up of individuals of all stripes and passions. Attempting to ‘out’ them, as you point out, just causes people to dig in their heels.

    Having realistic expectations and holding institutions accountable in a respectful manner is the key to the way forward here.

    Thanks for a great post!

  2. Great post. Unfortunately, your completely reasonable perspective is of interest to practically no one in this with-us-or-against-us, demand-for-instant-gratification political junior high school that our country seems to have turned into.

    (Came over here from Weigel’s blog, if you’re wondering.)

      1. I suppose that’s where we differ. I see nothing wrong with saying “thanks, now do better.” Either way, appreciate your insight. This is a great conversation to have.

  3. I think Ellen Miller was pointing out that much of the “open data” success propaganda isn’t true and that continuing to blindly dump raw data into the public domain isn’t the answer we are looking for in making the government more accountable. What have all of these “challenges” and “democracy prizes” actually produced, other than good press releases and easier access to data for the large search engine companies to harvest?

    1. As I mentioned, I think we’re already seeing the benefits of more open data and transparency. Sunlight’s own $1.3 trillion statistic would have been impossible without it. There are now many developers working with government data and software. These are all great developments that would have been otherwise unthinkable before the Open Government Directive was issued. Just look at all the great work at Sunlight Labs!

      The utility of the raw data is a function of citizen involvement. Releasing data is table stakes for reform, if you like. It’s up to the public to make that data useful, use it for advocacy, start a business, etc. If a “blind dump” of data doesn’t feel useful, that’s a referendum on the public’s response, not on the government.

  4. Hi Gunnar,

    For what it’s worth, I think Sunlight’s figure of $1.3 trillion is overblown and misleading. (Sunlight does get very high marks for their openness, which makes it relatively easy to see how the number is trumped up.)

    I dug a bit into their ClearSpending subsite, and here’s what I noticed. If I’m misunderstanding the picture, I hope they’ll explain.

    The lion’s share of the $1.3 trillion is in the category “incomplete.” Let’s start with the biggest single “misreported” amount: $718 billion of the Social Security Administration budget that Sunlight classifies as “incomplete.” All of this $718 billion is classified as incomplete because of failures in the following fields: Recipient County Name, Recipient City Code, Recipient City Name, Principal State of Grant, Action Type, and Federal Award ID. I assume this means that the money is reported, but those fields are blank.

    At first blush, this might appear to be a serious problem, and it would be if we didn’t know where hundreds of billions of dollars were spent. But note that there are no “failures” for Recipient Name, Recipient State, Recipient County Code, or Recipient Congressional District. So while missing data in general is a potential problem (and in any case, attempts to aggregate on frequently-missing fields will produce garbage), it’s a relatively minor problem in practice when the data that’s missing is redundant (e.g., we always get the county name, but we never get the county code). This kind of problem is very easy to mitigate. Delete the columns from the public dataset, or point out in the data dictionary that multiple forms of location information might not be provided with each item. Or fill in the missing data (potentially time-consuming if what you have is names and they may be misspelled).

    [For the record, I couldn’t follow up with the raw data itself. Maddeningly, all, yes all, of the links I tried on the site were broken, so I couldn’t see the data or the data dictionary to check whether these fields were truly redundant or whether the data dictionary had helpful remarks. It could be that the missing Action Type and Federal Award ID values within the SSA budget are a big deal, but I have my doubts.]

    The second biggest number is within DHS: over $400 million. Again, in the incomplete category, and again, relating only to these exact same fields.

    The next biggest numbers are under HHS: $400 billion is “incomplete” because of the same fields as before (and one additional one: Principal Place of Grant Code, but all but 0.002% of the money is assigned a “Principal State of Grant” value, so it’s redundancy again); and — and this is the only 12-digit number that appeared to me to be included in the $1.3 trillion for any reason other than missing values in a few redundant fields — $500 billion, all due to Medicare, in the “not reported” column.

    While $500 billion isn’t $1.3 trillion, it’s still a lot. But fully 100% of these lines were “not reported” in 2007, 2008, and 2009, so I can only guess that the reporting requirements are different for Medicare than for other sectors of the budget. I sincerely doubt that 100% of the Medicare budget is being lost and never accounted for.

    Once these large and (I suspect) relatively unimportant flaws in the raw data are discounted, there’s not a $1.3 trillion problem. Any problem is maybe a tenth that big. I can’t be more specific, because while Sunlight’s website is pretty detailed, I didn’t find any explanation of what the $1.3 trillion comprised. No obvious combination of numbers on the website added up to that.

    All in all, while the problems Sunlight found are large from a numerical point of view, they are minor from the viewpoint of whether the data is right or wrong, good or bad, misleading or accurate, garbage or gold, and whatever badness there is doesn’t add up to a trillion dollars.

    By my quick analysis, the budget information on the site isn’t anywhere near 70% wrong (which is what I think Sunlight intimates, intentionally or not).

    Also worth noting: a lot of money is in the “Late” column (“reported over 45 days after obligation,” Sunlight says). That’s crummy, but it’s a mistake to categorize data as permanently flawed if it’s there, it’s correct, and it simply failed to meet a deadline imposed by an optimistic law. Sure, no one should be misled into thinking the data is complete at a certain date if it likely isn’t, but again, this is a problem that can be mitigated. Some data may be routinely late because of federal, state, and local budget cycles, for example.

    1. Hi Steve — thank you for engaging with our analysis in this level of detail! Having people dig into our work in the way you have is exactly what we were hoping would happen. I’ll respond here, though if you’d like to move this conversation to one of our announcement posts so that more people can find it, I’d be happy to do so. Also, apologies for the delay in my response — I needed a bit of time to get everything together.

      You’re right to single out Medicare/Medicaid as outliers. The situation here is a bit complicated: those programs don’t report individual transactions to FAADS-PLUS/USASpending. Indeed, payments to individuals do not have to be reported under FFATA, presumably because of privacy concerns. However, in the past these programs *have* reported block payments aggregated at the county level to USASpending (and to FAADS, the separate system maintained by Census, where I believe they continue to be required to report). My understanding is that the reporting stopped because of a technical disagreement between HHS and OMB.

      So this is money that doesn’t legally *have* to be reported — however, we know that it *can* be reported (it has been in the past). And we think it *should* be reported. As I’m sure you know, healthcare is a huge and growing portion of the federal budget. I don’t think USASpending can be considered a useful picture of how the government spends money if health spending is excluded.

      The situation is even *further* complicated by the fact that OMB seems to have resolved its disagreement with HHS and added the data in the past few weeks, after we received our snapshot of the data. Unfortunately, exporting data from the site remains difficult, though we have reason to believe the situation will improve soon. We certainly intend to re-run our analysis once the FY 2010 data is in.

      Let me respond to your concerns about the prominence of the timeliness and completeness metrics. We think these are important components of making the data useful, but understand that others may disagree. However, they contribute a relatively small amount to the $1.3T figure, once you make sure that you’re not double-counting any rows. I’ve asked the lead analyst on this project, Kaitlin Lee, to provide some more detailed breakouts explaining how we arrived at the $1.3T figure. You can find CSVs with this data on our resources page.

      There are a few things that I hope you will note. First, we feel that we’ve been quite generous to the government in our analysis. We only count reporting as inconsistent if the CFDA and USASpending numbers disagree by more than fifty percent. We also extend a 50% cushion beyond the statutory requirement before we count a record as late. We also exclude loan reporting from the $1.3T figure, because the numbers there are so large (and suffer from so many errors). And of course this number doesn’t capture programs that aren’t reported to either the CFDA or USASpending.

      Second, the $1.3T figure is arrived at by evaluating things at a program level, not an agency level. Because of the thresholds we use for determining compliance, summing things at one level doesn’t give you the same amount of “bad reporting” as you’ll get at a different level.

      We feel that the $1.3T figure was arrived at fairly. But we also don’t want anyone to get too hung up on it. It’s been an important tool for drawing media attention to this problem, but we feel that our granular, program-level analysis is much more valuable, both because it reveals specific classes of data problems, and because it can help identify agencies that are managing their reporting properly. We think this analysis can be a real tool for government — and that there are other things we can offer along those lines. Stay tuned for more on that.

      One final note: all of our code is on Github (you can find that link on the resources page, too). I’m afraid the docs are a bit thin right now, but we’d be more than happy to go through our work with you if you’re interested in diving in further.

      Tom Lee
      Director, Sunlight Labs

  5. This seems appropriate:

    “As time passes [politicians in government] realise they have more to lose than to gain from public knowledge of what they are up to. Each month increases their tally of catastrophic misjudgements, pathetic deceptions, humiliating retreats and squalid compromises. They very soon come to understand that sound and effective government is only possible if people do not know what you are doing.”

    (Just in case people think otherwise… that is satire)
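Steve’s argument that a blank field matters less when a companion field carries the same information can be sketched as a small check. This is a hypothetical illustration, not anything Sunlight or USASpending provides: the snake_case field names, the pairing of each name field with a code field, and the sample record are all made up for the example.

```python
from typing import Optional

# Hypothetical pairs of fields assumed to encode the same fact (a place as a
# name and as a code). The names echo the fields quoted in Steve's comment;
# treating them as interchangeable is an assumption, not taken from any data dictionary.
REDUNDANT_PAIRS = [
    ("recipient_county_name", "recipient_county_code"),
    ("recipient_city_name", "recipient_city_code"),
]


def is_blank(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or value.strip() == ""


def classify_missing(record: dict) -> dict:
    """Label each blank field 'recoverable' if its partner is filled, else 'lost'."""
    labels = {}
    for a, b in REDUNDANT_PAIRS:
        # Check both directions: either member of a pair may be the blank one.
        for field, partner in ((a, b), (b, a)):
            if is_blank(record.get(field)):
                labels[field] = "lost" if is_blank(record.get(partner)) else "recoverable"
    return labels


# A made-up record: county name blank but county code present; both city fields blank.
example = {
    "recipient_county_name": "",
    "recipient_county_code": "001",
    "recipient_city_name": None,
    "recipient_city_code": "  ",
}
print(classify_missing(example))
# {'recipient_county_name': 'recoverable', 'recipient_city_name': 'lost', 'recipient_city_code': 'lost'}
```

A blank county name next to a filled county code comes out as recoverable, while a record missing both city fields is genuinely lost. Whether real name and code fields are interchangeable would still have to be confirmed against the actual data dictionary.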
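Tom Lee’s point that program-level and agency-level totals diverge follows from applying a percentage threshold before summing. The toy sketch below illustrates that effect; it is not Sunlight’s code. The `Program` record, the made-up dollar amounts, using the CFDA figure as the denominator for the fifty-percent test, and counting a flagged program’s full CFDA amount as “bad reporting” are all assumptions for illustration. The lateness check takes the statutory window as a parameter, since the thread does not state it.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: "disagree by more than fifty percent" is measured relative to
# the CFDA figure; the thread does not say which number is the denominator.
CONSISTENCY_THRESHOLD = 0.5
LATENESS_CUSHION = 0.5  # extra fraction of the statutory window allowed


@dataclass
class Program:
    agency: str
    cfda_amount: float         # obligations as reported to the CFDA
    usaspending_amount: float  # obligations as reported to USASpending


def is_inconsistent(cfda: float, usaspending: float) -> bool:
    """True if the figures differ by more than the threshold share of the CFDA figure."""
    if cfda == 0:
        return usaspending != 0
    return abs(usaspending - cfda) / cfda > CONSISTENCY_THRESHOLD


def is_late(days_to_report: float, statutory_days: float) -> bool:
    """Late only if reporting took longer than the statutory window plus the cushion."""
    return days_to_report > statutory_days * (1 + LATENESS_CUSHION)


def flagged_by_program(programs: list) -> float:
    """Apply the check to each program, then sum the CFDA amounts of the failures."""
    return sum((p.cfda_amount for p in programs
                if is_inconsistent(p.cfda_amount, p.usaspending_amount)), 0.0)


def flagged_by_agency(programs: list) -> float:
    """Pool programs into agency totals first, then apply the same check."""
    totals = defaultdict(lambda: [0.0, 0.0])  # agency -> [cfda total, usaspending total]
    for p in programs:
        totals[p.agency][0] += p.cfda_amount
        totals[p.agency][1] += p.usaspending_amount
    return sum((cfda for cfda, usa in totals.values() if is_inconsistent(cfda, usa)), 0.0)


# Made-up figures: one program under-reports by 90%, the other over-reports by 90%.
programs = [
    Program("Agency X", cfda_amount=100.0, usaspending_amount=10.0),
    Program("Agency X", cfda_amount=100.0, usaspending_amount=190.0),
]
print(flagged_by_program(programs))  # 200.0
print(flagged_by_agency(programs))   # 0.0
```

With these made-up figures each program misses by 90%, so the program-level check flags $200 while the agency-level check flags nothing: the two errors cancel once pooled. The `is_late` helper applies the 50% cushion in the same spirit, so a hypothetical 30-day window would give a 45-day cutoff.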

