The Accumulo Challenge, Part I

The dozens of software projects launched in the wake of Google’s Big Table and Map Reduce papers have changed the way we handle large datasets. Like many organizations, the NSA began experimenting with these “big data” tools and realized that the open source implementations available at the time were not addressing some of their particular needs. They decided to embark on their own project: Accumulo. Once they were happy with how Accumulo was working, they did the right thing and released Accumulo to the world through the Apache Foundation.

This is great for open source and the taxpayer. The government found a requirement not being fulfilled by the private sector, and rather than letting their work languish inside its walls, or paying a contractor to develop a proprietary solution, they shared what they had with the world. This is what open source advocates have been clamoring for, and with the Shared First initiative at OMB and innovative open source policies like those at the Consumer Financial Protection Bureau, it’s certain we’ll see more of this kind of sharing of taxpayer-funded work.

There’s a catch, though. Since it was launched, Accumulo has joined an increasingly crowded space for tools that manage big data. As these upstarts compete – witness the extraordinary success of MongoDB from 10genHadoop implementations from Cloudera and HortonWorks, tools like HadaptHBaseCassandraMapR, and many, many others – Accumulo presents a threat. Some of these products and projects already compete with each other, and the private sector doesn’t like it when the government competes alongside them, for good reason.

There’s a frequently-ignored policy called OMB Circular A-130 which says that the government shouldn’t build something already available from the private sector. In the language of the policy, the government should:

…acquire off-the-shelf software from commercial sources, unless the cost effectiveness of developing custom software is clear and has been documented through pilot projects or prototypes

So there’s a tension here: we want the government to share its innovations, but we don’t want the government to crowd out the private sector. That’s the thinking behind this recent language in Section 929 of S.3254, the 2013 Defense Authorization as reported out by the Senate Armed Services Committee:

(a) Limitation on Use of NSA Database-

(1) LIMITATION- No component of the Department of Defense may utilize the cloud computing database developed by the National Security Agency (NSA) called Accumulo after September 30, 2013, unless the Chief Information Officer of the Department of Defense certifies one of the following:

(A) That there are no viable commercial open source databases with extensive industry support (such as the Apache Foundation HBase and Cassandra databases) that have security features comparable to the Accumulo database that are considered essential by the Chief Information Officer for purposes of the certification under this paragraph.

(B) That the Accumulo database has become a successful Apache Foundation open source database with adequate industry support and diversification, based on criteria to be established by the Chief Information Officer for purposes of the certification under this paragraph and submitted to the appropriate committees of Congress not later than January 1, 2013.

(2) CONSTRUCTION- The limitation in paragraph (1) shall not apply to the National Security Agency.

(b) Adaptation of Accumulo Security Features to HBase Database- The Director of the National Security Agency shall take appropriate actions to ensure that companies and organizations developing and supporting open source and commercial open source versions of the Apache Foundation HBase and Cassandra databases, or similar systems, receive technical assistance from government and contractor developers of software code for the Accumulo database to enable adaptation and integration of the security features of the Accumulo database.

First of all: Wow. The Senate Armed Services Committee proposes to order the DOD to stop using Accumulo, and direct NSA to help push Accumulo’s code back to other projects, specifically calling out HBase and Cassandra. The level of sophistication required for legislative language like this is astonishing. Under different circumstances, I’d find that sophistication encouraging. Instead, I’m concerned that SASC feels compelled to blacklist an open source project for the DOD. Surely there’s a better response than this?

Let’s put the Committee’s reasoning to the side for a moment, and look at the remedy they proposed. What if it wasn’t Accumulo, but another piece of software? Imagine that we’re talking about the Apache Web Server, Red Hat Enterprise Linux, Microsoft SharePoint, or Adobe Acrobat. If Congress put any of those software packages on a blacklist, industry would lose its mind. Accumulo is no different: once it was open sourced, Accumulo became commercial software under the FAR and DFAR. Congress has no business intervening in this way.

There’s more. The requirement that Accumulo be certified as “a successful Apache Foundation open source database with adequate industry support and diversification, based on criteria to be established by the Chief Information Officer” is extremely dangerous for Accumulo and for open source in general. It’s not sufficient that the software be commercial, functional and be available at reasonable cost. It must now have “adequate industry support and diversification.”

If the DOD CIO is compelled to create such criteria for Accumulo, it doesn’t take much imagination to see that same “adequacy criteria” applied to all open source software projects. Got a favorite open source project on your DOD program, but no commercial vendor? Inadequate. Only one vendor for the package? Lacks diversity. Proprietary software doesn’t have a burden like this.

The last clause of Section 929 is bewildering. SASC directs Accumulo to help other projects that want to use the Accumulo security code, and singles out HBase and Cassandra. There’s nothing wrong with the desire to spread Accumulo’s technology, but doesn’t an act of Congress seem like an extraordinarily, comically inappropriate tool for that? Wouldn’t the Accumulo team, like all open source developers, be generally helpful with folks who want to integrate their code? Perhaps more importantly, why is Congress so interested in HBase and Cassandra?

I think the Committee (and whoever provided them this legislative language) is right to be concerned about the government unnecessarily maintaining a duplicative software project. It’s bad for the private sector, and it’s bad for the government to maintain its own codebase when there are perfectly good alternatives elsewhere.

At the same time, the Accumulo folks indisputably did the right thing by releasing their code, and even went so far as to join the Apache Foundation, which is no small effort. They should be rewarded for their excellent stewardship of taxpayer money. Through their effort, everyone can already benefit from the work that they’ve done – with or without legislative orders to do so – and they’re perfectly capable of winning or losing market share on their own merits, just like everyone else.

This Accumulo issue opens the door to a host of valid questions about the role of government in open source projects. In part two, we’ll put this specific bill aside and ask the questions behind the legislation: does the government harm the private sector when they release open source projects? How can we know when open source is an appropriate way for the government to develop software? How should the government handle forks? I’ll also examine some possible remedies that could eliminate the need for a dangerously crude tool like Section 929.

In the meantime, please let your Senator know how you feel about this.

[Updated 18 June 2012 17:37ET to soften some of the more alarmist language and more clearly distinguish Congress from SASC.]

[Updated 19 June 2012 09:37ET to add some clarity, fix some grammar, and generally prepare for posting on]

9 thoughts on “The Accumulo Challenge, Part I

  1. I think it’s noticeably anticompetitive for the SASC to tell DoD not to use a specific piece of COTS (i.e. Apache Accumulo) because *historically* it was GOTS.

    I mean, seriously? They’re gonna pass a law telling us to use Apache HBase instead of Apache Accumulo?!  Isn’t that a little down in the weeds for the SASC? 

    I’m a somewhat sympathetic to the SASC message… it might have been better to collaborate with HBase than to reinvent it as Accumulo… but that ship has *sailed*.  HBase didn’t exist at the time.  And, and, and … whatever.

    NSA folks are in a tizzy over this.  I expect both DoD CIO and NSA will reclama.  It’s bad enough when they tell us to suck eggs – we really don’t like it when they tell us *how* to suck the egg.

    The recommended language could also have some negative unintended consequences.  The next cool software developed by NGA or NASA or IRS might not get released as open source for fear that some well-meaning congressional staffer might second-guess their decisions and make them look bad. 

    Think on this:  this language is actively forbidding software reuse.


  2. I can understand congress trying to reduce duplication of effort or preventing government from competing with industry.  NSA’s security enhancements of open source software is consistent with federal policy and good for both government and industry.  The potential downside is when a SI performs a free download, modifies the code for a specific project, wraps it in a proprietary shell, and builds yet another static stove pipe.   

    The IT Acquisition Advisory Council has developed a business case analysis and risk management framework that evaluates these risks and lifecycle cost.  The cost of an open source or COTS maintenance license is less than 5% of the lifecycle cost of any major IT project.   If Congress and DOD want to preserve innovation and US leadership in IT, it must not enter into contracts with big SIs that undermine this market.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s