The Apache Foundation hosts the Apache Accumulo project, which is a data storage and retrieval system for big data created by the NSA in 2008 and submitted to Apache in 2011. Derrick Harris at GigaOm describes Accumulo as “The technological linchpin to everything the NSA is doing from a data-analysis perspective”; it is probably part of the BoundlessInformant open source stack (see this presentation [PDF]) that stores and analyzes the Verizon FISA data.

The Apache Foundation “provides support for the Apache community of open-source software projects, which provide software products for the public good.” It looks to me like Accumulo is outside that mandate.

The Apache Foundation may, because of its membership, be more open to pressure than other organizations involved in the NSA’s big data effort. Are there grounds for a campaign to pressure Apache into removing Accumulo from its list of projects?

There may also be questions about more general-purpose projects that complement Accumulo, like Apache Hadoop, Apache ZooKeeper, and Apache Thrift, but these were not designed as specifically for the NSA’s data handling needs as Accumulo was.

Meanwhile, of course,

Update June 15: Follow-up post, yes it should.

  1. I’ve heard rumors that Tor (the onion router) was created by the CIA. Does that mean we should not use it? The Internet itself originated in the dark corridors of DARPA. Perhaps it is all just one big trojan horse upon society.

    BTW, right above the textarea in which I am presently typing is the following claim:

    “Your email is never shared. Required fields are marked *”

    Good luck delivering on that one…

  2. Lori – that’s a bit of a slippery slope, isn’t it? I think the Internet is a different case, although I’ve never been clear on the Tor/military links. Accumulo was developed with a particular purpose in mind, and it seems that Apache ceasing collaboration might hinder that purpose, which sounds to me like a pragmatic reason to lobby them.

    Thanks for the note on the “never shared”! I’ll edit the template to “to the best of my knowledge” this evening.

  3. Accumulo, like HBase, is an implementation of Google BigTable on top of Hadoop. Compared to HBase, Accumulo has finer-grained permissions.

    I don’t see anything particularly evil about database software.

    • I don’t see anything particularly evil about database software.

      Me neither :). But the issue is not whether fine grained permissions are a problem, it’s whether we are interested in putting spokes in the wheels of the NSA efforts to collect and analyze the Verizon data (and possibly the PRISM stuff, though that seems less likely to be a big problem imho). This seems like a practical way of doing so.

  4. I think it is wrong to put spokes in the wheels of the NSA efforts to collect call data records. The issue, in my opinion, is with the law (maybe the interpretation) and not with the technology nor the technicians.

    Having said that, I don’t think delisting Accumulo would have the effect you are after.

    My point about fine grained permissions is that Accumulo is almost identical in functionality to the much more popular Apache HBase. It is not purpose built for spying, it is a general purpose distributed database like BigTable, HBase, and Cassandra. I don’t know why anyone would choose it over HBase except for the enhanced permissions capabilities.

    I agree with you about PRISM. I read the four leaked PowerPoint slides and it seems to be about targeted FISA warrants. The Post article makes much broader claims about the scope of the program and I’m skeptical of their interpretation.

    The Verizon warrant is scary and I don’t see why we would expect to have a right to privacy for the non-encrypted parts of TCP packet data when all call data records are fair game.

  5. RAD: good points, as always, but I think you are wrong on “It is not purpose built for spying”. I think it’s pretty clear that’s exactly what the intent of the NSA was when it built it. Of course it could be used for other things, but the application in mind when they built it was spying. Obviously the permissions were so that they could spy within the law, or at least within data access rules.

  6. Tom, my point is that Accumulo is no more specialized for spy craft than any other BigTable inspired database. BigTable was built by Google, HBase was built by Powerset, HyperTable was built by Zvents, Cassandra was built by Facebook, and Accumulo was built by the NSA. Each organization had certain use cases in mind but all the listed databases are more or less equally suited to the different use cases. In my mind, general purpose does not equal purpose built.

  7. The NSA built a special-purpose database with fine-grained permissions following the BigTable model. They then turned around and open-sourced it. What the technology was built for is immaterial, if it wasn’t Accumulo it would have been something else. If anything, open-sourcing Accumulo is one of the few positives to come out of this project. Anyone can use Accumulo — I’d call that public good. If Apache de-listed it, then it’d simply live elsewhere. It’s not as if Apache removing it suddenly shuts down the NSA’s databases.

    Removing Accumulo from Apache is trying to strike back through all the wrong channels and feels very reactionary.

  8. Michael: The argument that “if Apache de-listed it, it would live elsewhere” is the same argument used by arms suppliers, by companies selling deep packet inspection technology to authoritarian regimes, and so on. Would you apply your argument to defend Blue Coat, McAfee, or Netsweeper?

    I think there is a lot of pre-judging of “good guys” and “bad guys” going on that is inconsistent at least. Accumulo is as much surveillance software as deep packet inspection software, and I’d apply the same arguments to both.

  9. Let’s say the NSA invents a new lightbulb, the better to enable them to work around-the-clock at spying on everyone, and releases it as an open source hardware design. Is it morally wrong to use that design? To improve upon it? To share those improvements under a license that would allow the NSA to use your improvements?

    The comparison to arms companies doesn’t really hold, in my opinion. Guns (or weapons worse than guns) have a clearly-defined and almost exclusively harmful purpose. It is reasonable to say “we do not condone the use, design or manufacture of weapons” in a way that it’s not reasonable to say “we do not condone the use, design or manufacture of database management systems”. The scope for harmful use is much smaller and, more significantly, the scope for beneficial use is much greater in the case of Accumulo than it would be in the case of, say, the designs for a new drone or missile launcher or nerve gas agent.

    That said, I see your point. I’d support Apache de-listing Accumulo simply as a way of disassociating themselves from the NSA, in order to maintain clarity about the Apache Foundation’s social aims. I just don’t think that this should stop anyone from using Accumulo or forking/patching/sharing improvements to it.

  10. Rob – Thanks for the thoughtful comment. Accumulo is, like deep packet inspection technologies, somewhere between a lightbulb and a gun. It was developed with particular searching capabilities in mind (see the slide show here). So I think the comparison to the vendors who supply filtering or deep packet inspection technologies to authoritarian states is reasonably close. I tried reading the EFF report from a couple of years ago called West Censoring East with this in mind and the parallels seem strong to me.

    So I agree with you about Accumulo not being a gun. And I am not going to go after Donald Knuth for TeX because the NSA researchers used Beamer to do their presentation. But in between, I think it’s pretty clear that Apache is providing a valuable service to the NSA and that, given what we now know about how the NSA is using Accumulo and Apache’s commitment to the public good, asking them to stop is reasonable.

  11. This all reminds me of the “kill or maim” clauses that some people in academia were asking the universities to include in their research policies back when I was in college in the antediluvian 1980s. The even more radical professors wanted a moratorium on “classified or proprietary” research. That was before “gift economy” became a major catch phrase. Needless to say, the Very Serious People pointed out how utterly inconsistent such principles were with the university’s “business model” (although that catch phrase was also to catch on in subsequent years).

    Here’s a question: Could the Apache Foundation exist independently of the military industrial complex? Some say it’s something we shouldn’t worry about:

    Change in stated privacy policy noted.

  12. Nothing you or anyone in the world will ever do will stop them.

  13. Lori: I call your antediluvianism and raise it a decade.

