Yes, the Apache Foundation Should Dump Accumulo

This post follows on from my previous one, which has the background and links. In brief, the Apache Foundation is hosting the Accumulo project. Accumulo is software created by the NSA and handed to Apache, and it is at the heart of the NSA’s surveillance technology stack. Now that we know about the use of the technology, Apache has the opportunity to distance itself from the NSA surveillance scandal, and should do so.

How should we think about the role of Apache in the NSA surveillance scandal? Perhaps a good place to look is the work of respected open internet advocates like the OpenNet Initiative. So let’s do that.

A couple of years ago Helmi Noman and Jillian York of the OpenNet Initiative published a bulletin called West Censoring East: The Use of Western Technologies by Middle East Censors, 2010-2011. The bulletin documented network filtering of the internet by national governments, and “the use of American- and Canadian-made software for the purpose of government-level filtering in the Middle East and North Africa”. The goal of the report was to inform a “genuine discussion of the ethics and practice of providing national censorship technology and services”. Just to be clear, and for what little it is worth, the report seems admirable to me. The ethical stances it takes were reiterated by Rebecca MacKinnon when she wrote about it last year in her influential book “Consent of the Networked”. What’s interesting now is to read the report, note the ethical stances it takes on Western companies providing services that support authoritarian actions by national governments, and apply those lessons to Apache and the NSA. The parallels are, I hope, obvious.

The report concludes that “Western companies are playing a role in the national politics of many countries around the world. By making their software available to the regimes, they are potentially taking sides against citizens and activists who are prevented from accessing and disseminating content thanks in part to filtering software.” The authors complain that “companies appear to have done little to curb the use of their tools–if not offering them outright for that purpose–for government-level censorship. These companies seem not to have adopted policies and procedures to safeguard freedom of expression in the event that states rather than parents and schools use their tools, as their products are being openly used by several state-run ISPs to limit what citizens can and cannot access online.” The final sentence states that “Such companies must recognize the role their tools play in the international landscape and set forth policies that protect Internet users’ right to free expression–or at least put them on record about the role that they play.”

The technologies that the companies are providing are general purpose technologies: almost everyone would agree that internet filtering technologies have valid uses by parents and schools, for example. It’s not the technology itself that is offensive, at least to anyone who is not happy with the idea of kindergarten kids stumbling across violent pornographic images. It’s the relationship between the companies and their customers: the companies are providing a service, knowing the use to which it is going to be put. The report expects companies to think about the use of their tools and to take action to prevent them being used in ways that curb freedoms. It expects companies to limit the use of their tools.

The role of Apache as the host of the NSA-initiated Accumulo project is directly parallel to the role of Western companies providing filtering software that is used by authoritarian regimes to curb freedom of speech. So, in the light of the OpenNet report, how would the continued hosting of Accumulo look?

  • Is Apache providing a service to the NSA? Yes it is. Some people have been telling me that it’s not, or that it is but it’s unimportant, both of which seem positively bizarre to me. The NSA took a deliberate decision, after developing Accumulo, that the best way forward was to open source it and look to a private vendor (sqrrl) to continue to provide a distribution that matches their needs. Apache is instrumental in carrying forward that plan.
  • The NSA could get their software some other way. This is irrelevant. The OpenNet Report does not let McAfee off the hook because Symantec provides a similar service, and we should not let Apache off the hook either.
  • The Accumulo software is general purpose: does that matter? No, it doesn’t. First, it’s not that general purpose. It’s not like lightbulbs: it is data collection and data analysis software in the middle of a controversy over data collection and data analysis, and it is general purpose only for anyone who has a data centre and a few petabytes of data to process, and who requires detailed access controls over who can see it. That’s not very general. Second, Apache now knows the uses to which the software is being put, just like the companies providing software to the governments of the Middle East knew how their software was being used once OpenNet reported on it.
  • Why go after Apache, when they are one of the good guys? Because their declared mandate and their broad membership make it more likely that they will take a stand. It’s not “going after Apache”, it’s getting Apache to do the right thing. It won’t stop the NSA but it limits the breadth of collaboration. I don’t particularly think of Apache as “one of the good guys” because the whole good guys/bad guys way of thinking seems to lead naturally to double standards, but I’m not out to get them. I just hope they do the right thing now they see how their efforts are being used.

Especially for people outside the USA, putting pressure on an international organization seems a useful way to go. If anyone is interested in taking this up, maybe we can put together a petition at least. Please contact me in the comments if you are interested.

Created: 2013-06-15 Sat 14:44


  1. Accumulo is general purpose. It’s no more specialized than HBase, Cassandra or any other of the BigTable variants.

    Delisting Accumulo would only hurt the people who have decided to use it after it was open sourced. The NSA built most of the base and as such will still have access to it. And lastly, Cassandra, HBase and HyperTable have come a long way since Accumulo was released.

    I would argue today it would be easier to use Apache HBase or Apache Cassandra to do big data surveillance. Does that mean Apache should sunset those technologies too?

    The NSA uses Linux as well. Should the Linux Foundation stop developing Linux?

  2. Nemo: Have Cassandra/HBase/Hypertable or others implemented the cell-based security that was the key driver for Accumulo? (genuine question).

    Switching complex software stacks at large scale is never an easy job.

    As for Linux: I’m no fan of slippery slope arguments because… Hitler!

  3. Nemo. It’s not exactly unbiased but sqrrl are claiming that Accumulo has two benefits over Cassandra, HBase, and MongoDB: security and scalability. I have no idea how they measure the scale or what they base their claim on, but the reference is here. If you have other information I’d be interested.

  4. I’m part of the Pirate Party, which is all about digital liberties. Part of this is maintaining privacy online, although that’s not actually our primary policy.

    The primary policy is about copyright and patent reform. Alongside other things, one of the “prongs” of this reform is that it should be legal to disable DRM, and that it should be legal to reverse engineer it and tell others how it was done. There are similar policies in place for peer to peer file sharing technologies like BitTorrent. The reasoning here is that there are non-infringing use cases for both, and just because BitTorrent can be used by pirates, and breaking DRM would mean that piracy could occur on that device, this is unimportant compared to the fact that non-infringing users would not be punished.

    The principle here is that you cannot tar a technology with a brush of its users. You cannot say “pirates use file sharing, therefore file sharing should be banned”. I’m unsure how your argument is any different to “the NSA uses Accumulo, therefore Accumulo should stop being supported by Apache”. It’s an unfortunate fact that piracy happens with these technologies, and it is perhaps most unfortunate for the legitimate users of the technology, but it is wrong to take action against the technology itself. We need to find another solution against the NSA, a better solution, than removing Accumulo from the Apache project.

  5. Very much agreed

    (small note: The report is not an EFF report; I work there now, but did not at the time)

  6. Regarding security and scale. I don’t think cell level security was a driver for Accumulo but it is certainly a differentiator compared to HBase. If the NSA used HBase/Cassandra, they would need to add a permissions layer between the client and HBase/Cassandra much the way GitHub adds a permission layer to Git. The GitHub permissions use-case is different than the Linus Torvalds permission use-case and I would hesitate to read anything beyond that.

    I think the scale claim comes from Accumulo differentiator #2 which I understand is the ability to run user-defined in-process iterator functions. All major enterprise analytic databases have a framework for user-defined aggregate/table functions that run in-process (or in a local sandboxed VM). I’m assuming that the NSA uses this functionality to implement graph analytics.

    I wish I had made the point as succinctly as Sunny Kalsi did:

    “The principle here is that you cannot tar a technology with a brush of its users.”

    Here is my opinion on your four bullet points, Tom:

    1. Is Apache providing a service to the NSA?
    It is from a marketing perspective (i.e. engaging the Hadoop community) but not from a technology perspective. The NSA created the software, they were using it before it was submitted to the Apache Foundation and they will continue to use it regardless of the status of Accumulo as an Apache project.

    2. The NSA could get their software some other way.
    I don’t think building and deploying the software distribution is important to the creators/contributors of the software. Software distributions are a benefit to the non-contributing users.

    3. The Accumulo software is general purpose: does that matter?
    Accumulo is as general purpose as a lightbulb. It is as general purpose as a SQL database and far more general purpose than filtering Internet proxies. It’s not just the general purpose aspect, it’s the set of likely use-cases and whether or not those use-cases are “evil”. Despite the child filter use-case, web filters deployed by anyone that is not a parent are almost all “evil” (from a libertarian perspective).

    4. Getting Apache to do the right thing.
    I remember when Sunday shopping was illegal in Ontario. Your reasoning seems similar to the no-shopping-on-Sunday advocates who could not see beyond the happy idea of families spending more time together (your happy idea is punishing and/or shaming the NSA). Yes, spending more time with family is a good thing, but shopping is mostly independent of that goodness.

  7. Regarding the new Disclaimer: “We never knowingly share your email. The NSA probably doesn’t get it, but who knows?”

    If you submit a comment in a web form without SSL, do you have a reasonable expectation of privacy?

    I don’t know the answer.

  8. Sunny – I guess that’s why I’m not in the Pirate Party. Making a universal principle out of a particular case is almost always a bad idea and this seems to me no different.

    Jillian – Correction made. I’m going to assume you were agreeing with me, not Sunny 🙂

    RAD – More later on your comments, I hope, but for now it’s breakfast. And I don’t know about the expectation of privacy either.

  9. Clearly stopping Apache taking donated code from the NSA is just throwing toys out of the pram. They have made their contribution, you can’t return the code to them and it is too valuable to destroy (even if you could).

    Accumulo isn’t the sharp edge of the sword, it is just part of the metal.

    • Bob H: I agree that the primary focus of campaigns at the moment is, and should be, on the legal side of things – which I think is what you are saying in your second paragraph.

      But like any other project, Accumulo is not a finished work. What will it be in five years? In ten? Apache and its contributing partners will continue to assist in the further development of this technology. Hosting the project involves far more than being GitHub for the NSA.

  10. Just reading this now. If you have any further info about this campaign, if there is a petition or one being worked on, etc. please send it to me.

    Most of the comments on this thread seem to miss the point. The point as I see it is not to kill an open-source tool but to get organizations like Apache to take a strong stand with us against the NSA, to expose the partnerships that companies and NGOs like Apache have with national security apparatuses, and to call on them to break ties. It’s part of many things we should be doing to throw up political roadblocks to their continuing to spy on us. It has nothing to do with taking away their tools, which would be futile.


    • I didn’t get any response at the time so didn’t pursue it. Most of the effort seemed to be directed at the government agencies themselves, which seems fair to me. But I’m sure this will come up again.

  11. Sorry guys, plenty of other government agencies (US and non-US) use Accumulo. There is a need for security in these areas of information gathering. At no point is this likely to change.

    Those that don’t need the security will likely go to easier-to-manage products, but this has a solid niche use case. So the claim that it has validity as a product is a solid one.

    I understand and concur that the crap the NSA pulled is BS and should be cracked down on, but if you do that by removing a NoSQL store with security baked in, you basically cripple other types of government that you would never think of.

    (The first one that comes to mind is the Police force – yes, they currently use this as well and keep security to protect victims and whistleblowers)
