Wrapping up: personal notes on the Airbnb report

Looking back at the Airbnb report that Murray Cox and I did earlier this month, now that Airbnb admits we were right

It was the beginning of December when Airbnb released data about its listings in New York City, but it was early January before I had time to properly compare my own data to theirs. Pretty soon, I noticed an anomaly: what looked like a significant jump in the proportion of single-listing hosts between two surveys I had carried out, either side of the November 17 date Airbnb chose as a seemingly-random snapshot of their business. Murray Cox regularly collects data on Airbnb listings too at Inside Airbnb, so we talked about it. He looked at his data and thought he saw the same. So that’s interesting, we thought…

Then it was several weeks of questioning. We looked again and again at data reliability, we checked and cross-checked calculations, and we mulled over alternative explanations. There is good reason to expect Airbnb to be misleading: there is $25 billion in fortunes to be made through a good IPO, and that’s a lot of pressure to make things look better than they are. But incentives are one thing, and reality is sometimes another. Airbnb routinely calls our data sets inaccurate (see here and here, for example), and even though comparisons to Airbnb’s own selective data releases have made us fairly confident, I did worry that going big on a story might set us up for failure and embarrassment if we had missed something obvious.

First we asked ourselves: if these listings were actually removed, is this a story? Yes, we thought. If Airbnb removed listings immediately before making a “public data release” and never once mentioned doing so, it seemed like a clear case of using data to mislead, and given the high profile of the company’s original data dump (two New York Times stories, and many others) someone should be interested.

Can we prove that the changes are not just noise? I scan the Airbnb web site for NYC periodically, but a survey I did in early November had given bad results (for boring technical reasons) so the bracketing data sets I had were from early October and early December – a bit too wide for comfort. But Murray’s data has proven more reliable than mine, and (tipped off by Ariel Stulberg of The Real Deal) he had run an additional survey on November 20. His data confirmed what I’d seen and more — clearly there was something unusual that happened in early November.

Next: are our calculations correct? Comparing Airbnb data sets on different dates is tricky because there is always significant churn on the site: a lot of people put up a listing, get little interest or decide it’s not for them, and leave again. So I did my own calculations twice over (once in SQL, once using Python pandas), and Murray did his own, and we compared, until we came to feel confident in them.
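For readers who want to see the shape of such a comparison, here is a minimal sketch in plain Python (not the SQL or pandas actually used for the report). It assumes each survey is just a list of (listing_id, host_id) pairs; the function names and the toy data are made up for illustration and are not taken from either data set.

```python
from collections import Counter


def multi_host_share(listings):
    """Share of listings whose host has more than one listing in this survey."""
    per_host = Counter(host for _, host in listings)
    multi = sum(1 for _, host in listings if per_host[host] > 1)
    return multi / len(listings)


def vanished(before, after):
    """Count listings present in the earlier survey but missing from the later
    one, split by whether the host was a multi-lister in the earlier survey."""
    per_host = Counter(host for _, host in before)
    still_there = {listing for listing, _ in after}
    gone = Counter()
    for listing, host in before:
        if listing not in still_there:
            gone["multi" if per_host[host] > 1 else "single"] += 1
    return gone


# Hypothetical toy surveys: (listing_id, host_id) pairs, not real data.
october = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c"), (6, "d")]
december = [(1, "a"), (3, "b"), (4, "c"), (6, "d"), (7, "e")]

share_before = multi_host_share(october)
share_after = multi_host_share(december)
gone = vanished(october, december)
```

In this toy case the two vanished listings both belonged to multi-listing hosts, which is the kind of lopsided pattern that ordinary churn would not be expected to produce.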

And what about alternative explanations? Running similar calculations on a number of other cities (thank you Jupyter notebooks) showed nothing of similar size. Even the previous known purges (like April 2014) did not show up quite so dramatically. Given the timing, we were pretty confident that the change was prompted by Airbnb removals, but who knows?

Finally: if Airbnb is removing “bad actors”, isn’t this a good thing? After talking it over with people I respect (thank you Lynne) we agreed that the big problem is one of trust. I managed to get the last word in the New York Times story on our report: Airbnb’s business model depends on changing regulations in cities around the world, and so cities need to be able to trust Airbnb. By hiding what they did, Airbnb is acting in an untrustworthy manner. Their “Community Compact” (actually a unilateral document, not a compact) commits Airbnb to co-operating with cities and being transparent: this action was neither co-operative nor transparent.

So several rounds of edits later, and after many attempts to format graphs so as to put the message in a clear fashion, we released our findings as a report, and it got more coverage than I could have expected. I admit I was nervous about Airbnb’s response: a bad calculation or an overlooked reasonable explanation could leave us looking pretty stupid.

In the first story (in the New York Daily News), “Airbnb dismissed the findings, saying through a spokesman that ‘listings come on and go off throughout the year. We routinely review our listings to ensure guests are having the quality, local experience they expect and deserve’.” The way I read that, Airbnb is saying the changes we saw are just regular churn in the business, and implicitly denying that the company did anything special in that time. The “spokesman” also said that “The report also focuses on the busy marathon and Halloween weekend, when listings spiked”, a claim the company repeated to The Guardian.

After a brief panic and a little reflection, it was clear to us that this explanation does not hold water. We hadn’t seen a spike in overall listings, we had seen a thousand more listings vanish from the platform than during other months, and almost exclusively these extra vanished listings were owned by hosts with multiple listings: there’s no reason for any special event to have such an effect.

As the day went on and coverage spread, Airbnb obviously got their act together and came up with a standard response. For example, here is what they sent to Fusion’s Kristen V. Brown:

“The facts are clear for all to see—the vast majority of our hosts are everyday people who have just one listing and share their space a few nights a month to help make ends meet. Airbnb is an open people-to-people platform where listings come on and go off throughout the year. We’ve also done significant work to educate our community about what is in the best interest of their city and we routinely review our listings to ensure guests are having the quality, local experience they expect and deserve.”

The company provided one other piece of updated information, which was to share “a partial snapshot with Fortune of its New York City listings as of Feb. 8 that shows that 94% of hosts there only have one listing.”

The paragraph is fluff, sprinkled with generalities. The number they give (94%) actually matches what Murray and I had found rather than countering it (Figure 1 in the report). This response again made me feel more confident that we were on the right track.

Still, I confess it was pretty sweet when, on Wednesday morning, Airbnb changed its tune. The company sent a letter and an FAQ to New York State representatives and posted the two documents on its site. I first heard of these documents from the above-mentioned Kristen Brown. Here is the key Q&A:

Did you remove listings from your community last fall?

Yes. We issued our Community Compact in November. Throughout November, and consistent with our Compact commitments, we removed roughly 1,500 of the 37,000+ Airbnb listings in New York City in an effort to remove listings that appeared to be controlled by commercial operators and did not reflect Airbnb’s vision for our community. 622 hosts were impacted, including 375 (60%) that had 2 or more listings removed.

The company has changed its tune, and conceded that it did indeed remove listings (and did not tell anyone about doing so). Fusion’s report had a great headline:

Airbnb admits that it purged 1,500 unflattering New York listings right before data release

The debate continues, of course. Working with Pat Clark of Bloomberg, Murray Cox has shown that some of the ejected listings have already started returning. The question of whether Airbnb’s own “vision for the community” is a sufficiently precise and legitimate criterion for removal will be debated (hint: No). But for now, I’m happy to celebrate being right.

How Airbnb’s data hid the facts in New York City

A report by Murray Cox and Tom Slee

On December 1 2015, Airbnb made data available about its business in New York City, with much fanfare. A new report by Murray Cox and me shows that the Airbnb data release misled the media and the public.
[Figure: front-page graph]

Airbnb’s data release was presented as “the first time Airbnb has voluntarily shared city data on a wide scale on how its hosts use the online platform”. This report shows that the data was photoshopped: Airbnb ensured it would paint a flattering picture by carrying out a one-time targeted purge of over 1,000 listings in the first three weeks of November. The company then presented November 17 as a typical day in the company’s operations and misrepresented the one-time purge as a historical trend.

Key facts

  • Airbnb purged over 1,000 “Entire Home” listings from its site just days before it prepared a data snapshot of its business.
  • Airbnb used the data snapshot to paint a misleading picture of its business:
    • Airbnb’s message was that only 10% of “Entire Home” listings belonged to hosts with multiple listings. The true number had been close to 19% for all of 2015.
    • Airbnb’s message was that “95% of our entire home hosts share only one listing”. The claim was true for less than two weeks of the year.
    • Airbnb’s rosy projections about the future of its business were not objective analyses based on historical trends. The company extrapolated from an artificial and unrepresentative one-time event.
  • Airbnb’s one-time purge was a PR effort, and does not indicate a change of heart for the company:
    • No similar event took place in other cities in North America or elsewhere.
    • Contrary to Airbnb projections, levels of multiple-listing entire homes have already jumped back to 13% of the total, only two months after the purge.
    • Despite claiming that it wants to “work with cities”, Airbnb carried out its purge without disclosure or consultation. Airbnb did not kick illegal hosts off the site; many commercial hosts still have listings on the site, but the purge made them appear, briefly, to be single-listing hosts.

The report

Download the full report: how-airbnbs-data-hid-the-facts-in-new-york-city.pdf.

The data

Download the full TS data set as a set of CSV files: 2014, 2015, 2016.

Download the full MC data set from Inside Airbnb.

For the press

Full press release: press-release-how-airbnbs-data-hid-the-facts-in-new-york-city.pdf.

Contact details

Airbnb’s business in New York City

According to the New York Times, Airbnb yesterday “released” data about their business in New York City. As I first reported on Airbnb in New York two years ago, when that business was a lot smaller, I was interested. Airbnb’s Chris Lehane says “Our hope is that people will understand 99 percent of people on Airbnb in New York City are using it as an economic lifeline,” and who could object to that? Would the real numbers show that we critics are wrong?

My work has been based on scrapes of the Airbnb web site (now done better by Murray Cox at Inside Airbnb), so it’s necessarily less accurate than Airbnb’s own data. On the other hand, I don’t have a $25 billion market valuation at stake in the answer, so it may be easier for me to be honest in my reporting.

I hoped that “released” meant that I could get the data, but I was quickly disillusioned. It turns out to mean “made available only by making an appointment to visit Airbnb’s New York City office”, which is a bit of a joke. Instead, all we get is the summary statements from Airbnb PR. Still, it is better than nothing. So I read on…

My first response to the New York Times article was dismay at this statement: “From November 2014 until November 2015, some 93 percent of revenue earned by active hosts in New York City who share their entire home came from people who have only one or two rental listings on the platform”. That is a number far higher than I had seen, and suggests that a much bigger portion of the Airbnb business is their archetypal “regular New Yorkers occasionally renting out the home in which they live” than I had thought. I had reported about 40% of Airbnb’s business coming from people with more than one rental listing and the numbers suggested 20% of business coming from people with more than two listings. Have I and other critics been getting it wrong? In the absence of complete data we have to make some estimates about income after all. This would be unfortunate as I have just PUBLISHED A BOOK ABOUT THE SHARING ECONOMY THAT MAKES A GREAT CHRISTMAS PRESENT and that is critical of Airbnb.

But today the New York Times ran a correction: “From November 2014 until November 2015, some 75 percent of revenue earned by active hosts in New York City who share their entire home came from people who have only one or two rental listings on the platform.” (my emphasis). The change from 93% to 75% is significant: that’s an almost four-fold increase in the proportion that comes from three-or-more listers. All of a sudden the Airbnb numbers look much more like those collected by myself and other external investigators, which Airbnb routinely says are inaccurate.
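The arithmetic behind “almost four-fold” is worth spelling out. A quick sketch, using only the two percentages quoted above (the function name is just for illustration):

```python
def three_plus_share(one_or_two_share_pct):
    """Revenue share (in percent) left over for hosts with three or more listings."""
    return 100 - one_or_two_share_pct


original = three_plus_share(93)   # share implied by the first version of the story
corrected = three_plus_share(75)  # share implied by the correction
ratio = corrected / original

print(f"{original}% -> {corrected}%, a {ratio:.2f}-fold increase")
# prints: 7% -> 25%, a 3.57-fold increase
```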

So what’s the real picture? Yes, 25% of Airbnb revenue in NYC comes from people renting out “more than two” listings. My own estimate actually comes out below that, at 20%, so my estimates are more friendly to Airbnb’s “regular people” pitch than reality is. My numbers also show that about 40% of total revenue comes from people with more than one listing, which is just what I reported two years ago. It’s likely that the real number is closer to half, given the way my estimates seem to underestimate revenue from “more than two” listers. In short, far from showing that the critics were wrong, Airbnb’s numbers show that our data, which they have been rubbishing, is pretty good and even generous to them: their numbers suggest that even more of the business comes from multiple listers than we have been claiming.

So here’s the right way to say it. “From November 2014 to November 2015, about half of Airbnb’s revenue in New York City came from multiple-listing hosts. Hosts with three or more listings contributed 25% of the total.” That’s a much more commercially-focused operation than the original claim suggests.

The 93% number that Airbnb gave is, by the way, their projection of next year’s figures, to which I can only say: if you’re going to release data, maybe talk about the data and not about your dreams and aspirations? So far their supposed efforts to clamp down on hosts with many listings have been half-hearted, and given that such efforts may cut into their revenues we should not give them a lot of credibility. Airbnb has been talking the talk a long time about this challenge on their site, and yet so far they have done basically nothing about it (I’m travelling and don’t have access to the full data set at the moment, or I’d show you).

Maybe more on this later. But for now, the new Airbnb numbers do nothing to undermine the critics’ case.

 

Lake Wobegon and the Panopticon: a simulation of real-world reputation systems

For some time I have been working on a simulation of reputation systems: a computational model I can use to think through some of the issues they raise. A first pass at this model is now available, together with a fairly long document describing how it works and some results, on GitHub as a Jupyter notebook here.

I was particularly interested in a seeming paradox in what we have learned about real-world reputation systems. As I say in the introduction:

In the few years since they have become widespread, reputation systems have shown two seemingly contradictory characteristics:

  1. (Lake Wobegon effect) Most ratings are very high. While ratings of Netflix movies peak around 3.5 out of 5, ratings on sharing economy websites are almost all positive (mostly five stars out of five). The oldest and most widely-studied reputation system is eBay, in which well over 90% of ratings are positive; other systems such as BlaBlaCar show over 95% of ratings as “five out of five”.

  2. (Panopticon effect) Service providers live in fear of a bad rating. They are very apprehensive that ratings given for the most frivolous of reasons by a customer they will never see again (and may not be able to identify) may wreck their earnings opportunities, either by outright removal from a platform or by pushing them down the rankings in search recommendations. Yelp restaurant owners rail at “drive-by reviewers” who damage their reputation; Uber drivers fear being “deactivated” (fired), which can happen if their rating slips below 4.6 out of 5 (a rating that would be stellar for a movie).

So are reputation systems effective or not? Here’s the seeming contradiction:

  1. The Lake Wobegon effect suggests that reputation systems are useless: they fail to discriminate between good and bad service providers (my take on this from a couple of years ago is here). This suggestion is supported by quite a bit of recent empirical research which I have summarized in MY NEW BOOK! Customers are treating reviews as a courtesy, rather than as an opportunity for objective assessment. Rather like a guest book, customers leave nice comments or say nothing at all.

  2. The Panopticon effect suggests that rating systems are extremely effective in controlling the behaviour of service-providers, leading them to be customer-pleasing (sometimes extravagantly so) in order to avoid a damaging bad review.

If you are not a fan of computer models, or just have better things to do, here are my main conclusions, paraphrased:

  • The model demonstrates the important role of social exchange compared to a pure market or transactional exchange in most customer–service provider exchanges. It is this social exchange that is at the root of the Lake Wobegon effect, where all providers are above average. Reputation systems do indeed fail to discriminate on the basis of competence (quality).
  • A small number of entitled customers can induce a Panopticon effect. Service providers who engage in Give & Take exchanges with their customers (even very competent ones) risk being given a negative review, which will damage their business. The incentives of the reputation system encourage providers to indulge their customers, in order to avoid this unlikely but damaging judgement.
  • If reputation systems spread and customers become used to rating people in an “honest” fashion, we are building a terrible world for service providers. They must engage in emotional labour, catering to customer whims, or risk their livelihood. The Panopticon is here. The reputation systems continue, it should be noted, to fail to discriminate based on the competence of the service provider — instead of changing quality, they change attitude.
  • The Lake Wobegon effect and the Panopticon effect can coexist, and are coexisting. Reputation systems as they currently stand are failing to discriminate based on quality. But there is only one thing worse than a reputation system that doesn’t work, and that’s a reputation system that does work: Reputation systems promise a dystopic future for service providers, in which their careers are being shaped by reputation systems that are not working as advertised, but are working to compel compliance.

Uber: (Getting Over)^3

The story so far…

Susan Crawford wrote about “Getting Over Uber”. Swimming against the tide as a technophile and Internet enthusiast, she has come to believe that transport and communications networks in cities are about more than the market exchange of getting a ride. Also that Uber — a company that already squeezes its drivers as tightly as it possibly can — will squeeze even more tightly if it becomes unconstrained. Uber, Crawford says, is not a good idea for American cities.

Tim O’Reilly responded with Getting Over Taxis. He found Crawford’s arguments puzzling and unconvincing. He did some back-of-the-napkin math to show that Uber can be better for drivers than taxis. He concludes that while “common carriage” (uniform and universally accessible transport) is a noble goal, “when the private sector is doing a better job of providing that service than the previous government-chartered monopolies, government needs to get out of the way.”

Here, I want to do two things:

  • I think Tim O’Reilly’s back-of-the-napkin math about driver income gets some things wrong and I want to put forward another point of view.
  • That said, questions about driver income are probably not going to make the difference in this debate, which takes us back to Susan Crawford’s post.

(Aside: I’ll sometimes call them “O’Reilly” and “Crawford” below. Californians may feel this looks hostile, but that is not the intent. I’ve just never met either of them, so take it as old-style British formality.)

(Do the Math: Taxi vs Uber)^2

Start with the questions about driver income. O’Reilly notes that a taxi driver typically rents his or her taxi from the owner, usually for a fee of just over $500 per week, after which the driver keeps 100% of all fares and tips (but has to pay for gas). He compares this “gate fee” to the following Uber driver expenses:

  • Uber’s 25–30% that it keeps of every fare.
  • A $109 per week lease from Toyota, provided by Uber.

While it’s not easy to translate Uber’s cut into a weekly amount, O’Reilly notes that for this to equal the $500 per week gate fee for taxis, the driver would be making $2000 per week, which “seems unlikely”.
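The $2000 figure follows from dividing the gate fee by Uber’s cut. A small sketch of that break-even calculation, using only the numbers quoted above; the variant that subtracts the weekly lease is an assumption added here for comparison, not something O’Reilly computes:

```python
GATE_FEE = 500.0  # taxi gate fee in dollars per week, from the text
LEASE = 109.0     # Toyota lease in dollars per week, from the text


def break_even_fares(cut, weekly_lease=0.0):
    """Weekly fares at which Uber's cut plus any lease equals the taxi gate fee."""
    return (GATE_FEE - weekly_lease) / cut


for cut in (0.25, 0.30):
    print(
        f"cut {cut:.0%}: ${break_even_fares(cut):,.0f} ignoring the lease, "
        f"${break_even_fares(cut, LEASE):,.0f} counting the lease"
    )
```

At a 25% cut the break-even is $2000 per week in fares, which is the number O’Reilly calls unlikely; counting the lease or a higher cut lowers it.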

The equation (that Uber fee + lease is the equivalent of taxi lease and operating expenses, save gas) is off the mark. But I want to be constructive about this, so before I spell out an alternative, a few disclaimers:

  • There is no one taxi driver. It’s a complicated industry; even within one city, there is complexity. In Toronto, for example, there have been moves to permit more owner-operators (ambassadors) to take some power away from fleet owners, and then some modifications to let ambassadors have one other driver (to get the most use out of their license) and so on. Different cities have different rules. Small towns are different from the big metropolises.
  • There is no one Uber driver. The company sets very different rates in different cities ($2.15 per mile + $0.40 per minute in New York; $0.75 per mile + $0.15 per minute in Detroit), takes a different percentage of the fare, and even sets different “safety fees” ($0 in New York, $2.50 in Gary, Indiana, according to Biz Carson). And that’s before the whole surge pricing thing.
  • I’m not an expert. If you want to take a look at the complexities of driver expenses by someone who is, see Lawrence Meyers’ dense 27-page epic “Towards A Cost Estimate of A NYC UberX Driver”. Of course, NYC is only one city and the picture will be different elsewhere; details clarify the picture, but details also muddy the picture.

So with all those caveats, here is what Tim O’Reilly missed: the $500 per week “gate fee” that he talks about includes maintenance, repairs, depreciation and insurance in addition to the fee that the vehicle owner takes. The Uber “fee plus lease” misses the cost of maintenance (except, I believe, for oil changes and tire rotations), repairs, and insurance. Once we include those costs, things look different: the short version is that most Uber drivers probably get about the same as most taxi drivers.

Here’s the longer version. From what I could see last year, each dollar of a taxi fare gets split very roughly four ways: a quarter goes to the leaseholder, a quarter to the costs of car operation (including insurance), a quarter to gas, and a quarter to the driver. The “gate fee” is the leaseholder and the operation parts, so about half of the overall income.

When it comes to Uber, a quarter (or over) goes to Uber, about half goes to gas and costs of operation (minus commercial insurance), leaving about a quarter for the driver.
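To make the two very rough splits concrete, here they are as fractions of each fare dollar. This is only a sketch of the rounded quarters and halves described above; the dictionary keys and the $1000 week are illustrative, and real splits vary by city and driver.

```python
from fractions import Fraction

# Very rough splits of each fare dollar, as described in the text.
taxi = {
    "leaseholder": Fraction(1, 4),
    "car operation": Fraction(1, 4),
    "gas": Fraction(1, 4),
    "driver": Fraction(1, 4),
}
uber = {
    "Uber": Fraction(1, 4),
    "gas and car operation": Fraction(1, 2),
    "driver": Fraction(1, 4),
}


def driver_take(split, weekly_fares):
    """Driver's share of a week's fares under a given split."""
    assert sum(split.values()) == 1, "a split must account for the whole fare"
    return float(split["driver"] * weekly_fares)


# The taxi gate fee covers the leaseholder and car operation parts.
gate_fee_share = taxi["leaseholder"] + taxi["car operation"]

taxi_take = driver_take(taxi, 1000)  # hypothetical $1000 week
uber_take = driver_take(uber, 1000)
```

On these rounded numbers the driver keeps the same quarter either way, and the gate fee is half of the taxi fare.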

Where does that “half” come from? Two places: one is a table in Meyers’ paper that lists the revenue per mile that a driver is earning, and the percentage of revenue that is lost. A percentage of 40 to 50% is in the middle of this chart:

[Figure: driver expenses as a function of revenue per mile, from Lawrence Meyers, “Towards A Cost Estimate of A NYC UberX Driver”]

A second is that Meyers’ cost estimate is a bit higher than the numbers calculated by Justin Singer and lower than the 57c per mile that the IRS allows, so it’s in the right ballpark (but remember, there are many different ballparks).

So from what I can see, the overall split is fairly similar between Uber and taxis.

But there are some other differences to remember, one in favour of Uber and one against:

  • In its favour: Uber claims greater utilization (more rides per hour) which would lead to better incomes. The data it has provided in support of this is partial.
  • On the other hand, there’s nothing here about commercial insurance. Adding commercial insurance is expensive (Meyers suggests an additional 8c per mile, which amounts to somewhere around 5% or so of the fare). Of course, most Uber drivers don’t take out this insurance: part of Uber’s cost advantage is that passengers and drivers are often taking uninsured rides.

(Aside: I recently attended an Uber driver information session. They did not mention insurance at all until an audience member asked about it, at which point they said it’s between the driver and the insurance company. They would have had to wink broadly to make it any more clear that they aren’t checking insurance and won’t ask questions.)

What this leads to is that the Uber driver’s position is not so different from that of the taxi driver: both keep somewhere around a quarter of the fare, and increased utilization on Uber rides gets eaten up by the per-mile costs Uber drivers have to pay. While Tim O’Reilly says the amount you can make as an Uber driver is “almost surely higher than the median income for taxi and limousine drivers in 2012”, I would suggest that it’s probably about the same, with quite a bit of variance both ways.

I admit that the estimates above remain full of holes and the conclusions are wrapped in caveats, but there’s one other reason I have confidence in my overall conclusion that most of Uber’s drivers are not making significantly more than taxi drivers. If Uber had comprehensive data that proved drivers were making a good income after expenses they would shout it from the rooftops. The fact that they haven’t (all their posts and papers talk about “before expenses” income) tells us a lot.

The future

Crawford argues that “Uber consistently squeezes its drivers as tightly as it possibly can; new drivers are paying an even higher cut to Uber than the first generation did.” And I agree: the future is likely to be more difficult for Uber drivers.

Uber is currently losing money in its efforts to attract drivers and passengers. It’s possible that its Xchange car loan program is also a driver subsidy. But while losing money may help build the company pre-IPO, when accounts are private and growth is everything, it is not a sustainable strategy.

Within cities, the company keeps its own slice of the pie small when it gets started in a new location, to get riders and drivers onto the platform and to push the aggressive growth strategy it has adopted. It has then increased its cut in many places. Once the taxi companies have been pushed to the side, why should it continue to pay its drivers as much as it does now?

Where is the real dividing line?

I might be wrong, but I suspect that all the above is beside the point.

Susan Crawford talks about “My tribe — the technophiles, the Internet enthusiasts” being thrilled about Uber, and it’s the technophile part of this that is key: people who identify with the technology will generally identify with Uber. How we feel about Uber is shaped by how we feel about free markets and civic governance; who we identify with.

For all the argument, I suspect most people would have the same view of Uber whether their drivers make more than taxi drivers or make less. If taxi drivers make more than Uber drivers, then to some that would simply prove that taxi drivers are fat-cat monopolists who need to stop overcharging their customers and adjust to the new world. If Uber drivers make more than taxi drivers, then that just shows that the efficiency of new technology is taking us into a win-win world. And yes, I’m aware that identity-driven thinking goes the other way too.

Crawford owns up to her bias: “I’m a fan of taxis wherever I find them.” So I should make my own bias clear. I work in the private sector technology industry, but I’m a big fan of democracy. Public transit and city-provided public services matter, warts and all. Personally, I’ve met fascinating people on public transit (travelling from San Francisco airport on the BART I met one of the authors of the Kyrgyzstan constitution) and it takes you to interesting places (the M60 bus from La Guardia into New York City takes you to 125th Street and Lexington Avenue; it took me a while to place the intersection, but who wouldn’t want to go there?). I live in a house near public transit and did so while my kids grew up so they could be mobile and independent.

Does technology drive business or does business drive technology? I see business as the lead here. And given the incentives at work, I cannot trust Uber. Its success is rooted not only in its technology, but in avoiding costs like sales tax (in many cities), like insurance, and (despite all the claims) like providing universal service. It succeeds (as Crawford hints) by avoiding the costs of being a constructive partner in the cities where it operates.

It’s easy to forget that Uber is not yet a publicly traded company, so any information about its business comes from leaks or press releases. Venture capital needs its successful exit, and as anyone who has looked for a date on Ashley Madison or a blood test from Theranos now knows, there are big incentives to paint a selective picture when billions are at stake. Once Uber is public, the incentives change: balanced books and cost control become more important. What happens when Uber has displaced taxis and then needs to squeeze its drivers a little harder?

Looking even further ahead, Jathan Sadowski and Karen Gregory argue that Uber’s investors see this business model as an opportunity to privatize city governance. This is a damaging and antidemocratic goal. For a smallish city in Canada, what happens to accountability when faced with a massive American company with little interest in Canadian employment law or Canadian traditions? Two quick examples: as Michael Geist pointed out, Uber has no Canada-specific privacy policy, but what can Canadian cities do about it? And regardless of our labour laws, how can small cities respond when drivers are fired at will for being critical of the company or for unsubstantiated complaints by customers that cannot be appealed?

There’s one thing that Crawford and O’Reilly (and I) can agree on, which is that the sudden interest in urban transit that Uber has sparked may be valuable. I side with Crawford, because Uber is not just about getting a ride from A to B, it’s about our cities and the scope of our democratic institutions. Cities are more than the site for consumer-driven market exchanges. Where the line gets drawn between community, government and marketplace will differ from country to country, but what is constant is that people need a say in the decision: as citizens, not just as consumers. Local government, flawed as it is, is important, and it can be innovative. Cities need tending, and democracy, not venture capital, is the best tool for that job.

Self-promotion alert: I have more to say about Uber and other sharing economy businesses in my book, “What’s Yours is Mine: Against the Sharing Economy”, coming soon from OR Books.

Volkswagen, IoT, the NSA and open source software: a quick note

(Attention conservation notice: the most interesting paragraph is the one about project NiFi, starting “Finley also writes…”)

According to Klint Finley in Wired, the lesson from the Volkswagen testing scandal is to use more open source software in more places. In particular, he argues that Internet of Things (IoT) devices should be driven by open source software (even though the VW case was not an IoT case). Here is Finley:

To protect consumers and realize its true promise, the Internet of Things must go the direction of the software and hardware that supports the Internet itself: it must open up…

Today, the vast majority of smart home gadgets, connected cars, wearable devices, and other Internet of Things inhabitants are profoundly closed… Ostensibly, this is for your own protection. If you can’t load your own software, you’re less likely to infect your car, burglar alarm, or heart monitor with a virus. But this opacity is also what helped Volkswagen get away with hiding the software it used to subvert emissions tests.

This seems wrong on two counts.

Finley writes about initiatives like the OpenWrt operating system for embedded devices as an alternative, but a lot of IoT devices already run on Linux. What stops individuals from being able to exert control over their gadgets is the use of Linux permissions structures, not the openness of the OS code. IoT security frameworks will be much like security frameworks on Android and other mobile operating systems: sandboxed applications running in their own user space, using the security features of the operating system. The open source/closed source distinction is essentially irrelevant to the problem.

Finley also writes that the closed nature of some IoT devices “makes it harder to trust that your thermostat isn’t selling your personal info to door-to-door salesmen or handing it out to the National Security Agency.” Which is ironic, because the software that might be handing out your personal info to the NSA is already open source. The NSA NiagaraFiles project routes data among different computer networks and protocols. The NSA recently released this software as open source, and it is now hosted as an Apache project called NiFi. So that is the open source community (in the form of Apache) actively assisting the NSA in its data collection activities. The core developers on the project are all from the NSA and defense contractors (link). And NiFi is being touted as a big thing for IoT applications, so that your personal info can be more effectively routed to more destinations. The NiFi project is one more step in the active collaboration of Apache with the NSA, which I discussed back here and here, and tangentially in my FORTHCOMING BOOK.

The VW case is important and raises some big questions, but open-source vs closed-source software is not one of them. For a better take, see Zeynep Tufekci here.

(Full disclosure and openness: in my day job I have some involvement in IoT projects. My employer (FOR WHOM I DO NOT SPEAK) uses a mixture of open source and proprietary code in its work.)

Something I didn’t know: Brontë edition

So I was back home for a couple of weeks, and went to the Brontë Parsonage for the first time in ages.

Here is something I didn’t know: look at this for a gothic two-year sequence of events (credit).

  • October 19, 1847: Charlotte’s Jane Eyre published
  • December 1847: Anne’s Agnes Grey published
  • December 1847: Emily’s Wuthering Heights published
  • June 1848: Anne’s The Tenant of Wildfell Hall published
  • September 24, 1848: Branwell dies of tuberculosis (age 31)
  • December 19, 1848: Emily dies of tuberculosis (age 30)
  • May 28, 1849: Anne dies of tuberculosis (age 29)