Theses on Netflix

Pretentious enough title for you?

I

Recommender systems – those algorithms that guess what you may be interested in as you browse Amazon or listen to last.fm – are commercially important. Netflix claims that 60% of its rentals are driven by its Cinematch recommender system [link]; that’s over half a billion dollars of business in 2008. As online commerce continues to grow, recommender systems will only get more important.

II

Recommender systems are culturally important too. As more of our culture moves online, they will be responsible for more of our cultural experiences, and will play an important role in shaping the creative parts of our societies.

III

Recommender systems will get better. Ten years ago they were largely improvised. Now you can do a Ph.D. in recommender systems and there are international academic conferences all about them [link]. The subject is ideal for academics – it is algorithmic and yet open-ended, with many different approaches and criteria for success. It’s an endless playground for exploration and simulation.

IV


Even though they will improve, there is no such thing as an optimal recommender system. Accuracy is insufficient. The interests of recommendees vary. Serendipity, intra-list variety, reliability and trust-generation are just a few other considerations [pdf link].

V

Don’t confuse the outcome of recommender systems with intrinsic merit. The recommendations are highly dependent on history and are the products of cumulative advantage. Many think that “if the experts could only figure out what it was about, say, the music, songwriting and packaging of Norah Jones that appealed to so many fans, they ought to be able to replicate it at will”. But hits cannot be reliably predicted because our choices and preferences are too inter-linked. Clive Thompson writes that companies with recommender systems can “track everything their customers do. Every page you visit, every purchase you make, every item you rate — it is all recorded.” [link] But other studies have shown the systems to be chaotic. Tiny, random fluctuations can lead to completely different outcomes. [link]
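To make the cumulative-advantage point concrete, here is a minimal sketch. It is not the model used in the studies linked above; it is a textbook "rich get richer" urn, and the function name, item count and number of choices are all assumptions made for illustration. Every item starts out identical, and each new consumer picks an item with probability proportional to how often it has already been picked.

```python
import random


def simulate_market(n_items=5, n_choices=2000, seed=0):
    """Toy cumulative-advantage market (hypothetical parameters).

    All items start with one 'listen'. Each new consumer picks an item
    with probability proportional to its current popularity.
    """
    rng = random.Random(seed)
    counts = [1] * n_items
    for _ in range(n_choices):
        winner = rng.choices(range(n_items), weights=counts)[0]
        counts[winner] += 1
    return counts


if __name__ == "__main__":
    # Same items, same rules; only the random seed differs between runs.
    for seed in range(4):
        counts = simulate_market(seed=seed)
        top = max(range(len(counts)), key=counts.__getitem__)
        share = max(counts) / sum(counts)
        print(f"run {seed}: counts={counts}, top item={top}, share={share:.0%}")
```

Since the items are identical by construction, whatever differences appear between runs come from the early random picks being amplified, not from any difference in merit.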

VI

Recommender systems can easily reinforce inequalities among recommended items. A system that recommends popular items will increase those items’ popularity. Unpopular items will be left in the dust. Such systems can make big hits even bigger, and can lead to an overall decrease in cultural diversity.

VII

Recommender systems can increase the experience of diversity. By drawing attention to items individuals have not found by themselves, they can lead to new experiences. But individual diversity is different from overall diversity. Some systems can increase both individual and overall diversity. Other systems increase individual diversity but, at the same time, prompt consumers to be increasingly similar to each other. Their selections then come from an increasingly narrow range of items [pdf link].
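The difference between individual and overall diversity is easy to show with a toy example. The data below is invented, and the two measures (average number of distinct items per person, and number of distinct items consumed by anyone) are just one simple way to make the distinction, not the measures used in the linked paper.

```python
def individual_diversity(baskets):
    """Average number of distinct items each person consumes."""
    return sum(len(set(items)) for items in baskets.values()) / len(baskets)


def overall_diversity(baskets):
    """Number of distinct items consumed by anyone at all."""
    return len(set().union(*baskets.values()))


# Hypothetical "before": everyone stays in a small niche of their own.
before = {
    "ann": ["a1", "a2"],
    "bob": ["b1", "b2"],
    "cat": ["c1", "c2"],
}

# Hypothetical "after": a recommender nudges everyone towards the same hits.
after = {
    "ann": ["h1", "h2", "h3", "h4"],
    "bob": ["h1", "h2", "h3", "h4"],
    "cat": ["h1", "h2", "h3", "h4"],
}

if __name__ == "__main__":
    for label, baskets in (("before", before), ("after", after)):
        print(label, individual_diversity(baskets), overall_diversity(baskets))
```

In this made-up case each person's experience doubles from two items to four, while the range of items anyone consumes shrinks from six to four.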

VIII

Ownership matters. Given the variety of approaches, outcomes, and absence of clear “best” alternatives, and given the ability of recommender systems to shape the experiences of their users, there is ample room for ulterior motives to become embodied in the system. The incentives for the recommender and the recommendee may be different. The incentives for Netflix in a regime where they deliver physical DVDs (of which they have limited stock) may be to promote the back catalogue. When they deliver movies digitally (as they are about to) there may be no such constraint and they may be more tempted to promote existing blockbusters. The most valuable recommender systems may be those that are independent of producers and vendors.

IX


Transparency matters. The unmarked presence of sponsored items in a recommendation list would be widely viewed as a corrupt set of recommendations, but just as some bookstores charge for premium display sites within the store, so sites on recommendation lists may be sold. Recommendees have a right to know if payola is part of the system.

X

Recommender systems will displace the filtering role of both reviewers and publishers. But while bad reviewers and publishers would not be missed, good reviewers and publishers are not only filters; they are also an active part of cultural creation. The impact of recommender systems on these members of creative communities is important.

XI

The word “community” is widely used in conjunction with recommender systems, but they do little to build communities. Their use is essentially an individual, isolated act. Groups and networks are as important in the creation and experience of culture as individuals. Recommender systems will play a role in how culture is experienced, but they are not necessarily a strong force pushing us either towards or away from a healthy culture.

 

XII

Recommender systems only filter culture, in various ways; the point is to create environments in which artists can prosper.

Not Getting Work Done

Author Helen DeWitt has moved into an apartment with no phone and no Internet connection for 5 months so she can get some work done. [link]

I have not.

That's why I am reading about Helen DeWitt instead of getting some work done.

Long Tail stops wagging

Google CEO Eric Schmidt says:

 although the tail is very interesting and we enable it, the vast majority of the revenue remains in the head. And this is a lesson that businesses have to learn. While you can have a long tail strategy, you better have a head, because that’s where all the revenue is. 

and this prompts Long Tail author Chris Anderson to make several admissions:

But there were clearly exceptions to [Long Tail behaviour]. One of the main ones was the irony that there was a very short Head of Long Tail aggregators: Amazon, iTunes, Google and their kin dominate their markets to a blockbuster-like degree. 
I blamed this on a still-young market and assumed that even aggregators would fall victim to the flight from one-size-fits-all someday. But new research from McKinsey (free registration req'd) suggests that this sort of radical inequality is increasingly the norm as markets get more networked. 

"Powerlaws do imply wildly unequal distributions of money, power, celebrity and everything else." – so much for 'democratization'.

And it's not just companies. The Long Tail–the powerlaw created by network effects–may be creating super-celebrity, too.

As I've said many times, both in the book and elsewhere, most of the rewards in the Tail are non-monetary: a larger audience for producers, and more choice for consumers. 

I'll end by conceding a point: It's hard to make money in the Tail.

There's more, and he holds on to some of his assertions, but basically, it's all over for the long tail.

Link Books to Their Open Library Page

When you reference a book and want readers to be able to find more about it, it's common to link to Amazon. There are problems with Amazon being the default site for all books: by linking there you are basically recommending that your readers support Amazon and don't support your local bookstore.

So how about linking to Open Library instead whenever you mention a book, like the best novel ever? It's an evolving initiative with a fine mission: One web page for every book. It's got some quirks still – the treatment of books with many editions is a bit of an issue – but it's a great idea and it's noncommercial, which is hugely important for a common resource. You won't find much information there about most books, but you will find links to buy a book at some of the bigger online bookstores and to your local library catalogue if you're lucky.

Now if only they could link to your local bookshop, which should be possible for those who have online ordering stores like our neighbourhood one, I'd be very happy with it. 

Obviously there's more work to be done there, but one way to get it done is to make people aware of it, and one way to do that is to link to it.

So spread the word: link books to their Open Library page.

Cloudy Monopolies II:

After yesterday's bout, Tim O'Reilly responds to Nicholas Carr and Nicholas Carr responds
to Tim O'Reilly's response. There's something about the blog world that
makes referring to these two as Carr and O'Reilly seem terse, and yet
to call them Nick and Tim just looks unctuous. So I'll call them by
their initials: NC (C for Carr and Cloud) and TO (for Tim O'Reilly and
Two point O).

To bring you up to date:

  • TO claims NC
    interpreted "network effects" too narrowly ("Nick only sees first order
    network effects") while NC claims TO interpreted "network effects" too
    broadly ("today O'Reilly is expanding his definition of 'network
    effect' far beyond his original definition").
  • Meanwhile NC claims
    that TO interpreted "cloud computing" too narrowly ("O'Reilly is here
    using 'cloud computing' in the narrow sense…") while TO claims NC
    interpreted "cloud computing" too broadly (well, I can't find a
    sentence to back that up, but it's just too symmetrical to leave out).

TO actually has a point, and I agree with him that NC's reading is overly narrow.
There are many kinds of indirect network effects that still qualify for the name. For example, here I am
referencing only TO and NC while others have also
contributed to the debate. And I suspect that TO and NC have taken this
issue up because their opponent is an A-list blogger and so a dispute
gives the issue a prominence it would otherwise not have. So both TO and NC are gaining readers because they already have a lot of existing readers. So the world turns.

But on the big issue, I still side with NC. Here's why.

TO defines the original issue as this claim, written by Hugh Macleod:
"Nobody's saying that one day a single company may possibly emerge to
dominate The Cloud, the way Google came to dominate Search, the way
Microsoft came to dominate Software." But then TO later redefines the
question as whether the cloud infrastructure business is likely to be a
low-margin business based on selling commodities. Given NC's question
"will the infrastructure suppliers also come to dominate the supply of
apps?" the dispute is actually about not one question, but a set of distinct questions:

  1. Is infrastructure cloud computing a low-margin business?
  2. Is infrastructure cloud computing a commodity business?
  3. Is infrastructure cloud computing a natural monopoly?
  4. Do profits lie solely at the top layer of the cloud (applications/APIs)?
  5. Are the synergies among layers of the cloud enough that success in one area translates into success in other areas?

You see, this is why I'm no top blogger. Each of these is a complicated
question, and I'm not qualified to answer any of them. It's not even
clear to me which of these questions are separate. This is why I don't
trust big ideas or catchy names and why I'll never write a best-selling
business book (well, one reason. I'm sure there are many more). But in
pursuit of some light rather than heat I'll try to draw some lines.

Most importantly, the first three of these questions are obviously related, but TO combines them and confusion results. A typical commodity business (selling
widgets that are the same as the next guy's widgets) is low margin
because a commodity business is characterised by decreasing returns and
so by low barriers to entry and intense price competition. But you can
be in the business of selling commodities and yet still profit from
increasing returns. Amazon is a prime example. Books are commodities
(anyone can sell a book), and this limits Amazon's margin per book to
being small. But online bookstores are not commodities – they are
characterised by huge fixed costs (the Amazon computing infrastructure)
and there are massive barriers to entry in the online bookselling
business. So Amazon is a low-margin seller of commodities but is also a
near-monopoly. Wal-Mart is another example in the physical world – it's
obviously in a low margin business (really low) but it has used
economies of scale to build a large market share in the US. Telcos and
utility companies are others where the forces towards a natural
monopoly are strong enough to have attracted antitrust action while
still being low-margin. Few would dispute the fact that Amazon and
Wal-Mart are both immensely powerful companies. The argument that
"infrastructure cloud computing is a commodity business, therefore no single
company will dominate that layer of the cloud" is simply false.

Is infrastructure cloud computing a natural monopoly? There are many
forces (network effects and others) pushing it in that direction, but
there are others (the variety of services demanded in particular) that
pull it away from a natural monopoly. I suspect this market will be
oligopolistic rather than monopolistic – that there will be enough
fragmentation in the kinds of offering available that a demand for
variety will make it more like the automobile industry than like the
online bookstore industry.

Do profits lie solely at the top layer of the cloud? Again, no. You can
be low-margin and high profit as long as the scale is big enough (see
Amazon and Wal-Mart, above).
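The arithmetic behind "low-margin and high profit" is just margin times scale. A back-of-envelope sketch, with invented figures that are not taken from any real company's accounts:

```python
def profit(revenue, margin_pct):
    """Profit for a given revenue and percentage margin."""
    return revenue * margin_pct / 100


# Hypothetical figures, in millions of dollars.
giant = profit(revenue=1000, margin_pct=3)     # thin margin, huge scale
boutique = profit(revenue=50, margin_pct=30)   # fat margin, small scale

if __name__ == "__main__":
    print(f"giant: {giant}, boutique: {boutique}")
```

With these made-up numbers the thin-margin giant earns twice what the fat-margin boutique does, which is all the argument needs.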

And finally, are synergies enough that a single company (Google being
the obvious one) may come to dominate all layers of the cloud? That's a
hell of a question. If there is any company that could do so it would be Google. And
if it does then I hope the antitrust people are all over it. But
sources of variety are surprising – many people thought that the age of
TV would mean local accents would die out, but of course they are still strong in many places – and I am hopeful
that there are enough such sources to prevent the web collapsing into a
Google-shaped black hole.

O’Reilly vs Carr on Cloudy Monopolies

In the offline world it is obvious that industries have different levels of concentration. The forces that shape the grocery store industry are different from those that shape the automobile industry, home furnishing stores, copper mining, the insurance industry, and so on.
In contrast, there is a tendency to think as if all digital industries are governed by the same set of factors. We talk about iTunes as if it has something to say about the success of Salesforce.com; about Netflix as if it has something to do with the success of Google. The runaway success of companies like Facebook and Amazon, which provide services that get more valuable the more people use them, has led to a focus on network effects as the driving force behind industry concentration. But it's obviously true on the Internet just as it's true elsewhere that selling books is different from selling advertising.
I say all this because A-list technopundits Tim O'Reilly and Nicholas Carr have competing posts about industry concentration in the world of cloud computing. And while I lean towards Carr, each oversimplifies the issues.
O'Reilly coined the phrase Web 2.0 and he says the cloud is, not surprisingly, all about Web 2.0. "Understanding the dynamics of increasing returns on the web is the essence of what I called Web 2.0. Ultimately, on the network, applications win if they get better the more people use them."  That is, there is one source of increasing returns you need to think about when considering industry concentration on the Internet, and that's network effects. On the other hand, he recognizes that "cloud computing" is a name that covers several different developments, including hosted virtualized computers (Amazon's EC2 and S3), hosted software platforms (Salesforce.com and Google Apps) and cloud-based applications (Facebook, Flickr). He claims that real industry concentration (and hence profits) will take place only in the third of these layers, because only here do network effects really come into play.
Carr's recent book The Big Switch caught the wave of cloud computing, and so he says that the cloud is more important than mere Web 2.0. He points out that there are many sources of increasing returns in addition to network effects. Using Google as an example, he points to (not by name) learning-by-doing (Google gets better at search algorithms because they learn from their own experience); high fixed costs (the need for massive data centres); and asymmetric information (the predictability of a brand-name experience). But Carr does not distinguish different aspects of cloud computing.
I think Carr is closer to being right. There are so many sources of increasing returns in digital industries, in addition to the big one that the marginal cost of production of a digital good is basically zero, that the normal expected state of Internet industries is surely monopoly, give or take.

But guess what? Not all Internet industries are monopolies and if we are to get to grips with industry concentration in Internet-driven industries we need to acknowledge that different digital industries are pushed by different forces. Different parts of the cloud computing world really are different and will see different levels of concentration.

What we need to do is not so much look at the sources of increasing returns to scale, which are many, but instead look for what factors might limit increasing returns and prevent the expected monopolies from forming. Any such factor will affect different digital industries in different ways. I'll just look at one factor to show what I mean.

Just to be au courant, let's look at Paul Krugman's Nobel Prize-winning work (see a PDF here) as a source of inspiration. We have, collectively and individually, a preference for variety (and, loosely speaking, a variety of preferences) and this preference limits industry concentration. Here's how. According to the classical model of trade, industries should get very concentrated as each country focuses on its own area of comparative advantage. If Germany is good at making cars and Sweden is good at making bookcases then all the cars will be made in Germany and all the bookcases in Sweden – then they will trade cars for bookcases. 
But in the real world both cars and bookcases are made in both Sweden and Germany and they each trade both cars and bookcases among themselves. Industry concentration is limited. How come? 
One of the things Krugman used to explain this observation is the idea that as consumers we have a taste for variety. So McDonald's may have economies of scale and sell a lot of burgers, but no one wants to eat McDonald's all the time so there's a limit to how much of the restaurant industry McDonald's can win over. There are many sources of heterogeneous tastes. Geography is an obvious one – I'm not interested in a bookstore in Saskatoon because I've no way of getting there. Culture is another – I'm not interested in Russian-language books. And when it comes to music and clothes, well there's no accounting for taste. But wherever there is a taste for variety, industry concentration will be limited.
It will be interesting to see what kinds of variety create barriers to concentration in the cloud. In the hardware/virtual machine realm I can imagine security and service level agreements providing distinguishing factors that may encourage variety. Is it worth a hosting company going through endless certifications to get validated as a secure host for private data? Will they bet the bank on high service levels? At the Web 2.0 applications level, it's worth remembering that culture provides a surprisingly strong barrier for some experiences. As just one physical world example, the record of giant grocery stores outside their own country is surprisingly patchy. Wal-Mart, Tesco and Carrefour each hold sway in their own country but haven't managed to succeed in the others'. When I signed up to Facebook the obvious Americanness of the site was a turn-off (which college did I go to? a politics spectrum from "conservative" to "very liberal"?). Perhaps this is why different countries have different major social networking sites.
But my fear is that a taste for variety may not be strong enough to hold back very large concentration at many levels. You might think that people searching for different kinds of information may use different search engines. In Canada when I search for NDP I want to find the New Democratic Party, but an Irish searcher may be looking for the National Development Plan. But so far Google has been able to use its sources of increasing returns to scale to accommodate this variety and more, without letting competitors in. There are basically no niche search engines. 
It's the same in online encyclopedias; in the real world the physical limits of books mean that there are all kinds of specialist interests that Encyclopedia Britannica cannot cater to, and so we have encyclopedias of film, of football, of frogs, and so on. In the online world there's only one encyclopedia and it covers all these areas. 

And that's why I am inclined to agree with Carr.