Google, meet Wal-Mart

Hey Google, let me introduce you to Wal-Mart.

You’re both looking a little out of place at this swanky party, and you don’t look like natural friends, but actually you have a lot in common and you’ll get along fine. Oh I know you are a lot more attractive than Wal-Mart – younger, cooler, and with much better taste in office furniture. But underneath all the surface appearances you share the same hobbies and even the same values, so you’ll overcome any awkwardness pretty soon. Trust me, you’re birds of a feather.

What hobbies? Well, you love to collect vast amounts of crap and then make it easy and cheap for people to get at it. And guess what? So does Wal-Mart. Don’t get me wrong, Wal-Mart isn’t in your league. He only gets crap from a small number of people compared to you. But he does have some great stories about making bargains. Just ask him to tell about the time he told Coca Cola to go back and redesign their Diet Coke recipe just for him – it’s hilarious.

And you know that crowdsourcing thing you do, where you get other people to work for free because it’s fun? Well he’s pretty good at that too, except he does it with whole companies. Like, he gets deodorant companies to manage his deodorant shelves for him, so he doesn’t have to! Wouldn’t you love to pull a trick like that with NBC or Fox? Look, I know he wears crappy clothes, but he can be a hoot when he’s in the mood.

It’s not too surprising that you have these things in common when you think about it. You’re both really good with technology and numbers to start with. And I heard some stories about you from Eric Clemons. Actually he was telling me that you remind him of airline customer reservation systems (CRSs) Sabre and Apollo. He told me that you come “between the shopper and the ultimate service provider (hotel, airline, retailer, or manufacturer), just as we saw in the case of the airline CRSs. The conditions are right for Google to enjoy enormous market power over service providers, who feel they must bid for positions in Google’s sponsored search keyword auctions.” Well that’s Wal-Mart to a tee.

Plus, you and Wal-Mart both have a sense of mission – that you are acting on behalf of your customers and bargaining for them against these powerful institutions like Coca Cola or CBS. You know what Wal-Mart says? He says “There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else”. Isn’t that the kind of thing you would say? Except you would be classier about it. You wouldn’t talk about anything as crass as money.

So you could learn a lot from Wal-Mart, but it wouldn’t just be a one-way conversation. I think you could tell Wal-Mart some pretty good stories too. There’s a thing you both do which I love. You’ve both got (I hope you don’t mind me saying) piles of money, but whenever someone asks you to pay for something you have this great way of showing your empty pockets – and they are always empty – and saying “I’d love to, but I just can’t afford it. I’m nearly broke as it is”. It’s a hoot! But you do it even better than Wal-Mart. His line is always that he makes only a penny or two on each sale so he’s nearly broke. Even he doesn’t have the gall to say he makes nothing.

You should tell him the story about the musicians. You know the one. Where you got Billy Bragg and Robin Gibb to work together. Not singing together – that would be a thing – but writing a letter at least. They said that YouTube should pay artists a royalty when someone listens to a song, kind of like radio broadcasters or TV stations do. And after paying $1.65 billion-with-a-B to buy YouTube from these guys, you just stood there with a straight face and said “I can’t afford it”. It was hilarious! What were your words? Oh yes, you cannot be expected to engage in a business in which you lose money every time a music video is played. And you got plenty of people to buy it as well even though you never said how much money you make on advertising. They think Billy Bragg is the privileged jerk and you’re the poor guy on the corner just trying to make a living and on the side of the little guy. Really, you’ve outdone Wal-Mart on this one.

So let me introduce you. Google, meet Wal-Mart. Wal-Mart, meet Google. You’re two of a kind.

Simulations and Mechanisms

I've learned two lessons in the last couple of days.

First, if you want to get some attention for a blog post, call it something eschatological like "Online Monoculture and the End of the Niche". If I had called it "Simulation of a 48-product market under simplistic assumptions" somehow I don't think I would be writing a follow up. I don't like this lesson much. But I don't feel too guilty: if I was really trolling for traffic I could have called it "Learning from the Big Penis Book" [see Music Machinery for why].

Second, no matter how hard you try to be clear, many people don't get what you are trying to say. So maybe it's not their fault. For examples, see some of the comments here and here and even a bit here and on the original. The main complaint is that picking two example runs from a simplistic simulation of a small system with a small and fixed number of customers and products doesn't simulate the entire Internet. Where is the statistical sampling, the exploration of the sensitivity to parameters, the validating of the recommendation model? And on and on.

These folks don't get why people do simple models of complex things.

The goal of simulations is not always to reproduce reality as closely as possible. In fact, building a finely-tuned, elaborate model of a particular phenomenon actually gets in the way of finding generalizations, commonalities, and trends, because the details specific to that one case bury whatever it shares with other cases.

For example (and I'm not comparing my little blog post to any of these people's work), in chemistry, Roald Hoffmann got a Nobel Prize and may be the most influential theorist of his generation because he chose to use a highly simplified model of electronic structure (the extended Hückel model). It is well known that the extended Hückel model fails to include the most elementary features needed to reproduce a chemical bond. Yet Hoffmann was able to use this simple model to identify and explain huge numbers of trends among chemical structures precisely because it leaves out so many complicating factors. Later work using more sophisticated models like ab initio computations and density functional methods let you do much more accurate studies of individual molecules, but it's a lot harder to extract a comprehensible model of the broad factors at work.

Or in economics, think of Paul Krugman's description of an economy with two products (hot dogs and buns). Silly, but justifiably so. In fact, read that piece for a lovely explanation of why such a thought experiment is worthwhile.

Or elsewhere in social sciences, think of Thomas Schelling's explorations of selection and sorting in Micromotives and Macrobehaviour, or of Robert Axelrod's brilliantly overreaching The Evolution of Cooperation, which built a whole set of theories on a single two-choice game and influenced a generation of political scientists in the process. All these efforts work precisely because they look at simple and even unrealistic models. That's the only way you can capture mechanisms: general causes that lead to particular outcomes. More precise models would not improve these works – they would just obscure the insights.

That said, there are valid questions. Under some circumstances, aggregating large numbers of opinions into a single recommendation can give this odd combination of broader individual horizons and a narrower overall culture. Are there demonstrable cases of the monopoly populism model out there in the wild (aside from the big penis book)? Is this a common phenomenon or an uninteresting curiosity? Well I don't know. I do think so, obviously, otherwise I would not have written the post. But it's a hunch, a hypothesis, a suggestion, that I find intriguing and which I may or may not try to follow up. Hey, it's a blog post, not an academic paper.

Online Monoculture and the End of the Niche

Online merchants such as Amazon, iTunes and Netflix may stock more items than your local book, CD, or video store, but they are no friend to “niche culture”. Internet sharing mechanisms such as YouTube and Google PageRank, which distil the clicks of millions of people into recommendations, may also be promoting an online monoculture. Even word of mouth recommendations such as blogging links may exert a homogenizing pressure and lead to an online culture that is less democratic and less equitable than offline culture.

Whenever I make these claims someone says “Well I use Netflix and it’s shown me all kinds of films I didn’t know about before. It’s broadened my experience, so that’s an increase in diversity.” And someone else points to the latest viral home video on YouTube as evidence of niche success.

So this post explains why your gut feel is wrong.

The argument comes from a paper by Daniel M. Fleder and Kartik Hosanagar called Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. They simulate a number of different kinds of recommender system and look at how these systems affect the diversity of a set of choices. Towards the end of the paper they observe that some of their recommender systems increase the experience of diversity for every individual in the sample and yet decrease the overall diversity of the culture. So I wrote a program that does basically what they do in their paper and tweaked it to highlight this result.

The result is what’s important here, rather than the particular algorithm used to generate this instance of it. But I know some people will want to know how the results are generated, so I’ll give a short sketch. If you want more than this, Fleder and Hosanagar provide details, my tweaks to their model are available as source code (python) if you want, and if you post in the comments we could get into a discussion. But it’s not important, trust me.

Each simulation starts with 48 customers and 48 products. Each product is described by two attributes, with values generated according to a normal distribution. So the products are scattered across a two-dimensional plane, with values of about -3 to +3 along each axis. Each customer is assigned a taste for each attribute, so they also are scattered about in the same space. The idea is that a customer will prefer, other things being equal, a product that is close to them in these attributes. Here are two distributions of customers (blue) and products (red). You can see that most customers share a mainstream taste around the middle of the graph, but there are a few who have odd tastes off to the edges. Likewise, most products have attributes that are mainstream, but there are a few “niche” products closer to the edge.
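For readers who think in code, here is a minimal sketch of that setup. It is not the source code mentioned above: the function names, the standard-normal attributes and the exponential distance weighting are all assumptions made for illustration.

```python
import math
import random


def make_points(n, rng):
    """Draw n points with two attributes, each from a standard normal."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def choice_weights(customer, products, sharpness=2.0):
    """Weight each product by how close it is to the customer's taste.

    Assumption: weight = exp(-sharpness * distance), so nearby products
    get larger weights while distant ones keep a small chance.
    """
    return [math.exp(-sharpness * math.dist(customer, p)) for p in products]


def choose(customer, products, rng):
    """Pick one product index at random, favouring nearby products."""
    weights = choice_weights(customer, products)
    return rng.choices(range(len(products)), weights=weights, k=1)[0]


rng = random.Random(0)          # fixed seed so a run can be repeated
customers = make_points(48, rng)
products = make_points(48, rng)
first_choice = choose(customers[0], products, rng)
```

The `sharpness` parameter is a hypothetical knob: a larger value makes customers stick more closely to products near their own taste.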

In this particular simulation, a customer can choose the same item over and over again, so it simulates something like streaming radio more than a bookstore. Each simulation starts off with a priming phase, in which each customer makes 75 choices according to a function which favours nearby products, but with some randomness so that they may on occasion choose one further away. After 75 choices we turn on a recommender function. Whenever a customer goes to make a choice, the recommender system identifies a product and recommends it to the customer. The recommendation increases the chance that the customer will choose the recommended product. Fleder and Hosanagar look at a few recommender functions. The one I use works like this:

  • The set of 48 customers is divided into equal-sized communities, with members chosen at random so they may not be close in taste.
  • The recommender function chooses an item by looking at what customers in the same community have chosen. It recommends the one most popular among others in the community.

I’m just going to show you two simulations. Run 1 above – which I will call Internet World – treats the entire set of 48 customers as a single community. The other (run 28 above), which I will call Offline World, breaks it into 24 communities of two people each. In Offline World I will get recommendations from the people around me and you will get recommendations from the people around you, but these recommendations are separate and isolated. In Internet World we each get recommendations from all 48 customers.
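The community recommender described above can be sketched roughly as follows. This is an illustration under assumptions, not the code behind the runs: `make_communities`, `recommend` and the `history` dictionary are hypothetical names, and ties between equally popular products are broken arbitrarily.

```python
import random
from collections import Counter


def make_communities(n_customers, community_size, rng):
    """Shuffle customer ids and cut them into equal-sized groups."""
    ids = list(range(n_customers))
    rng.shuffle(ids)
    return [ids[i:i + community_size]
            for i in range(0, n_customers, community_size)]


def recommend(customer, community, history):
    """Return the product most chosen by the other community members.

    history maps a customer id to the list of product ids chosen so far.
    Returns None if the other members have not chosen anything yet.
    """
    counts = Counter()
    for other in community:
        if other != customer:
            counts.update(history[other])
    if not counts:
        return None
    return counts.most_common(1)[0][0]


rng = random.Random(1)
internet_world = make_communities(48, 48, rng)  # one community of 48
offline_world = make_communities(48, 2, rng)    # 24 communities of two
```

In a two-person community the recommendation is simply whatever the one other member has chosen most often, which is why recommendations in Offline World stay separate and isolated.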

Here are the results for the two simulation runs I’m going to focus on. The results of these simulations are far from the only possible outcome, but they show why the gut feeling may fail, and I’ve chosen them for that purpose.

In Internet World each customer experiences an average of 3.5 products over the course of 75 choices with an active recommender system, while in Offline World each customer experiences only 2.4 different products. So the wider set of people providing recommendations in Internet World has led to an increase in individual diversity. This is like saying that “Netflix shows me pictures I would never have heard about from my friends alone”, or “Amazon recommended a book I had never heard of, and I liked it”.
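The individual diversity measure is just a count of distinct products per customer, averaged over customers. A minimal sketch, assuming choices are stored in a hypothetical `history` dictionary that maps each customer to a list of chosen product ids:

```python
def mean_unique_products(history):
    """Average number of distinct products each customer has chosen."""
    unique_counts = [len(set(choices)) for choices in history.values()]
    return sum(unique_counts) / len(unique_counts)


# Tiny made-up example: one customer tries two products, the other sticks to one.
example = {"a": [1, 1, 2], "b": [3, 3, 3]}
average = mean_unique_products(example)
```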

On the other hand, the overall diversity of the culture can be measured by the Gini coefficient of the products. A Gini coefficient of zero is complete equality (each product is chosen an equal number of times) and a Gini coefficient of 1 is complete inequality (only one product is ever chosen by anyone). And Internet World has a Gini of 0.79 while Offline World has a Gini of only 0.52. Internet World is less diverse than Offline World.
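Here is one common way to compute a Gini coefficient from a list of per-product counts. It is a sketch of the standard sorted-rank formula, which may not be exactly the calculation behind the figures above. Note that with this formula a finite set of n products tops out at (n - 1)/n rather than exactly 1, so 48 products give a maximum of 47/48.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    Uses G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    with the counts sorted in increasing order and ranks starting at 1.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # nothing chosen: treat as perfectly equal
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


equal_world = gini([75] * 48)            # every product chosen equally
one_hit_world = gini([0] * 47 + [3600])  # a single product takes everything
```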

How can these seemingly contradictory results happen? Let’s take a look.

In the following graph, each dot is a customer, arranged in their two-attribute preference space (just like in the graphs above). But this time the area of each dot is proportional to the number of unique products they experience. So in Run 1 (Internet World) you can see that the dots are, on average, bigger than the dots in Run 28 (Offline World). This shows the greater individual experience of diversity in Internet World; for example, there is a customer with attributes of (1.1, -0.8) who samples no fewer than 38 different products, and only seven of the 48 customers stay with a single product throughout the whole simulation. Meanwhile in Offline World the most eclectic customer samples only nine and there are no fewer than 19 customers who sample just one product. The experience of individual customers in Internet World is of broader horizons and more selection, as recommendations pour in from far and wide, rather than from the limited experiences of their small community in Offline World. This picture has become the standard narrative of choice in the Internet World – our cultural experiences, liberated from the parochial tastes and limited awareness of those who happen to live close to us, are broadened by exposure to the wisdom of crowds, and the result is variety, diversity, and democratization. It is the age of the niche.


But wait!

Here is a graph of the products in each simulation. This time, the area of each dot shows its popularity: how often a customer chooses it.

You can see that on the left, in Internet World, a few products were chosen a lot, especially the one centred on about (-0.2, -0.2). In Offline World there are many more medium-sized dots, showing that the consumption of products is more equal. In Internet World one product has “gone viral” and gets chosen over 1500 times out of the total of 3600, while 26 products languish in the obscurity of being sampled fewer than ten times. In Offline World no single product is chosen more than 10% of the time, and only 14 products are sampled fewer than ten times. In short, niche products do better in Offline World than in Internet World.

While each customer on average experiences more unique products in Internet World, the recommender system generates a correlation among the customers. To use a geographical analogy, in Internet World the customers see further, but they are all looking out from the same tall hilltop. In Offline World individual customers are standing on different, lower, hilltops. They may not see as far individually, but more of the ground is visible to someone. In Internet World, a lot of the ground cannot be seen by anyone because they are all standing on the same big hilltop.

The end result is the Gini values mentioned before. Here are Lorenz curves for Internet World (blue) and Offline World (green), in which the products are lined up in order of increasing popularity along the x axis, and the cumulative choices for those products are plotted up the y axis.
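The points on such a curve are easy to compute from the per-product counts. This is a minimal sketch that scales both axes to run from 0 to 1, which is an assumption about presentation; `lorenz_points` is a hypothetical name.

```python
def lorenz_points(counts):
    """Return (share of products, share of choices) pairs.

    Products are sorted from least to most popular, and each point gives
    the cumulative fraction of all choices made up to that product.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one product with at least one choice")
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


equal_curve = lorenz_points([1, 1, 1, 1])    # lies on the diagonal
unequal_curve = lorenz_points([0, 0, 0, 4])  # flat, then jumps at the end
```

The further a curve sags below the diagonal, the more unequal the popularity of the products and the higher the Gini coefficient.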

So there it is. Individual diversity and cultural homogeneity coexisting in what we might call monopoly populism.

But don’t think this is just about automated recommender systems, like the ones that Amazon and Netflix use. The recommender “system” could be anything that tends to build on its own popularity, including word of mouth. A couple of weeks ago someone pointed me to this video of Madin, a six-year-old soccer prodigy from Algeria, and the next day my son, who moves in very different online circles to me, was watching the same one. I know who Jim Cramer is even though we don’t get CNBC in Canada because everyone is talking about him and helping his disembodied head to shoot down Jon Stewart. More people watched Tina Fey being Sarah Palin online than on Saturday Night Live, and Fey is now famous in countries where no one watches the TV show. Clay Shirky writes an essay and I get five different links to it in my Google Reader feed in one morning. Our online experiences are heavily correlated, and we end up with monopoly populism.

A “niche”, remember, is a protected and hidden recess or cranny, not just another row in a big database. Ecological niches need protection from the surrounding harsh environment if they are to thrive. Simply putting lots of music into a single online iTunes store is no recipe for a broad, niche-friendly culture.

The Brilliant Bechdel/Wallace Movie Test

I had not heard of this amazing test until seeing a mention in the morning paper, but it has been around since 1985. That was when Liz Wallace told it to her friend Alison Bechdel who put it in the comic strip Dykes to Watch Out For. See here for the comic and here for discussion.

For others who have been in the dark, here is their devastatingly simple rule.

To be worth watching, a movie must
  1. Have at least two women in it,
  2. who talk to each other, 
  3. about something besides a man. 

This morning's paper went through the Time Magazine top ten movies of all time list. Classics like The Godfather and 2001: A Space Odyssey fail. I have a feeling both TV programs I watched this evening fail. And I'm half way through reading The Unbearable Lightness of Being, which I think is likely to fail as well.

Mr. Amazon’s Bookshop: Another Conversation with Google

[This is the tenth episode of Mr. Amazon’s Bookshop. There will probably be another five or six before I’m done. A list of all episodes is here; the previous episode is here.]

I rang for Google who, as so often, appeared quicker than seemed reasonable, with his usual false air of subservience. 

“How the hell do you get here so quickly Google?” I demanded, slightly taken aback by the speed with which he materialized at my shoulder. 

“Oh, nothing special sir – I just keep a copy of myself close by in case you need anything. It may seem a little unusual, but it’s entirely neutral. Any other butler could do the same. Certainly no one is leveraging their unilateral control over your bell to hamper user choice, competition, and innovation.”

I often regret asking too many questions of Google. I never know when he is being funny and when he’s not. Usually I decide to treat his remarks as humour because life is just easier that way, but he has an excellent poker face so it’s hard to tell.

“Well, never mind. Listen, we have some questions for you. It’s about Mr. Amazon.”

Google’s face showed an instant distaste: “I’ll do what I can.”

I was just searching for the best way to phrase my questions when Kylie piped up.

“Hey Mr. G. We’re trying to find out whether Mr. Amazon’s recommendations are going to help me to sell lots of copies of The Adventures of Wazzock. We need to know some things about Mr. Amazon’s sales. Like how much he sells that you couldn’t get in a regular establishment bookshop.” She spat at Edmund as she said this last sentence.

“A fine question young miss. Let’s see. I am ashamed to say I know very little about Mr. Amazon’s sales in any detail. I have means to find out many things, but he seems capable of great secrecy. But from what I hear it is possible that he makes about 25% of his sales from books beyond the 100,000 mark. Such measurement depends greatly, of course, on what length of time the sales ranks are averaged over. A book that is positioned at 150,000 could easily put itself into the top 100,000 by a single sale – so do you count the sale as before or after the purchase? Still, I imagine the figure is not too far from the truth.”

“When it comes to real bookshops, a small independent bookshop may stock 30,000 titles. But of course they sell special orders too. Perhaps ten percent of their sales are special orders. A large chain store like Heather’s Big House O’Books may stock 100,000 titles. But there are larger ones still. I am told that Blackwells in Oxford, for example, stocks above 200,000 distinct titles in its shops on The Broad.”

“You’re a gem Mr. G. That’s what we need to know.”

“It is?” I asked.

“Obviously”, said Kylie. “Think a bit. He makes one sale in every four from outside the top 100 grand books. But when he mutters his ‘would you be interested’ guff, we look at these outsider books only about once in eight times. So he’s pushing the top sellers, innit? He’s only selling them others because he’s a convenient way to order books you already know about, which is probably just taking business from the regular bookshops where you might have ordered them before. And quit the boozing, we’ve got work to do.”

It took a moment before I realized she was talking to me. I sheepishly replaced the cap on the flask of cognac and returned it to my overcoat pocket.

“Hey shortbus,” Kylie was approaching Edmund, who flinched. “Let’s have a look at that other graph of yours. What’s it show?”

Later on that day I managed to get a look at the graph, which I reproduce below. Meanwhile, Kylie was telling us all about it.

“So if I read this right, you’ve got your views of a book up the side, and your sales rank along the bottom. And most of the books Mr. Amazon shows us are bunched up at the left among the mainstream establishment junk, which we knew already. But it also shows that the ones he shows many times are almost all best sellers. Look at them buggers that have come up more than 50 times! Let’s see” (she scoured the notebook again, brow furiously furrowed.) “There’s 31 books that we looked at over 50 times. Twenty three of those 31 are in the top grand. All but two of them is in the top two grand. And I bet not one of them is The Adventures of Wazzock. He’s just showing everyone the establishment books innit? I thought he was all friendly to us kids, but it looks like he’s just like the bastard publishers! I should have known. He’s all about the money. What an absolute pillock.” She spat on the ground as she said this.

Such was her virulence I confess that I actually felt sorry for Mr. Amazon, who is after all just trying to make a living.

“Let’s not leap to conclusions young Kylie. After all, this is just one sample, like you said. Maybe if we set up the differ again and start with another one, we’ll see something different. And would you like an ice cream?”

Her expression remained ominously grim.

“I’m not being bought off with no ice cream. But you’re right Mr. W. We’ll have to try again. I’ll be back tomorrow morning. You be ready or else, numskull.” She kicked Edmund affectionately in the shins as she stomped off back down the drive.

Google Monoculture: Defending Jeff Atwood

Jeff Atwood at Coding Horror sent about 3,000 people here over the last couple of days from his post about the dangers of Google Monoculture. The least I can do is defend him against his critics, so here are answers to a few of the most common criticisms in the comments to his post.

I can switch any time I like, so it’s not a problem. Or “The Google monopoly seems a lot less scary than it’s marketshare would suggest because a new search engine is only a click away.”

No you can’t switch any time you like, for two reasons.

First: you can use another search engine, but when everyone else is using Google to find their way around the web, any other search engine that uses popularity as an input (most of them I think) is going to reflect Google’s recommendations. Google shapes the web as much as mapping it, and you can’t escape that shape easily even if you use another search engine.

Second: Google isn’t in the search business, it is in the advertising business. And while Adwords is its big moneyspinner, Doubleclick and Adsense ads are all over the place and you can’t escape them (not easily anyway).

The problem is that Microsoft acted in a belligerent and bullying manner, whereas google has not. An no one feels locked in, because they can switch in an instant.

Can we get past this idea that Microsoft is full of moustache twirling evildoers while Google is populated by friendly helpful people? Face it: companies respond to incentives. Google’s incredible success has saved it the tough decisions of squeezing the most out of every customer (and employee) so far, but when times get tough for Google, as they will sometime, it will squeeze just as hard as Microsoft, because it will have no choice.

Also, we have a different relationship to Google than to Microsoft. Most of us are Microsoft customers, but we are not Google’s customers, we are Google’s product. It sells us to advertisers. Google’s treatment of its advertisers is mysterious, but there are grumbles. For example, see El Reg on Google’s Money Machine; its ability to place ads on search terms that an advertiser did not bid on (and so collect money) and to extend ads to the iPhone (and charge for them) without asking their customers. And Google is quite keen to “embrace and extend” news organizations with Google News, and to use its site-directed top search results to take extra advertising revenue from the sites it directs you to.

Give Google time, then ask their customers if they feel they can switch.

A lot of people compared the Googopoly to the Micropoly. My very smart colleague Graeme commented: “In short, Microsoft’s monopoly was *created*, Google’s was *earned*.”

This is a tough one, but I disagree. In the key areas, both MS and Google made very good products by making the most of increasing returns to scale and from network effects. Many years ago I used Lotus 1-2-3 and WordPerfect, but Office just got better and more integrated. But that’s a subjective opinion and it is asking for trouble. In the end, sure we use Word because everyone else uses Word, but in a way we use Google because everyone else uses Google: like Microsoft they work hard to exploit their relationship to us. Microsoft did it with design consistency and with file format compatibility, Google uses the information we give them every time we search to tweak its products. Both have produced some very good products and some products that win just because of company size. I use Google Docs because the scale of Google’s operation makes it responsive and convenient, but let’s face it, it’s an unpleasant experience compared to other writing and spreadsheet products.

And finally, although I can’t find a comment, there’s this impression that Google is open while Microsoft is closed. Rubbish! Google gives away stuff that doesn’t matter to it. When it comes to its core technologies – the Adwords and Search algorithms and its data centre construction – Google is as closed and secretive as anyone. It recently refused to give information on water use at one of the new data centres because “We’re in a highly competitive industry and, frankly, one or two little pieces of information like that in the hands of our competitors can do us considerable damage. So we can’t discuss it.”

Very friendly.

The Horror, The (Coding) Horror

Who are all you People? 

Tramping over the flowerbeds, messing up the front porch, and sticking a big spike in my previously smooth and uneventful traffic logs? 

What's that? Jeff Atwood sent you? Sounds like a dodgy character to me.

Well OK. You can look around if you like. Just clean up after yourselves, that's all. And don't eat anything from the kitchen. You'll regret it.