For some time I have been working on a simulation of reputation systems: a computational model I can use to think through some of the issues they raise. A first pass at this model is now available, together with a fairly long document describing how it works and some results, on GitHub as a Jupyter notebook here.
I was particularly interested in a seeming paradox in what we have learned about real-world reputation systems. As I say in the introduction:
In the few years since they have become widespread, reputation systems have shown two seemingly contradictory characteristics:
- (Lake Wobegon effect) Most ratings are very high. While ratings of Netflix movies peak around 3.5 out of 5, ratings on sharing economy websites are almost all positive (mostly five stars out of five). The oldest and most widely-studied reputation system is eBay, in which well over 90% of ratings are positive; other systems such as BlaBlaCar show over 95% of ratings as “five out of five”.
- (Panopticon effect) Service providers live in fear of a bad rating. They are very apprehensive that ratings given for the most frivolous of reasons by a customer they will never see again (and may not be able to identify) may wreck their earnings opportunities, either by outright removal from a platform or by pushing them down the rankings in search recommendations. Yelp restaurant owners rail at “drive-by reviewers” who damage their reputation; Uber drivers fear being “deactivated” (fired), which can happen if their rating slips below 4.6 out of 5 (a rating that would be stellar for a movie).
So are reputation systems effective or not? Here’s the seeming contradiction:
- The Lake Wobegon effect suggests that reputation systems are useless: they fail to discriminate between good and bad service providers (my take on this from a couple of years ago is here). This suggestion is supported by quite a bit of recent empirical research which I have summarized in MY NEW BOOK!. Customers are treating reviews as a courtesy, rather than as an opportunity for objective assessment. Rather like a guest book, customers leave nice comments or say nothing at all.
- The Panopticon effect suggests that rating systems are extremely effective in controlling the behaviour of service-providers, leading them to be customer-pleasing (sometimes extravagantly so) in order to avoid a damaging bad review.
If you are not a fan of computer models, or just have better things to do, here are my main conclusions, paraphrased:
- The model demonstrates the important role of social exchange compared to a pure market or transactional exchange in most customer–service provider exchanges. It is this social exchange that is at the root of the Lake Wobegon effect, where all providers are above average. Reputation systems do indeed fail to discriminate on the basis of competence (quality).
- A small number of entitled customers can induce a Panopticon effect. Service providers who engage in Give & Take exchanges with their customers (even very competent ones) risk being given a negative review, which will damage their business. The incentives of the reputation system encourage providers to indulge their customers, in order to avoid this unlikely but damaging judgement.
- If reputation systems spread and customers become used to rating people in an “honest” fashion, we are building a terrible world for service providers. They must engage in emotional labour, catering to customer whims, or risk their livelihood. The Panopticon is here. The reputation systems continue, it should be noted, to fail to discriminate based on the competence of the service provider — instead of changing quality, they change attitude.
- The Lake Wobegon effect and the Panopticon effect can coexist, and are coexisting. Reputation systems as they currently stand are failing to discriminate based on quality. But there is only one thing worse than a reputation system that doesn’t work, and that’s a reputation system that does work: Reputation systems promise a dystopic future for service providers, in which their careers are being shaped by reputation systems that are not working as advertised, but are working to compel compliance.
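To make that coexistence concrete, here is a minimal arithmetic sketch. It is not taken from the notebook's model. It assumes a platform judges providers on the simple mean of their star ratings, which is an assumption made for illustration; real platforms may weight or window ratings differently. The function names are hypothetical, and the 4.6 threshold is the Uber figure quoted above.

```python
from fractions import Fraction
from math import ceil

MAX_STARS = 5  # top of the rating scale


def mean_rating(n_five_star: int, n_low: int, low_stars: int = 1) -> Fraction:
    """Exact mean of n_five_star top ratings plus n_low ratings of low_stars."""
    total = n_five_star + n_low
    if total == 0:
        raise ValueError("need at least one rating")
    return Fraction(MAX_STARS * n_five_star + low_stars * n_low, total)


def five_stars_needed(threshold: Fraction, n_low: int = 1, low_stars: int = 1) -> int:
    """Smallest number of five-star ratings that keeps the mean at or above
    threshold, given n_low ratings of low_stars.

    Derivation: (5*n + low*k) / (n + k) >= t  <=>  n * (5 - t) >= k * (t - low)
    """
    if not (low_stars <= threshold < MAX_STARS):
        raise ValueError("threshold must lie in [low_stars, MAX_STARS)")
    return ceil(n_low * (threshold - low_stars) / (MAX_STARS - threshold))


# Assumed deactivation threshold: 4.6 out of 5, stored exactly as 23/5.
THRESHOLD = Fraction(46, 10)

print(five_stars_needed(THRESHOLD))               # 9
print(mean_rating(9, 1) >= THRESHOLD)             # True: exactly at the threshold
print(mean_rating(8, 1) >= THRESHOLD)             # False: one five-star short
print(five_stars_needed(THRESHOLD, low_stars=3))  # 4
```

Under these assumptions, a single one-star rating has to be offset by nine five-star ratings. A provider whose ratings are 90% five stars (a Lake Wobegon profile) sits exactly on the threshold, and one more frivolous bad rating pushes them below it (the Panopticon).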
Interesting conclusions.
I’m in London and I regularly teach “Study Abroad” students from the USA courses on “intercultural communication” and similar topics.
There’s a fundamentally different “service culture” in the UK (at least historically; it is changing), which makes me sensitive to how power-based the culture in the USA is. There are a bunch of recent articles about tipping in US restaurants that illustrate this.
So that’s another data point on how and why eBay and Uber ratings differ. Uber drivers are part of the “servant” economy, whereas eBay is (perceptually) more like a garage sale…
Interesting argument … I haven’t read the technical details yet, but I wonder if your conclusions would go beyond the sharing economy. For example, I think of student evaluations of university professors, which have been around longer than online reputation systems. For Yelp, Uber, etc., you say “The incentives of the reputation system encourage providers to indulge their customers, in order to avoid this unlikely but damaging judgement.” I often hear teachers complain that this is a consequence of using student comments to evaluate their effectiveness. As with online rating systems, there seems to be a Lake Wobegon effect with student evals.
And what about modern democratic elections? You don’t see much Lake Wobegon in politics, but you do see politicians try to protect their reputations by pandering to “a small number of entitled customers”: those with the most money to contribute or the most effective at organizing a niche constituency.
The last word is the most important one: compliance.
If a consumer looks at the so-called reputation systems as a compliance system, he/she does not look for differentiation, but just whether basic requirements are met.
The service provider has a strong incentive to offer compliance, but need not worry so much about overindulging.
Wouldn’t both sides be better off if the focus shifted from providing a ‘reputation system’ to offering a compliance system?
If my memory serves me right, that is exactly how eBay started – feedback was limited to something like 80 characters, and what one looked for was whether the product was what was described, and how fast it was shipped.
Reputation metrics are a piss-poor substitute for transparency. The latter, of course, means even less privacy in the workplace, but there should at least be hope of objectivity. Of more value to me than customer satisfaction metrics are (third-party) lab-tested product specifications or, in the case of merchandise ordered online, maybe searchable databases of fulfillment history (though there may be some privacy sacrifice there, too). As for Uber-like monstrosities, every aspect of your participation in them will be monitored and stored anyway. Whether such a dataset is for public use or exclusively Uber’s use isn’t the difference between privacy and surveillance (the former was never on the menu to begin with); it is the difference between transparency and information asymmetry.