The internet wants to be open, but some internets are more open than others

Sergey Brin of Google had a discussion with The Guardian and talked about his vision for the future of the internet, alongside his concerns about threats to that vision.

It’s an incredible insight into his (and Google’s) world view, which comes from a truly unique vantage point. Nobody else sits astride the internet as Google does, and it seems that, from the top, the sense of entitlement to be masters of all they survey is strong.

Take this quote, from towards the end of the piece:

If we could wave a magic wand and not be subject to US law, that would be great. If we could be in some magical jurisdiction that everyone in the world trusted, that would be great … We’re doing it as well as can be done

I’m not sure what this “magical jurisdiction” would be but it doesn’t sound like Sergey wants it to be based on US law, and there’s no sign that Google has any greater love for any other existing jurisdiction. I wonder if he’s thinking that perhaps it should be a Google-defined jurisdiction? After all, Google is fond of saying that the trust of users is their key asset – they presumably consider themselves to be highly trusted. I wonder if the magic wand is in development somewhere deep in their bowels? Perhaps one of their robotic cars can wave it when the time comes! Google can declare independence from the world…

But why should we trust them? There’s almost nothing they do for which you can’t find fierce critics to match their army of adoring fans. Without deconstructing them all, surely the point is this: whenever a single entity (be it a government, company or individual) has complete control over any marketplace, territory or network, bad things tend to happen. Accountability, checks and balances, and the democratically enacted rule of law are all ways of trying to ensure that power does not follow its natural tendency to corrupt.

Google asks us to just trust it. And many people do.

Another quote:

There’s a lot to be lost,” he said. “For example, all the information in apps – that data is not crawlable by web crawlers. You can’t search it.

The phrasing is interesting. Is it really true that because data in apps is not crawlable it is “lost”? I use apps all the time, and the data appears to be available to me. I don’t think the fact that it’s not available to Google means it’s “lost” (except, I suppose, to Google). Defining something that is not visible to Google as “lost” suggests not just that Google considers it should be able to see and keep everything that exists online, but also that it claims an omniscient role which should not be subject to the normal rules of business or law – like people being able to choose who they deal with and on what terms, or being able to choose who copies and keeps their copyright works.

The “lost” app data could, of course, easily be made available to Google if the owner chose. Brin’s complaint seems to be that Google can’t access it without the owner deciding it’s OK – there is a technical obstacle which can’t simply be ignored. Yet all they have to do, surely, is persuade the owners to willingly open the door: hardly a controversial challenge in the world of business. It’s called doing a deal, isn’t it?

Here’s what he had to say in relation to Facebook

You have to play by their rules, which are really restrictive … The kind of environment that we developed Google in, the reason that we were able to develop a search engine, is the web was so open. Once you get too many rules, that will stifle innovation.

Another telling insight. Too many rules stifle innovation. Rules are bad.

Hard to agree with even as a utopian ideal (utopia isn’t usually synonymous with anarchy), but even less so when you consider the reality of dealing with Google. I have visited several Google offices over the years and have always been asked to sign in using their “NDA machine” at reception. Everyone has to do it. You have to sign an NDA simply to walk into their offices. The first rule of Google is you can’t talk about Google. Hardly the most open environment – they are the only company I have ever visited which insists on this.

Of course, Google is no stranger to rules either. They set their own rules and don’t offer room for discussion or adjustment. When they crawl websites, for example, they copy and keep everything they find, indefinitely. They have an ambition to copy and keep all the information on the internet, and eventually the world. Their own private, closed, internet. This is a rule you have to play by.

Even if you ban crawling on some or all of your site using robots.txt, they crawl it anyway but just exclude the content from search results (this was explained to me by a senior Google engineer a few years ago and as far as I know it has not changed). If you want to set some of your own rules, using something like ACAP or just by negotiating with them, good luck: they refuse to implement things like ACAP and rarely negotiate.
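For context, this is all robots.txt itself can express: a site owner’s request, which a well-behaved crawler is supposed to honour before fetching anything. A minimal sketch using Python’s standard-library parser (the site, paths and crawler names here are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# An invented robots.txt: ban Googlebot from /private/,
# allow every other crawler everywhere.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))            # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/private/page.html"))  # True
```

The protocol is purely advisory: `can_fetch` tells a crawler what the owner has asked for, but nothing technically stops a crawler from fetching the page anyway – which is exactly the behaviour described above.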

“You have to play by their rules, which are really restrictive”

Here’s an interesting story. A while ago, Google refused to include content in their search results if clicking on the link would lead a user to a paywall. They said it was damaging to the user experience if they couldn’t read the content they had found with Google (another Google rule: users must be able to click on links they find and see the content without any barriers or restrictions). However it also meant users couldn’t find content they knew they wanted, for example from some high-profile newspapers like the FT and Wall Street Journal.

So Google introduced a programme called “First Click Free”. It set some rules (more rules!) for content owners to get their content included in Google search even if it was “restricted” behind a paywall. It sets rules not just for how to allow Google’s crawlers to access the content without hitting a registration form, but also conditions you have to fulfil – primarily that anybody clicking a link to “restricted” content from Google search must be allowed to view it immediately, without registration or payment.
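In practice that condition boils down to a referrer check on the publisher’s side. A hypothetical sketch of the logic the policy demanded – the function name and host test are mine, not Google’s:

```python
from urllib.parse import urlparse
from typing import Optional

# Hypothetical sketch of the paywall logic First Click Free implied:
# anyone arriving from a Google search results page must see the
# full article free, with no registration step.
def allow_free_view(referer: Optional[str], is_subscriber: bool) -> bool:
    """Return True if this request may see the full article."""
    if is_subscriber:
        return True  # paying readers always get through
    if referer:
        host = urlparse(referer).hostname or ""
        # The "first click" from Google search must be free.
        if host == "google.com" or host.endswith(".google.com"):
            return True
    return False
```

Note the obvious hole: nothing in this check limits how many times the same non-paying visitor can arrive via Google.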

This is a Google rule which you have to play by, unless you are willing to be excluded from all their search results. Not only is it technically demanding, it also fails to take account of different business models and the need for businesses to be flexible.

Unfortunately it was also wide open to abuse. Many people quickly realised they could read anything on paid sites just by typing the headline into a Google search.

Eventually Google made some changes. Here’s how they announced them:

we’ve decided to allow publishers to limit the number of accesses under the First Click Free policy to five free accesses per user each day 

They have “decided to allow” publishers to have a slightly amended business model. Publishers need permission from Google to implement a Google-defined business model (or suffer the huge impact of being excluded from search), and now they are allowed to vary it slightly.
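The amended rule is simple enough to sketch: a per-user, per-day counter on the publisher’s side. The class and names below are hypothetical, purely to illustrate the shape of the concession:

```python
from collections import defaultdict
from datetime import date

FREE_LIMIT = 5  # the "five free accesses per user each day" Google now permits

# Hypothetical sketch of the amended First Click Free rule.
class FirstClickFreeCounter:
    def __init__(self):
        # (user_id, date) -> number of free Google-referred reads today
        self._counts = defaultdict(int)

    def may_read_free(self, user_id: str) -> bool:
        """Grant a free read unless today's quota is exhausted."""
        key = (user_id, date.today())
        if self._counts[key] >= FREE_LIMIT:
            return False
        self._counts[key] += 1
        return True
```

Even in this softened form, the limit itself – five, per user, per day – is still defined by Google, not by the publisher.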

For a company which objects to the idea of having to play by someone else’s rules, they’re not too bothered about imposing some of their own.

Which brings me back to trust. If Google want a world in which they have access to scan, store and use all “data” from everywhere, where they don’t have to play by the “restrictive” rules or laws (like copyright) set by others – even their own government – don’t they need to start thinking about their demand for openness both ways round? Rather than rejecting rules which don’t suit them (such as “US law”) shouldn’t they try to get them changed; argue and win their case or accept defeat graciously? Shouldn’t they stop imposing rules on those whose rules they reject, ignore or decry?

Google is a very closed company. Little they do internally is regarded by them as being “open”, and they build huge and onerous barriers to protect their IP, secrets and data. Even finding out what Google know about you, or what copies of your content they have, is virtually impossible; changing or deleting it even harder.

They ask us to trust them. We would be unwise to do so, any more than we trust any monopolies or closed regimes which define their own rules. It wouldn’t matter so much but for their huge dominance, influence and reach. They have, it is said, personal data on more than a billion people all of whom are expected to trust them unquestioningly.

Surely the first step to earning, rather than simply assuming, that trust is that they need to start behaving towards others in the way they demand others treat them.

Openness cuts both ways, Sergey. How about starting by practising what you preach and opening Google up fully?

Comments

2 Comments
  1. “Even if you ban crawling on some or all of your site using robots.txt, they crawl it anyway but just exclude the content from search results (this was explained to me by a senior Google engineer a few years ago and as far as I know it has not changed).”

    Have you checked your facts? This sounds suspicious to me. All webservers log incoming requests so it’s easy to see if the GoogleBot tries to crawl the content anyway.

    Otherwise, can’t disagree with the request for Google to be more open, but regarding your other comments, I think you have not understood that a WWW where you can’t search for stuff is really a much less efficient WWW.

    You said it yourself with the paywall thing, which I again find a little one-sided. If you have paid for the Financial Times, you win by Google including them. Everyone not interested in paying loses, since it is in effect just search spam. If 99% of the people using Google belong to the latter category, Google does their userbase a service by kicking out bait-and-pay content.

    • That is how I was told it worked by someone directly involved. He said it was more efficient to separate the gathering and processing so they did it that way for technical reasons rather than trying to parse and sort everything as the crawler goes along. However this was some years ago and I am sure their technology has evolved several generations since then so who knows if it’s still the same? Actually it would be very interesting to know, because if this was done solely for reasons of technical efficiency then it would be easy to imagine that with the exponential increase in processing capability and frequency of crawling it would no longer be necessary.

      Another interesting thing I was told is that Google did (does?) operate a second crawler which is designed to mimic a person, does not declare itself as a crawler and appears in logs to be a normal user. The purpose of this one was to detect “cloaking” – i.e. sites which deliberately serve different pages to crawlers than to normal people which is a big no-no for Google. I wonder if they still do that too.

      And I perfectly understand that a WWW which is harder to search is a different WWW from the one we know now. But I don’t think that would be an inevitable consequence of changing things a bit – after all site owners have an interest in being discovered. There are strong incentives for all parties to work effectively and efficiently together as long as the rewards are fairly shared. Search should not dominate the content revenues of the internet (even less so a single search player) and should not limit the business models which can be implemented. As long as search works that way it isn’t producing true efficiency (other than for the person making all the money) and it diminishes user choice by reducing the incentives to invest in content products. So the idea that we might end up with an internet where we “can’t search for stuff” is fanciful in my view, and in no way a likely consequence of having a more balanced set of rules and more openness.
