The Times and Google: what changed?

Quite a lot has been written recently about The Times allowing Google to index some of its content. Some of the coverage has suggested this is a capitulation by The Times, which had previously allowed very little indexing.

I think they’re missing the point. The most interesting part of this story is that The Times “will begin showing articles’ first two sentences to search engines” (according to Paid Content).

This is a big change of stance by Google. Back when I was involved in the ACAP project they resolutely refused to contemplate anything which would allow a site owner to determine what part of an article might be visible in search results (the so-called snippet). Nothing in the robots.txt protocol gave site owners the ability to specify their preferences to this level of detail and, although ACAP did, Google refused to engage with it.
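To show how coarse that control is, here is a minimal Python sketch using the standard library's robots.txt parser. The site, the rules and the crawler name are all invented for the example; this is not The Times's real file. The only question the parser can answer is whether a crawler may fetch a given path. Nothing in it expresses how much of a page may appear as a snippet.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="ExampleBot"):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, "https://news.example" + path)


# The answer is a plain yes or no per path, nothing finer than that.
print(may_crawl("/news/today.html"))
print(may_crawl("/archive/2009.html"))
```

A preference such as "index this article but show only its first two sentences" has no place in a yes-or-no answer like this, which is the gap described above.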

So the story here is not about The Times capitulating, mainly because they clearly have not. The story is that Google have met them in the middle and agreed on a way of indexing which is agreeable to both of them.

This is exactly the sort of thing which ACAP was meant to achieve, and if Google have softened their rigid approach to the way they’re prepared to operate, it is only a good thing.

For The Times it means they can use Google to help, not hinder, their business strategy. For Google it means their users see a large and visible gap in search results being filled.

I think that’s what you call a good outcome.

The internet wants to be open, but some internets are more open than others

Sergey Brin of Google had a discussion with The Guardian and talked about his vision for the future of the internet, alongside his concerns about threats to that vision.

It’s an incredible insight into his (and Google’s) world view, which seems to be from a truly unique perspective. There is nobody else who sits astride the internet like Google and it seems that from the top, the sense of entitlement to be the masters of all they survey is strong.

Take this quote, from towards the end of the piece:

If we could wave a magic wand and not be subject to US law, that would be great. If we could be in some magical jurisdiction that everyone in the world trusted, that would be great … We’re doing it as well as can be done

I’m not sure what this “magical jurisdiction” would be but it doesn’t sound like Sergey wants it to be based on US law, and there’s no sign that Google has any greater love for any other existing jurisdiction. I wonder if he’s thinking that perhaps it should be a Google-defined jurisdiction? After all, Google is fond of saying that the trust of users is their key asset – they presumably consider themselves to be highly trusted. I wonder if the magic wand is in development somewhere deep in their bowels? Perhaps one of their robotic cars can wave it when the time comes! Google can declare independence from the world…

But why should we trust them? There’s almost nothing they do for which you can’t find fierce critics to match their army of adoring fans. Without deconstructing them all, surely the point is this: whenever a single entity (be it a government, company or individual) has complete control over any marketplace, territory or network, bad things tend to happen. Accountability, checks-and-balances, the rule of law, democratically enacted, are all ways of trying to ensure that power does not achieve its natural tendency to corrupt.

Google asks us to just trust it. And many people do.

Another quote:

“There’s a lot to be lost,” he said. “For example, all the information in apps – that data is not crawlable by web crawlers. You can’t search it.”

The phrasing is interesting. Is it really true that because data in apps is not crawlable it is “lost”? I use apps all the time, and the data appears to be available to me. I don’t think the fact that it’s not available to Google means it’s “lost” (except I suppose to Google). Defining something that is not visible to Google as “lost” suggests not just that Google considers that it should be able to see and keep everything that exists online, but also that they have an omniscient role that should not be subject to the normal rules of business or law. Like people being able to choose who they deal with and on what terms. Or being able to choose who copies and keeps their copyright works.

The “lost” app data could, of course, easily be made available to Google if the owner chose. Brin’s complaint seems to be that Google can’t access it without the owner deciding it’s OK – there is a technical obstacle which can’t simply be ignored. Yet all they have to do, surely, is persuade the owners to willingly open the door: hardly a controversial challenge in the world of business. It’s called doing a deal, isn’t it?

Here’s what he had to say in relation to Facebook:

You have to play by their rules, which are really restrictive. The kind of environment that we developed Google in, the reason that we were able to develop a search engine, is the web was so open. Once you get too many rules, that will stifle innovation.

Another telling insight. Too many rules stifle innovation. Rules are bad.

Hard to agree with even as a utopian ideal (utopia isn’t usually synonymous with anarchy), but even less so when you consider the reality of dealing with Google. I have visited various Google offices at various times and have always been asked to sign in using their “NDA machine” at reception. Everyone has to do it. You have to sign an NDA simply to walk into their offices. The first rule of Google is you can’t talk about Google. Hardly the most open environment – they are the only company I have ever visited which insists on this.

Of course, Google is no stranger to rules either. They set their own rules and don’t offer room for discussion or adjustment. When they crawl websites, for example, they copy and keep everything they find, indefinitely. They have an ambition to copy and keep all the information on the internet, and eventually the world. Their own private, closed, internet. This is a rule you have to play by.

Even if you ban crawling on some or all of your site using robots.txt, they crawl it anyway but just exclude the content from search results (this was explained to me by a senior Google engineer a few years ago and as far as I know it has not changed). If you want to set some of your own rules, using something like ACAP or just by negotiating with them, good luck: they refuse to implement things like ACAP and rarely negotiate.

“You have to play by their rules, which are really restrictive”

Here’s an interesting story. A while ago, Google refused to include content in their search results if clicking on the link would lead a user to a paywall. They said it was damaging to the user experience if they couldn’t read the content they had found with Google (another Google rule: users must be able to click on links they find and see the content without any barriers or restrictions). However it also meant users couldn’t find content they knew they wanted, for example from some high-profile newspapers like the FT and Wall Street Journal.

So Google introduced a programme called “First Click Free”. It set some rules (more rules!) for content owners to get their content included in Google search even if it was “restricted” behind a paywall. It doesn’t just set rules for how to allow Google’s crawlers to access the content without filling in a registration form, but also the conditions you have to fulfill – primarily that anybody clicking a link to “restricted” content from Google search needs to be allowed to view it immediately, without registration or payment.

This is a Google rule which you have to play by, unless you are willing to be excluded from all their search results. Not only is it technically demanding, it also fails to take account of different business models and the need for businesses to be flexible.

Unfortunately it was also wide open to abuse. Many people quickly realised they could read anything on paid sites just by typing the headline into a Google search.

Eventually Google made some changes. Here’s how they announced them:

we’ve decided to allow publishers to limit the number of accesses under the First Click Free policy to five free accesses per user each day 

They have “decided to allow” publishers to have a slightly amended business model. Publishers need permission from Google to implement a Google-defined business model (or suffer the huge impact of being excluded from search), and now they are allowed to vary it slightly.
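For anyone wondering what implementing such a rule involves, here is a rough Python sketch of the amended policy: visitors arriving from search get through the paywall, but only up to five times per user per day. The function name, the in-memory counter and the idea of a ready-made `user_id` are all assumptions for illustration; a real publisher would also have to work out how to identify users and referrers reliably, which is where much of the difficulty lies.

```python
from collections import defaultdict
from datetime import date

DAILY_FREE_LIMIT = 5  # the figure quoted in the announcement above

# (user_id, day) -> number of free articles already served (hypothetical store)
_free_views = defaultdict(int)


def allow_free_view(user_id, came_from_search, today):
    """Return True if this request should bypass the paywall."""
    if not came_from_search:
        # Direct visitors get no free pass under this rule.
        return False
    key = (user_id, today)
    if _free_views[key] >= DAILY_FREE_LIMIT:
        # Daily allowance used up: show the paywall instead.
        return False
    _free_views[key] += 1
    return True
```

Before the change, the equivalent rule would have had no limit at all, which is why typing a headline into a search box was enough to read anything.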

For a company which objects to the idea of having to play by someone else’s rules, they’re not too bothered about imposing some of their own.

Which brings me back to trust. If Google want a world in which they have access to scan, store and use all “data” from everywhere, where they don’t have to play by the “restrictive” rules or laws (like copyright) set by others – even their own government – don’t they need to start thinking about their demand for openness both ways round? Rather than rejecting rules which don’t suit them (such as “US law”) shouldn’t they try to get them changed; argue and win their case or accept defeat graciously? Shouldn’t they stop imposing rules on those whose rules they reject, ignore or decry?

Google is a very closed company. Little they do internally is regarded by them as being “open”, and they build huge and onerous barriers to protect their IP, secrets and data. Even finding out what Google know about you, or what copies of your content they have, is virtually impossible; changing or deleting it even harder.

They ask us to trust them. We would be unwise to do so, any more than we trust any monopolies or closed regimes which define their own rules. It wouldn’t matter so much but for their huge dominance, influence and reach. They have, it is said, personal data on more than a billion people all of whom are expected to trust them unquestioningly.

Surely the first step to earning, rather than simply assuming, that trust is that they need to start behaving towards others in the way they demand others treat them.

Openness cuts both ways, Sergey. How about starting by practicing what you preach and opening Google up fully?

Breaking the Internet, one absurd claim at a time

I’m not much of a geek, so I can’t pretend to understand the technical minutiae of the internet intimately.

But one thing I do know is that it was designed to be fault-tolerant, decentralised and robust. The basic technology was developed by the US Defense Department, some say to survive nuclear war but certainly to survive dodgy connections, and it seems to have worked.

While we all have our frustrations with the internet sometimes, and whole countries have been affected by interference from their governments, I have never heard of the whole internet breaking down. Even as bits of it fail, the rest carries on regardless.

The internet, by design, is hard to break.

Which means it’s hard to imagine something which would “Break the Internet”.

Yet that phrase, “Break the Internet”, is one I have heard with increasing frequency. It is used as a dire threat, a prediction of doom, the ultimate and unimaginably awful unintended consequence of a terrible and naïve mistake.

Often, it is used as a way of explaining to policymakers, who by-and-large are even less geeky than me, why they should not do something they have proposed.

I first heard it when I was involved with the ACAP project. ACAP is a simple way of making content permissions machine-readable, thereby solving the problem of how automated services like Google are supposed to comply with terms of use.
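To give a flavour of what "machine-readable permissions" means, here is a small Python sketch. It is emphatically not ACAP's actual syntax or vocabulary: the `Permission` record, its field names and the `build_snippet` function are invented for illustration. The idea is simply that a site states its terms as data, and an automated service reads them before deciding what to show.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """Hypothetical terms a publisher might attach to an article."""
    may_index: bool
    max_snippet_sentences: int  # 0 means no snippet may be shown


def build_snippet(article, perm):
    """Return the snippet a compliant service would display, if any."""
    if not perm.may_index:
        return ""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[: perm.max_snippet_sentences])


print(build_snippet("One. Two. Three.", Permission(True, 2)))  # prints "One. Two."
```

A service that honours the record shows two sentences, or none, according to the owner's stated terms. Nothing about the page itself changes, which is why it is hard to see what such a scheme could break.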

We were on a trip to the USA to introduce ACAP to various industry and government people. It was going down well, in Europe as well as the USA. It was seen as a way of solving a sticky problem without having to legislate and avoided lots of awkward issues like DRM.

Google, who had initially been keen on ACAP and even delegated one of the search engineers to a committee defining its technical development, had turned against it. Presumably, although they never said this, they realised that if they were aware of terms of use they might have to comply with them.

Public statements were made by the likes of Eric Schmidt saying that there were technical problems with ACAP (even though Google had helped design the technical aspects of it) but implying that once they were solved Google would support ACAP. In fact they never engaged with ACAP to try to solve the supposed technical issues, nor explained what they were.

Anyway, the first time I heard the phrase “Break the Internet” was on that US trip. We had visited Google, and privately, on the way to dinner, I was told that the distinguished engineers were saying internally that ACAP would “Break the Internet”. So however polite they were being, the engineers did not support it and there was little chance of getting much progress.

Obviously such a dire consequence would be cataclysmic, and nobody could knowingly support something which would lead to it.

But we were surprised because we couldn’t think of how ACAP could possibly do such a thing. How could ANYTHING do such a thing? My conversation was an informal one with a non-technical person (a lesser species at Google) and he was unable to explain what it meant – but it sounded bad.

We asked more technical people at Google but they were unable or unwilling to explain. Silence was the stern reply, and the dialogue pretty much dried up after that.

However we did hear the phrase “Break the Internet” again. This time it came from government officials, who told us that while they liked the idea of ACAP they had been told that it would “Break the Internet”.

We asked if this warning had come with an explanation; they said no. When we suggested that it would be a good idea to set up a meeting to discuss this with whoever had said it so that, once we had established the problem, we could fix it, they agreed. ACAP, after all, was about the end not the means. But the meetings never happened.

I reached the conclusion that ACAP was not some terrible time-bomb ticking under the internet. Quite clearly it couldn’t break anything at all (not least because technically it didn’t really do anything more than a copyright notice in a book – all it did was make licences machine-readable).

What it MIGHT have broken, or at least changed a little bit, is one aspect of Google’s business rationale. The bit which justifies them accessing any website, and using content by default for their various search products, without asking first, without paying any attention to restrictions or conditions which those sites might have specified in their terms of use and without paying money or offering anything other than traffic in return.

But the damage was done. Every politician and policy-maker wanted to be friends with the internet and with Google. All of them wanted to appear progressive and technically ept. None of them wanted to go down in history as the person who unwittingly “Broke the Internet”, and none of them were geeky enough to ask even the simplest questions to explore the substance of this ludicrous claim, or willing to facilitate a conversation which might lead to an answer.

So, even though they liked the idea of ACAP, they were scared of supporting it in case something bad happened. Google’s rivals didn’t want to implement it if Google did not. The well-intentioned and, to my mind, quite benign effort which ACAP represented became controversial and demonised.

The politicians and officials, I get the impression, just looked the other way, and hoped that in time everyone would learn to just be friends.

Something rather good was lost, temporarily at least, as the result of a silly catchphrase – “Break the Internet”.

Anyway… it turned out that the absurd, hyperbolic and completely false assertion, in private, that ACAP would “Break the Internet” worked so well that the phrase caught on.

Taking advantage of the fact that many people seem to regard Google and everyone who works for it as some sort of super-species of superior intelligence and insight, unattainable by normal humans, the phrase came out in relation to other “threats” to Google’s (and others’) interests.

Recently David Drummond, Google’s chief lawyer, told an audience at Davos that the European proposals on privacy, specifically the “right to be forgotten”, would – yes – “Break the Internet”. Again, clearly absurd, but seemingly taken seriously by those without the confidence to challenge it.

In relation to PIPA and SOPA there were numerous articles and blog posts making, spookily, the same prediction. These pieces of legislation, designed to reduce copyright piracy and help media organisations survive, would “Break the Internet”.

We can all chuckle at this, but it’s not funny. However little this claim stands up to scrutiny, those it is made to rarely if ever have the confidence to challenge it. Its preposterousness is exceeded only by its effectiveness. It is a crazy, disingenuous, self-interested, untruthful and alarmingly potent claim.

So I want to challenge it, and other equally absurd claims like “the end of free speech” which runs a close second when it comes to silly predictions, and I want to show it up for the dishonest and false allegation it invariably is.

I want to appeal to everybody, especially policymakers and their staff, to not just disregard it but positively reject it as you would any other obviously ridiculous claim. Put it to the test, probe and enquire, find out what is really meant and if you discover that the reality doesn’t live up to the claim then you should deprecate not just the claim but all the evidence or claims put forward by that source.

Demand honesty, demand rigour, demand truth and punish those who would seek to deceive you by ignoring them.

