Does Google’s search algorithm not consider Native Americans people?
On the 25th of January 2021, writer and YouTuber Hank Green tweeted at Google, asking about the different search results for his two seemingly identical queries.
The two search results:
*Let’s call these results the “indigenous” and the “colonial”.
The Mob
As is customary on Twitter, pitchforks and torches were immediately raised. People commented and retweeted, calling Google’s search algorithm biased and racist. The outrage was probably further fueled by the recent controversy around AI researcher Timnit Gebru, who was let go from Google Research after voicing ethical risks in language models, and who retweeted a tweet connecting her paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” to these search results.
So what went wrong?
It would be easy to dismiss it as just a matter of keyword counts… and so I did.
See if you can spot a pattern in the results; unfortunately, I only spotted it much later, through a different test:
Pizza and Calculus to the rescue!
The original questions are a little too complex and have ambiguous answers. Let’s try asking something with a more definitive answer:
A monopoly on humans!
As soon as you use the word “humans”, you get a Wikipedia article. It seems Wikipedia has a monopoly on that word.
Problem solved!
But…there was one more question I wanted to answer:
Does Google guess what you wanted to know?
What if Google makes some assumptions about you and about what you wanted to know, rather than answering precisely what you asked? After all, Google has plenty of metadata:
- What’s the most visited page for each question.
- How long do people stay after visiting a page.
- Where is the person asking from.
The first two are hard to test as an outsider, but the third we can poke at from the outside, and maybe we can find out who Google thinks we are:
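As a rough probe of that third point, we can ask the same question as if we were in different countries. Below is a minimal sketch, assuming Google’s gl (country) and hl (language) URL parameters; real results also depend on IP, cookies, and search history, so the URLs are meant to be opened in a browser, not scraped:

```python
# Sketch: build Google search URLs that ask the same question "from"
# different countries via the gl (country) and hl (language)
# parameters, then compare the answers in a browser. Scraping the
# results programmatically is brittle and against Google's ToS.
from urllib.parse import urlencode

QUERY = "when did we go to war"

def localized_search_url(query, country_code):
    """Return a Google search URL localized to the given country."""
    params = {"q": query, "gl": country_code, "hl": "en"}
    return "https://www.google.com/search?" + urlencode(params)

for country in ("us", "ca", "de"):
    print(country, "->", localized_search_url(QUERY, country))
```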
Google will make many assumptions!
It doesn’t matter whether we write “people” or “we”. So why are we still getting the “colonial” result and not the “indigenous” one? Looking at the results for “when did we go to war”, Google makes two assumptions: that we are from the USA and that we are interested in the First World War. Maybe it also thinks we are asking about the United States.
Let’s confirm this:
Ambiguity and relevance!
“America” is ambiguous, with many using it interchangeably with the USA. By simply being more specific that we are talking about North America, we get the “indigenous” result.
The reason we get the “colonial” result is that it is the more relevant answer for someone asking about the United States.
Let’s prove this with a location that is not ambiguous:
Canada and the bias!
We removed the geographical ambiguity; the only difference now is the words “humans” and “people”. Another interesting thing is that we didn’t get a Wikipedia article this time when we searched for “humans”. Is this proof of bias?
Let’s take a look at the keyword count within the articles:
“when did humans…”: people: 52, human: 14
“when did people…”: people: 8, human: 0
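For anyone who wants to reproduce these numbers, here is a minimal sketch; the URLs are placeholders for the two top results, and since “human” is matched as a word prefix, “humans” and “humanity” count as well:

```python
# Sketch: fetch a result page and count how often each keyword
# appears in its visible text. The URLs are placeholders for the
# two articles Google returned.
import re
import urllib.request

def keyword_counts(url, keywords):
    """Count case-insensitive, prefix-matched keyword occurrences."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    # Crudely strip HTML tags so markup doesn't inflate the counts.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return {kw: len(re.findall(r"\b" + re.escape(kw), text))
            for kw in keywords}

# Placeholder URLs standing in for the two top results.
for url in ("https://example.com/when-did-humans-result",
            "https://example.com/when-did-people-result"):
    print(url, keyword_counts(url, ["people", "human"]))
```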
So it’s about the keyword count after all…or not?
There are, of course, pages out there that give the “indigenous” answer and use both the word “people” and the word “humans”. So why didn’t we get those?
Tectonic Plates and Identity.
The USA and Canada are not just descriptors of geographic regions; they are also identities. When you ask about the geographical region, you are asking about the tectonic plates, and it makes sense to answer with the full history, thus the “indigenous” answer. When you ask about the identity, you are asking about a concept in people’s minds, and it makes sense to answer with how that concept originated, thus the “colonial” answer. Of course, country names are often the only way we know to describe geographic locations precisely enough, and so these two concepts get muddled.
Let’s recap
- When we specify a geographic region like North America, the distinction between region and identity is clear.
- Keywords still matter; using synonyms will help your ranking if others use only one term. That’s why Wikipedia has a monopoly on all things “human”.
- Using “human” instead of “people” has two implications*:
1. We are talking about the region and not the identity.
2. We are interested in a historic description from the point of view of humanity as a whole and not an individual group of people.
*This is a result of how we collectively use these words in our language; it is not Google-specific.
Moving forward — Opinion
People trust the technology, and that trust comes with responsibilities.
The more advanced the technology gets, the more closely it will mimic a human, for better or for worse. Since we are undoubtedly biased in more regards than just race, and language has countless ambiguities, we have to keep a close watch on what AI has learned from us. Some things we will be able to mitigate and correct; others will have to be accepted as consequences of peculiarities in language, ideologies, etc.
Ultimately, Google’s goal is to serve its users, not to serve the most technically correct answer to a question. So even if a question is phrased poorly, Google tries to answer it based on metadata.
People called for Google to educate users about their poor phrasing. But talking to it like a human implies getting answers like a human’s.
One option is a “did you mean” button, like the one for spelling errors, but for semantics.
Another is disclaimers.
As computers resemble humans more and more, the current goal and pinnacle of technology, we will have to start excusing “human errors”. And understanding “American” as an identity and not a region is more human than most would like to admit; after all, it learned from us.
No algorithm for truth.