The Algorithms Shaping Our World
- May 15, 2021
- Heather Ferg
Machine learning is supplementing or replacing human decision making in virtually every sector of society: law, medicine, education, insurance, banking, advertising, communication, dating. The list goes on. The Netflix Recommendation Engine filters content and makes recommendations for subscribers. Google Street View images have been used to estimate the demographic makeup of neighbourhoods. Faception claims to analyze facial images to reveal personalities in real time. Every area touched by computers is affected one way or another.
In my last post to On The Wire (here), I explained what algorithms and machine learning are. In this one, I will discuss some examples to illustrate how algorithms are used, or can be used, to shape the world as we experience it. Through recommendations, predictions, screening mechanisms and risk projections, algorithms dictate much of what we see, engage with and have access to. They are used in content delivery, as proxies for direct data collection, and in all manner of risk assessments.
2. Content Delivery
Recommender systems (sometimes called recommendation engines) are systems designed to filter content and offer up suggestions to users. They are used in areas like music apps, online stores and social media or online dating platforms. Recommendation engines use various approaches to bring content or products to the attention of a user. For example, they may use collaborative filtering (based on the idea that users with similar histories will make similar choices in the future), content based filtering (based on a user’s known preferences or the item sought), location-based recommendations or session-based recommendations. The aim of such systems is to stimulate user interaction and create demand for whatever product or content the system deems most relevant for the user.
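To make the idea of collaborative filtering concrete, here is a minimal sketch in Python. The user names, titles and ratings are invented for illustration; real recommender systems operate at vastly larger scale and blend many more signals than this.

```python
import math

# Toy user-item ratings matrix (all names and numbers are illustrative).
ratings = {
    "alice": {"drama_a": 5, "comedy_b": 1, "thriller_c": 4},
    "bob":   {"drama_a": 4, "comedy_b": 2, "thriller_c": 5},
    "carol": {"drama_a": 1, "comedy_b": 5},
}

def cosine_similarity(u, v):
    """Cosine similarity over the titles both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)

def recommend(target, ratings):
    """Score unseen titles by the similarity-weighted ratings of other users
    (the 'users with similar histories make similar choices' idea)."""
    seen = ratings[target]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(seen, other_ratings)
        for title, rating in other_ratings.items():
            if title not in seen:
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("carol", ratings))  # carol has not yet seen thriller_c
```

Content-based filtering would instead compare the attributes of the titles themselves (genre, cast, length) against a user's known preferences; production systems typically combine both approaches.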
One of the most powerful and successful of these systems is the Netflix Recommendation Engine (NRE). The NRE uses a multitude of algorithms to filter content and make individualized suggestions. As detailed by Lighthouse Labs (here), approximately 80% of Netflix viewing hours are a result of the NRE’s recommendations. The system achieves this level of influence by tracking and analyzing the aggregate data of its 203 million subscribers worldwide. It tracks information such as:
- time and date a user watched a title;
- user profile information (age, gender, location, etc.);
- the device used to stream;
- if the show was paused, rewound or fast-forwarded;
- if the viewer resumed watching after pausing;
- whether an entire TV series or movie was completed;
- how long it took a viewer to watch an entire TV series;
- whether the viewer gave the show or movie a thumbs up;
- scenes users have viewed repeatedly;
- the number of searches and what is searched for;
- where a user watched the show (by postal code);
- browsing and scrolling behaviour; and,
- screenshots when the show was paused, when the user left the show, and when the user watched a scene more than once. (here, here and here)
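The signals listed above can be pictured as a per-viewing record. The sketch below is hypothetical: the field names are my own invention, and Netflix's actual data schema is not public.

```python
from dataclasses import dataclass
from datetime import datetime

# A hypothetical record of the per-viewing signals described above.
# Field names are invented for illustration only.
@dataclass
class ViewingEvent:
    profile_id: str
    title: str
    watched_at: datetime          # time and date the title was watched
    device: str                   # device used to stream
    postal_code: str              # where the show was watched
    paused: bool
    resumed_after_pause: bool
    completed: bool               # whole series/movie finished
    thumbs_up: bool
    rewatched_scenes: int         # scenes viewed repeatedly
    searches_this_session: int

event = ViewingEvent(
    profile_id="u123", title="Example Show",
    watched_at=datetime(2021, 5, 1, 21, 0), device="smart_tv",
    postal_code="V6B", paused=True, resumed_after_pause=True,
    completed=False, thumbs_up=False, rewatched_scenes=2,
    searches_this_session=3,
)
print(event.device, event.completed)
```

Aggregated over hundreds of millions of such records, even these mundane fields become the behavioural raw material the NRE learns from.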
Using this wealth of behavioural information, Netflix has achieved the highest customer retention rate in the industry and has gleaned the insights necessary to create highly successful original content.
3. Reverse Engineered Data Collection
Much of the focus on digital privacy is (rightly) on individual privacy as it relates to data collection. The landscape, however, is changing. Advances in deep learning and processing power mean that new insights can be gleaned from large (seemingly innocuous) existing data sets by mining them for previously unidentified patterns and correlations. In some cases, data need not be collected but startlingly accurate conclusions that strike at the heart of our collective privacy can be inferred.
A particularly striking example of this is the work that has been done using Google Street View data. Google Street View is a feature of Google Maps and Google Earth that provides street-level panoramic images of many streets around the world. By May 2017, Google’s cameras had captured over 16 million kilometres of imagery in 83 countries (here).
In 2017, researchers combined deep learning with Google Street View images to estimate the demographic makeup of the United States. Having noted that the annual door-to-door American Community Survey (which measures statistics on race, gender, education, occupation, unemployment, etc.) costs over $250 million a year, the researchers proposed automated analysis of digital imagery as a viable supplement (p. 13108). As it turned out, they could glean significant insights from the cars parked in a neighbourhood alone. In the study (here), 22 million vehicles were detected in 50 million Street View images. The researchers used deep convolutional neural networks (CNNs) to train their models to determine the make, model and year of each vehicle.
The models detected strong, highly accurate associations between vehicle distribution and socio-economic factors (p. 13110). Using existing voting data, the researchers were able to go neighbourhood by neighbourhood and estimate the income, race, education levels and voting patterns of the people who live there. They could accurately determine that Seattle is 69% Caucasian and that most African American residents live in a few zip codes. They could estimate the fractions of the population in Milwaukee zip codes with less than a high school degree, a bachelor’s degree or a post-graduate education. In Florida, the lowest-income zip codes could be isolated in the southern tip of Tampa. In terms of voting, sedans were strongly associated with voting Democrat; extended-cab pickup trucks with voting Republican (p. 13110).
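The flavour of such an association can be shown with a deliberately simple linear model. The coefficients below are invented purely for illustration; the study fit its own models to 22 million detected vehicles.

```python
# Hypothetical linear model relating a neighbourhood's vehicle mix to its
# Democratic vote share, in the spirit of the Street View study.
# The base rate and coefficients are invented for illustration only.
def predicted_democrat_share(sedan_fraction, pickup_fraction):
    # More sedans pushes the estimate up; more extended-cab pickups down,
    # mirroring the direction of the associations reported in the paper.
    base = 0.5
    return base + 0.4 * sedan_fraction - 0.4 * pickup_fraction

# Two made-up neighbourhoods with different vehicle mixes.
urban = predicted_democrat_share(sedan_fraction=0.6, pickup_fraction=0.1)
rural = predicted_democrat_share(sedan_fraction=0.2, pickup_fraction=0.5)
print(round(urban, 2), round(rural, 2))
```

The point is not the numbers but the mechanism: once a correlation is learned, a photograph of parked cars becomes a stand-in for a survey question nobody was ever asked.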
The conclusion of the study was that automated systems could effectively complement labour-intensive approaches to data collection and offer demographic measurements in real time. Instead of asking people how they characterize themselves on any given topic, assumptions can simply be made based on the cars seen in proximity to their homes.
4. Economic Risk Assessments
Risk analysis has countless fields of application and can focus on individual or systems-based assessments. Many sectors use automated scoring software to decide who qualifies for things like loans, insurance or jobs. Algorithms are used to generate risk scores, and machine learning is at the forefront of identifying new trends and correlations from which inferences may be drawn. Google Street View provides another example. In 2019, Polish researchers published a proof-of-concept study demonstrating a link between the outward appearance of one’s home and the risk that one would be involved in a car accident.
In the study, published in the Central European Economic Journal (here), researchers used Google Street View images to predict car accident risk. They used 20,000 residential addresses of insurance company clients and looked for links between housing conditions and car accident risk. While zip codes are already used as factors in risk modelling and insurance pricing, the study focused on the characteristics of individual houses. The researchers concluded that visible features in a picture of a house (i.e., the type of house and its state of repair) can be predictive of risk independent of classically used variables like age and zip code (p. 155). Using this data, they were able to improve the existing predictive risk models by 2%, which would be an extremely lucrative outcome for the insurance industry.
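A minimal sketch of how an image-derived feature might be bolted onto a classic accident-risk model. The feature names and coefficients here are invented; the study's actual models are described in the paper.

```python
import math

def sigmoid(x):
    """Squash a linear score into a 0-1 probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Sketch of an accident-risk model of the kind the study augments.
# Coefficients and feature names are invented for illustration only.
def accident_risk(age, zip_risk, house_disrepair):
    """Classic variables (age, zip-level risk score) plus one
    image-derived feature: a 0-1 score for visible disrepair of
    the policyholder's home, extracted from a Street View photo."""
    z = -2.0 - 0.02 * (age - 40) + 1.5 * zip_risk + 0.8 * house_disrepair
    return sigmoid(z)

# Same hypothetical client, with and without the image-derived feature.
baseline = accident_risk(age=35, zip_risk=0.3, house_disrepair=0.0)
with_image = accident_risk(age=35, zip_risk=0.3, house_disrepair=0.7)
print(round(baseline, 3), round(with_image, 3))
```

Even a small lift from one extra feature, multiplied across an insurer's entire book of business, is what makes a 2% model improvement so commercially attractive.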
The authors of the study noted the ethical and privacy concerns that necessarily arise from this type of analysis. By consenting to the storage of a residential address, one does not necessarily consent to the collection and storage of an image of one’s house. This concern is aggravated to the extent that such images can be used as a proxy for ethnicity, religion or other sensitive characteristics (p. 156). This is again a case where sensitive data need not be collected from an individual. It can be inferred from what has already been taken.
5. Predictive Policing
Predictive risk assessments are common in the criminal justice system. They are routinely used in sentencing and their use has attracted much attention, particularly in the United States. Computer-generated risk scores increasingly inform who may be released at every stage of the criminal process, from arrest to bail to sentencing. This is true despite a growing body of evidence that the machine predictions are not particularly reliable and are deeply racist (here and here).
In 2020, Citizen Lab and the International Human Rights Program at the University of Toronto, Faculty of Law, released a report titled To Surveil and Predict: A Human Rights Analysis of Algorithmic Policing in Canada (here). The report provides an in-depth analysis of the use and human rights implications of algorithmic policing practices used across Canada. As explained in the report, there are two main types of algorithmic policing technology: (a) location focused, surveillance-based policing; and, (b) person-focused predictive technologies.
Surveillance-based algorithmic technologies include: automated license plate readers, social media surveillance software, facial recognition software and social network analysis.
Person-focused technologies “rely on data analysis in order to try and identify people who are more likely to be involved in potential criminal activity or to assess an identified person for their purported risk of engaging in criminal activity in the future” (here). These predictions are drawn from analyzing information about someone’s friends, family, known associates, social media activity, criminal records or appearance in databases. These factors are combined into a risk score which may lead to increased monitoring by law enforcement (p. 46). By way of illustration, the authors detail the Chicago Police Department’s Strategic Subjects List, known as the “heat list”. It used an 11-variable algorithm (which included criminal history, parole status and alleged gang membership) to identify individuals purportedly at risk of being a perpetrator or victim of gun violence (pp. 46-47). The list was found to be significantly less reliable than the police claimed and the program was decommissioned (here).
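A toy illustration of what a person-focused risk score of this kind amounts to. The report names criminal history, parole status and alleged gang membership among the heat list's 11 variables; the weights and the remaining variables below are invented for illustration.

```python
# Sketch of a person-focused risk score in the style of the Chicago
# "heat list".  Weights and most variable names are invented; only
# criminal history, parole status and alleged gang membership are
# drawn from the report's description of the actual system.
WEIGHTS = {
    "prior_arrests": 0.5,              # per prior arrest
    "on_parole": 2.0,
    "alleged_gang_membership": 3.0,    # note: *alleged*, not proven
    "prior_shooting_victim": 2.5,
}

def risk_score(person):
    """Weighted sum of attributes; a higher score means more police
    attention under such a system, regardless of any actual conduct."""
    return sum(WEIGHTS[k] * float(v) for k, v in person.items() if k in WEIGHTS)

person_a = {"prior_arrests": 2, "on_parole": True,
            "alleged_gang_membership": False, "prior_shooting_victim": False}
person_b = {"prior_arrests": 0, "on_parole": False,
            "alleged_gang_membership": True, "prior_shooting_victim": True}
print(risk_score(person_a), risk_score(person_b))
```

Note that person_b outscores person_a without a single arrest: unverified labels and victimhood alone can drive the score, which is precisely the reliability problem the audits identified.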
Algorithm-based predictive policing appears to be less common in Canada but is likely on the horizon. At least one province is actively developing it. While the Saskatchewan Police Predictive Analytics Lab (SPPAL) presently focuses on preemptively identifying potential victims (such as missing youth), its scope is expected to expand into areas such as repeat and violent offenders, domestic violence, the opioid crisis, and individuals with mental illness in the criminal justice system (here).
Federally, The Tyee journalist Bryan Carney broke the story in 2019 of the RCMP’s “Project Wide-Awake”, a program of wide-scale social media monitoring and analysis designed to “help detect and prevent a crime before it occurs” (here). In a follow-up story, Mr. Carney detailed documents he obtained showing that the RCMP were making efforts to secretly buy and use powerful mass surveillance tools while downplaying their capabilities (here). The documents revealed that the RCMP:
- gained permission to hide sole-source contracts for Project Wide Awake from the public through a “national security exception”;
- discussed “tier three” covert operations involving the use of proxies – intermediary computers located elsewhere – to hide RCMP involvement with spying activities;
- purchased software with an aim to search “Darknet”, which it defined to include “private communications” and those from “political protests”;
- has used a tool to unmask lists of “friends” on Facebook for users that specifically set friends’ information to private on the platform; and,
- was “wasting resources, wasting time, wasting money” on IT projects, according to the RCMP’s then chief information officer. (here)
6. Private Security Tools
Private security companies now offer many forms of automated digital security tools. Facial recognition software is sold to prevent shoplifting and public bathroom toilet paper theft (here), and law enforcement is routinely supplemented by products purchased from the private sector.
The use of machine learning and algorithmic analytics in the private sector has given rise to all manner of claims about what can be “scientifically” gleaned from the data. One particularly horrifying example is the Israeli company Faception, which claims to possess “breakthrough computer-vision and machine learning technology [that] analyzes facial images and automatically reveals personalities in real-time” (here).
Faception markets its software as being able to “analyze a person’s facial image and automatically reveal his personality, enabling security companies/agencies to more efficiently detect, focus and apprehend potential terrorists or criminals before they have the opportunity to do harm.” The system has built-in classifiers (which read much like horoscopes) that apparently “represent a certain persona, with a unique personality type, a collection of personality traits or behaviors” (here).
Based on facial scans alone, individuals are scored according to which of the classifiers they fit. While the classifiers can be customized, the main categories showcased on the company’s website include: High IQ, Academic Researcher, Professional Poker Player, Bingo Player, Brand Promoter, White Collar Offender, Terrorist and Pedophile.
The “Bingo Player” is said to be: “Endowed with a high mental ceiling, high concentration, adventurousness, and strong analytical abilities. Tends to be creative, with a high originality and imagination, high conservation and sharp senses.” The “Terrorist” apparently: “Suffers from a high level of anxiety and depression. Introverted, lacks emotion, calculated, tends to pessimism, with low self-esteem, low self image and mood swings.” The “Pedophile” is described as: “Suffers from a high level of anxiety and depression. Introverted, lacks emotion, calculated, tends to pessimism, with low self-esteem, low self image and mood swings.” (here) Careful readers will note that the latter two descriptions are identical. It is hard to know where to begin with the problems inherent in literally all of this.
In an interview with The Daily Mail (here), Faception’s Chief Executive claimed the company could categorize people with 80% accuracy. In 2016, the Washington Post reported that Faception had signed a contract with a homeland security agency (here).
As data is gathered by the state and the private sector (opaquely or otherwise), the tools and automated processes used to make decisions about how individuals are regarded and treated will require constant scrutiny. Much of the legal landscape in this area remains to be developed. In my next post, I will explore some of the ethical considerations raised by the use of algorithms and machine learning which must be factored into our personal and collective approach to the world we are building through our use of these technologies.