Today I was incredibly fortunate to see Nello Cristianini present on the subject of ‘Living with Intelligent Machines’ at Government Digital Service. Nello is a lecturer at Bristol University and has been studying the field of Artificial Intelligence for over 25 years.
The talk focused on ethical considerations surrounding data, AI and machine learning.
In this post I will write about some of my key takeaways. Please keep in mind that these are my own understandings and interpretations of the presentation. I have used a mixture of Nello’s examples along with those that I have found myself.
The Business of Data
‘AI’ and ‘Big Data’ have been thrown around as marketing buzzwords over the last few years in order to sell products. Cisco have gone so far as to create promotional images displaying the quote ‘Data is the new oil’.
Clearly there is money in data. I am not going to search for stats on how many people use Google’s search engine, but it’s obviously a lot of people. From what I can gather, Google don’t sell this data, but they do use it to power their own advertising platform, which other companies pay to use. Advertisers trust that their ads will target users who are likely to be interested in whatever they are selling.
Due to the perception that data equals lots of cash, start-ups inevitably spring up claiming to offer AI solutions to various problems. It could be far too easy for someone who doesn’t really know what they are doing to hack together a basic model using some Python libraries, package it up as AI and sell it.
I am starting to learn how to build machine learning models myself, and yes, for the most part I will be learning through hacking things together and seeing what results I get. When you are learning it is so important to do this: to experiment, to see what you can build and to practically explore the theory you read. It is equally important to remember that this is not something that should be deployed. By all means share your code as a learning exercise, but don’t try to sell things that you don’t fully understand. Nello reiterated that quick solutions are worse than no solutions.
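To illustrate just how little code a ‘hacked together’ model takes, here is a sketch of a toy 1-nearest-neighbour classifier written from scratch. The data points and labels are entirely made up for illustration; this is exactly the kind of quick experiment that is great for learning but should never be packaged up and sold as an AI solution.

```python
import math

def predict(train, label_of, point):
    """Return the label of the training point closest to `point`."""
    nearest = min(train, key=lambda t: math.dist(t, point))
    return label_of[nearest]

# Hypothetical toy data: two small clusters of 2D points.
train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
label_of = {train[0]: "a", train[1]: "a", train[2]: "b", train[3]: "b"}

print(predict(train, label_of, (1.1, 0.9)))  # nearest to the 'a' cluster
print(predict(train, label_of, (5.1, 4.9)))  # nearest to the 'b' cluster
```

A dozen lines and it ‘works’ on toy data, which is precisely the trap: it says nothing about how the model behaves on real, messy, consequential inputs.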
Aggregating data and presenting it to users online has disrupted many industries over the last two decades. Websites such as Compare the Market (comparing and buying insurance) and Skyscanner (finding/comparing/booking flights) save users from trawling through lots of different shops or websites to find the options that best suit them.
Airbnb, ASOS and online food shopping mean that we can choose to make purchases without physically needing to go anywhere. Google Maps helps us find where we need to go. Skype allows us to video call our loved ones from across the world.
These online services make our lives more convenient and are free, so we use them. They have succeeded in removing an intermediate layer. We don’t want to go back to paying to speak to our relatives abroad via landline, but we don’t tend to question whether our Skype video calls are being used to train facial recognition systems (Microsoft President Brad Smith wrote a blog post in 2018 calling out the need for public regulation and corporate responsibility for facial recognition technology).
We have got so used to consuming this technology that it’s virtually impossible to go back. People have been unpleasantly surprised by how companies have used their data, which has resulted in backlash. In 2012 a 26-year-old bar manager was refused entry into America because he tweeted “Free this week, for quick gossip/prep before I go and destroy America”. He meant ‘destroy’ in a partying sense, but it was interpreted as a potential threat and he was sent back to the UK.
Nello showed us an example of Facebook banning the insurance company Admiral from pricing car insurance based on the content of users’ Facebook posts. Admiral claimed that they were experimenting with ways in which young drivers could prove their sensibility. The idea was that young drivers opted in to sharing their Facebook data and could save up to £350 if they appeared sensible. According to Admiral (…well actually the BBC news article I read about the project…) being sensible involves “writing in short concrete sentences, using lists, and arranging to meet friends at a set time and place, rather than just “tonight””.
Being assessed based on your online activity throws a spanner in the works of the theory that open and transparent models will eliminate bias. If you act in a certain way to meet the criteria of a model, then you can’t live as your authentic self. Nello gave an example of Uber drivers reporting that they felt they had to take abuse from customers in order to maintain a high rating under the scoring system. We don’t have an ID card scheme in the UK, but for countries that do, could various scores be added to represent people based on their data? Could they be denied opportunities because of a score that may not accurately represent how ‘good’ they are?
Making sense of the digital world
These issues are very complex. GDPR, adopted into EU law in 2016 and enforced from 2018, helps by setting regulation on data protection and privacy, but as users we will often just accept the cookies, and with them the unknown consequences.
Listening to Nello has reinforced to me that it is important to check facts and to question proposed ideas. If an article in a magazine claims that a system is biased, look to back this up with academic evidence where possible.
Just because the same piece of code can be applied to predict things for completely different use cases, it doesn’t mean that it should be. It is important to consider what the harm could be in developing machine learning models. The pair of jeans that ‘follow me around the internet’ after I have viewed them on ASOS inevitably cause less harm than a system that analyses immigrants to determine if they are lying.
A system could analyse something better than random selection could, even better than a human could, but if it is ethically harmful, it shouldn’t be deployed.
It will be important for me to consider and test the quality of training data. In professional practice, I would assume that to select a dataset it would largely help to be an expert in that subject matter, or to find people who are. There is a popular computer science saying: ‘garbage in, garbage out’. It is also important to check that using the data complies with privacy regulations.
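As a starting point, some ‘garbage in, garbage out’ problems can be caught with very simple sanity checks before any training happens. Here is a minimal sketch; the dataset, field names and the 90% imbalance threshold are all hypothetical choices for illustration, not a substitute for subject-matter expertise.

```python
from collections import Counter

# Hypothetical toy dataset with deliberately planted problems.
rows = [
    {"age": 34, "income": 52000, "label": "approved"},
    {"age": None, "income": 48000, "label": "approved"},  # missing value
    {"age": 34, "income": 52000, "label": "approved"},    # exact duplicate
    {"age": 29, "income": 61000, "label": "approved"},
]

def check_quality(rows):
    """Return a list of human-readable data-quality issues."""
    issues = []
    # Missing values: garbage in, garbage out.
    missing = sum(1 for r in rows if any(v is None for v in r.values()))
    if missing:
        issues.append(f"{missing} row(s) with missing values")
    # Exact duplicates can leak between training and test splits.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    dupes = sum(n - 1 for n in counts.values() if n > 1)
    if dupes:
        issues.append(f"{dupes} duplicate row(s)")
    # Severe label imbalance suggests the data may not be representative.
    labels = Counter(r["label"] for r in rows)
    top_share = max(labels.values()) / len(rows)
    if top_share > 0.9:
        issues.append(f"label imbalance: {top_share:.0%} share one label")
    return issues

for issue in check_quality(rows):
    print(issue)
```

Checks like these catch only the mechanical problems; whether the data actually represents the people a model will be used on still needs human judgement.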
There isn’t a cookie-cutter solution to fix data ethics. Far from it: for every positive application of AI technology, there can be a negative. A possible step towards a solution could be for organisations to have their models reviewed by an internal or external body of experts that would thoroughly investigate the ethical concerns of any AI technology that was to be deployed. Conversely, there is the fear that this could stifle innovation.
From the perspective of this blog, for people like me who are just getting started: technical skills are obviously important, but actively educating yourself about the legality of what you are building, and assessing the harm it could cause, will be vital by the time you are producing deployable applications. I am looking forward to understanding more about data ethics, which I believe will greatly influence how I approach my studies.