Rosa Emerald Fox

Technology, career and lifestyle blogger


Living with Intelligent Machines

Today I was incredibly fortunate to see Nello Cristianini present on the subject of ‘Living with Intelligent Machines’ at Government Digital Service. Nello is a lecturer at the University of Bristol and has been studying the field of Artificial Intelligence for over 25 years.

The talk focused on ethical considerations surrounding data, AI and machine learning.

In this post I will write about some of my key takeaways. Please keep in mind that these are my own understanding and interpretation of the presentation. I have used a mixture of Nello’s examples along with some that I have found myself.

The Business of Data 

‘AI’ and ‘Big Data’ have been thrown around as marketing buzzwords over the last few years in order to sell products. Cisco have gone so far as to create promotional images displaying the quote ‘Data is the new oil’.

Clearly there is money in data. I am not going to search for stats on how many people use Google’s search engine, but it’s obviously a lot. From what I can gather, Google don’t sell this data, but they do use it to power their own advertising platform, which other companies pay to use. Advertisers trust that their ads will target users who are likely to be interested in whatever they are selling.

Because of this perception that data equals lots of cash, start-ups inevitably spring up claiming to offer AI solutions to various problems. It would be far too easy for someone who doesn’t really know what they are doing to hack together a basic model using some Python libraries, package it up as AI and sell it.

I am starting to learn how to build machine learning models myself, and yes, for the most part I will be learning by hacking things together and seeing what results I get. When you are learning it is so important to do this: to experiment, to see what you can build and to practically explore the theory you read. It is equally important to remember that this is not something that should be deployed. By all means share your code as a learning exercise, but don’t try to sell things that you don’t fully understand. Nello reiterated that quick solutions are worse than no solutions.
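
To show what I mean by ‘hacking things together’: below is a minimal sketch (assuming scikit-learn is installed, and using its built-in iris flower dataset) of just how few lines it takes to train a basic model. Great for learning; nowhere near something that should be packaged up and sold.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a toy flower dataset and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 'Train' a classifier and check how often it predicts the right species.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")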

Aggregating data and presenting it to users online has disrupted many industries over the last two decades. Websites such as Compare the Market (comparing and buying insurance) and Skyscanner (finding, comparing and booking flights) save users from trawling through lots of different shops or websites to find the options that best suit them.

Airbnb, ASOS and online food shopping mean that we can choose to make purchases without physically needing to go anywhere. Google Maps helps us find where we need to go. Skype allows us to video call our loved ones from across the world.

These online services make our lives more convenient and are free, so we use them. They have succeeded in removing an intermediate layer. We don’t want to go back to paying to speak to our relatives abroad via landline, but we don’t tend to question whether our Skype video calls are being used to train facial recognition systems (Microsoft President Brad Smith wrote a blog post in 2018 calling out the need for public regulation and corporate responsibility in facial recognition technology).

We have got so used to consuming this technology that it’s virtually impossible to go back. People have been unpleasantly surprised by how companies have used their data, which has resulted in backlash. In 2012 a 26-year-old bar manager was refused entry into America because he had tweeted “Free this week, for quick gossip/prep before I go and destroy America”. He meant it in a party sense, but it was interpreted as cause for concern and he was sent back to the UK.

Nello showed us an example of Facebook blocking the insurance company Admiral from pricing car insurance based on the content of users’ Facebook posts. Admiral claimed that they were experimenting with ways in which young drivers could prove how sensible they are. The idea was that young drivers opted in to sharing their Facebook data and could save up to £350 if they appeared sensible. According to Admiral (well, actually the BBC News article I read about the project…) being sensible involves “writing in short concrete sentences, using lists, and arranging to meet friends at a set time and place, rather than just ‘tonight’”.

Being assessed based on your online activity throws a spanner in the works of the theory that open and transparent models will eliminate bias. If you act in a certain way to meet the criteria of a model, then you can’t live as your authentic self. Nello gave an example of Uber drivers reporting that they felt they had to take abuse from customers in order to maintain a high rating under the scoring system. We don’t have an ID card scheme in the UK, but for countries that do, could various scores be attached to people based on their data? Could they be denied opportunities because of a score that may not accurately represent how ‘good’ they are?

Making sense of the digital world

These issues are very complex. GDPR (adopted into EU law in 2016 and enforced from 2018) helps by setting regulations on data protection and privacy, but as users we will often just accept the cookies and the unknown consequences.

Listening to Nello reinforced to me that it is important to check facts and to question proposed ideas. If an article in a magazine claims that a system is biased, look to back this up with academic evidence where possible.

Just because the same piece of code can be applied to make predictions for completely different use cases, it doesn’t mean that it should be. It is important to consider what harm could come from developing a machine learning model. The pair of jeans that ‘follows me around the internet’ after I have viewed it on ASOS inevitably causes less harm than a system that analyses immigrants to determine whether they are lying.

A system could analyse something better than random selection, even better than a human, but if ethically it could be greatly harmful, it shouldn’t be deployed.

It will be important for me to consider and test the quality of training data. In professional practice, I would assume that to select a dataset it would help greatly to be an expert in the subject matter, or to find people who are. There is a popular computer science phrase: ‘garbage in, garbage out’. It is also important to check that using the data complies with privacy regulations.
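
As a starting point, here is a minimal sketch of the kind of sanity checks I have in mind, assuming pandas and a hypothetical training_data.csv file with a label column:

import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical dataset

print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # duplicate rows that could skew training
print(df["label"].value_counts(normalize=True))  # is one class dominating?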

There isn’t a cookie-cutter solution to data ethics. Far from it, as for all the positive applications of AI technology, there can be negatives. A possible step towards a solution could be for organisations to have their models reviewed by an internal or external body of experts that would thoroughly investigate the ethical concerns of any AI technology before it is deployed. Conversely, there is the fear that this could stifle innovation.

From the perspective of this blog, for people like me who are just getting started: obviously technical skills are important, but actively educating yourself about the legality of what you are building, and assessing the harm it could cause, will be vital by the time you are producing deployable applications. I am looking forward to understanding more about data ethics, which I believe will greatly influence how I approach my studies.

AI and Machine Learning studies begin

I recently started a learning programme provided through work (Government Digital Service) to better understand emerging technologies, with a focus on Artificial Intelligence (AI) and Machine Learning.

AI and Machine Learning are huge subject areas and I am learning too fast to be able to write about it all. It can get deeply technical very quickly. I will aim to cover core concepts, but realistically this blog will give more of an overview.

If you are interested in learning more then I will point you in the direction of the resources I have used. Essentially, if you want to start scratching the surface of these topics, you are in the right place 🙂

For 10 weeks I am studying for 2 days a week. The learning is fairly self-directed, though once a week I meet with my brilliant mentor Ivan, who is a lecturer in AI and Data Science at the University of Bristol. He is also a fellow at The Alan Turing Institute, the national institute for data science and artificial intelligence, based in the British Library.

First things first, a quick glossary:

Algorithm

Think of an algorithm as a set of instructions: inputs that result in outputs. An example could be a cake recipe.
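
To put the ‘inputs that result in outputs’ idea into code, here is a made-up example in Python: exam marks go in, a pass/fail summary comes out.

# A fixed set of steps: marks go in, a pass/fail summary comes out.
def summarise_marks(marks, pass_mark=40):
    passed = [mark for mark in marks if mark >= pass_mark]
    return {"passed": len(passed), "failed": len(marks) - len(passed)}

print(summarise_marks([35, 62, 78, 40]))  # {'passed': 3, 'failed': 1}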

Artificial Intelligence

When a machine performs tasks that would usually require human brain power to accomplish.

Data Science

Turning data into useful information. The study of data science brings together researchers in computer science, mathematics, statistics, machine learning, engineering and the social sciences.

Machine Learning

A subset of Artificial Intelligence. It is based on writing computer algorithms (sets of instructions given to a computer) that can learn from information they have previously processed in order to generate an output.

Systems appear ‘intelligent’ because they can adapt to different situations based on what they have learnt/seen before.
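
To make that concrete, here is a toy example of my own (not from the course): a tiny ‘nearest neighbour’ rule that generates its output from examples it has previously seen, rather than from hand-written rules.

# Predict the label of a new point by finding the closest example
# the system has previously processed (1-nearest neighbour).
def predict(examples, new_point):
    nearest = min(examples, key=lambda pair: abs(pair[0] - new_point))
    return nearest[1]

# (height in cm, label) pairs the system has 'learnt' from.
seen = [(25, "cat"), (28, "cat"), (55, "dog"), (60, "dog")]
print(predict(seen, 30))  # 'cat' -- nearest seen example is 28cm
print(predict(seen, 58))  # 'dog' -- nearest seen example is 60cm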

I found that the FAQ of The Alan Turing Institute gave a great introduction to these terms. Their website in general is great for gaining an understanding of the different areas that AI encompasses.

Due to factors such as faster processing speeds and access to huge amounts of data, AI technologies are being implemented within a wide variety of areas. Last week I helped out on a stand at a popular recruitment event in London called Silicon Milkroundabout. I spoke to different people for 2 hours non-stop and over a third were looking for data science roles. I would be surprised if this had been anywhere near as high even a year ago. The way people think about how we develop technology feels like it is quickly shifting towards being much more data driven.

Example uses of AI and machine learning include:

  • The classic example: Fraud detection
  • Smart homes, where decisions can be made based upon factors such as energy consumption or perceived home safety
  • Connected and self-driving cars
  • Sentiment analysis (for example analysing whether a review is positive or negative; see the sketch just after this list)
  • Managing workloads of computer systems (Google DeepMind reduced the energy used for cooling their data centres by 40%)
  • Health care – helping doctors with diagnosis
  • Recruitment – sourcing candidates and conducting interviews with chatbots
  • Predicting vulnerability exploitation in software
  • Financial market prediction 
  • Accounting and Fintech – automating data entry and reporting
  • Proposal review – reviewing contracts, cost, quality level
  • Voice assistants
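
Taking the sentiment analysis example from the list above: here is a minimal sketch, assuming scikit-learn, trained on four made-up reviews. It is purely illustrative; a real system would need far more data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Four made-up reviews and their sentiment labels.
reviews = ["loved it, great product", "terrible, broke after a day",
           "really happy with this", "awful quality, do not buy"]
labels = ["positive", "negative", "positive", "negative"]

# Count the words in each review, then learn which words go with which label.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
print(model.predict(["really great quality"]))  # ['positive']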

Last week, I managed to catch a panel discussion on the subject of ‘AI for Social Good’ at the AI & Big Data Expo. The Head of Programme for the Digital Commission at the disability charity Scope gave some interesting examples of AI being used for social good.

She spoke about how, in New York, screens that people can interact with using sign language are being trialled and installed on buses to improve accessibility. Microsoft are developing ‘Seeing AI’, a text recognition application designed with people who are blind. Really excitingly, The National Theatre and Accenture have developed Smart Caption Glasses: a way for people with hearing loss to see a transcript of the dialogue and descriptions of the sounds from a performance, displayed on the lenses of the glasses.

The panel also discussed how, although the design focus of these AI applications may have been people with specific disabilities, they will benefit many others. Somebody holding a baby is suddenly not as mobile as they were before. It is a shame that a business argument for designing accessibly needs to be made; designing for people with specific needs shouldn’t be viewed as designing for a small subset of users, as the benefits will cascade. Designing accessibly shouldn’t be an afterthought.

Of course it isn’t all good. Systems learn based on what they have seen before. Society is inherently biased against minority groups. If we let this racist, sexist, transphobic (to name a few) view of the world run through our systems, and then rely on those systems to make predictions based on this information, we are only going to amplify the bias. The people developing these systems need to do so with this in mind. Machine learning models should be made transparent where possible.
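
One small, practical check developers can make (a sketch of my own, not from the panel, assuming pandas and hypothetical ‘group’ and ‘approved’ columns) is whether a model’s decisions look very different across groups of people:

import pandas as pd

# Hypothetical model decisions: 1 = approved, 0 = rejected.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B"],
    "approved": [1,   1,   0,   0,   0,   1],
})

# Approval rate per group; a big gap is a prompt to investigate for bias.
rates = decisions.groupby("group")["approved"].mean()
print(rates)
print("Disparity:", rates.max() - rates.min())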

There is growing concern that some job sectors will be replaced by AI. If you work in a job that involves solving lots of problems and a high level of human interaction then you are probably less at risk. If you are a train driver, mortgage advisor or stock market trader then it is quite possible your job market could be affected one day.

Another concern is data privacy. As users, we want our data to be kept safe. It’s no secret that tech companies largely profit from our information. We take this as the trade-off for not paying to use their platforms, but we are not very confident about how our data will be used or what the limits of surveillance are. On the flip side, to train machine learning systems so that they can make accurate predictions, we want lots of data. This is fine if you work at a large corporation with access to a lot of user data, but if not, there is a reliance on collecting it yourself or using open data sets.

In 2018 the Department for Digital, Culture, Media and Sport released the Data Ethics Framework. The framework sets out clear principles for how data should be used in the public sector. It aims to help maximise the value of data whilst setting the highest standards for transparency and accountability when building or buying new data technology. Many open data sets are available at https://data.gov.uk/.

Some examples of how public sector organisations are implementing emerging technologies, including AI and machine learning, have been presented here by the Innovation Team at Government Digital Service. Projects range from anomalous ship detection, to resource allocation for fire engines, to predicting people in crisis. A visualisation of the research can be found here and can be filtered in various ways.

Emerging technologies present a lot of interesting technical challenges to the public sector, which is greatly motivating to me. In terms of my studies, realistically there is only so much I can cover in 10 weeks (to put this into perspective, a lot of the data scientists I have met at work have PhDs, in maths…), but I am interested in exploring machine learning from the perspective of a software developer and seeing how much I can learn and practically implement (albeit relying on some handy Python libraries that already do the mathematical heavy lifting…).

This post has focused a lot on the application of the technologies rather than the technical implementation of machine learning, which is somewhat less glamorous and involves spending a lot of time cleaning up data and finding relationships between data points. I plan to write about both, so please keep reading if you are interested in following my journey and get in touch if you have any questions or resources.

 
