Data Science Ethics Podcast – Episode 1 Show Notes
As a starting point, we’re laying some groundwork. In this first informational episode, we talk about algorithms – what they are, what they do, and why they’re important to data science ethics.
Algorithms perform a set of steps on inputs to get to an output. In data science, we commonly use algorithms to predict what is likely to occur. For instance, how likely someone is to default on a loan or what the weather will be at 5 p.m. today. Other algorithms rate or rank things, like in a search engine ranking or a product recommendation engine.
Part of the reason to develop algorithms in the first place is to be able to make decisions consistently many times over. They form a crucial part of systems we all rely on to be accurate, fair, and fit for purpose. If the algorithm is flawed or biased, it instills that flaw or bias in every decision that it makes, every time. That’s why data scientists must be cautious in creating algorithms that are powerful yet ethical.
Additional Links on Algorithms
Most algorithm articles and resources out there speak about computer science more than data science. These links provide some good lists of common data science algorithms and their use. Incidentally, the websites that these come from are some of my favorites for general data science information. They often have articles with code snippets to help solve common business problems with analytics.
12 Algorithms Every Data Scientist Should Know – Data Science Central
A Tour of the Top 10 Algorithms for Machine Learning Newbies – Towards Data Science
Follow us on Facebook or Twitter as @DSEthics and be sure to like and subscribe on your favorite podcast app!
Episode Transcript
Today we’re going to talk a bit about algorithms. Specifically, we’ll define what algorithms are; we’ll talk about what types of algorithms there are; we’ll look at some opportunities for where you may have seen algorithms in your travels; and then we’ll dig a little bit more into what algorithms have to do with ethics in the larger setting of data science ethics.
An algorithm is a function or a formula, generally a complex one, that’s programmed for a computer to perform. Algorithms look kind of like a formula you may have had in school – an equation where you have inputs on one side and an output on the other. It doesn’t always work this way. Sometimes algorithms look like a very large case statement. In general, they are a number of different functions that a computer performs to create an outcome based on some amount of data that was put into a system. The algorithm is all the steps that go on in between the inputs and the output.
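To make that concrete, here is a toy sketch of an "algorithm" as a fixed series of steps that turns inputs into an output. The function and its thresholds are invented for illustration; they don’t come from any real system.

```python
# A toy "algorithm": a fixed series of steps from inputs to an output.
# The rules and cutoffs here are made up purely for illustration.
def umbrella_advice(chance_of_rain, wind_mph):
    """Decide what to carry, given two inputs: rain probability and wind speed."""
    if chance_of_rain >= 0.5 and wind_mph < 20:
        return "bring an umbrella"
    elif chance_of_rain >= 0.5:
        return "bring a raincoat"  # umbrellas fare poorly in high wind
    else:
        return "leave it at home"

print(umbrella_advice(0.7, 10))  # bring an umbrella
```

However simple, it has the same shape as the large case statements described above: inputs go in one side, a series of steps runs, and a decision comes out the other.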
Data scientists often classify the types of algorithms there are based on how statistics divides them. So you may hear terms like a “regression” or a “classification” or “clustering” or an “anomaly detection” or an “association pattern”… something like that. The more modern techniques are things like deep learning using neural networks or reinforcement learning or natural language processing. These are all kinds of algorithms that represent different functions that happen to input data. But that’s not how you would see them in your normal life. So let’s talk a little bit about the kinds of algorithms that you use everyday.
One algorithm that many of us are very familiar with is a credit score. A credit score is a classification algorithm. In this case, we’re predicting a “yes” or a “no” as to whether or not someone is likely to default on their next loan or their next credit card. This type of algorithm creates a probability saying how likely it is that this person would default. The more likely the person is to default, the lower their credit score. And so that probability then turns into the score that you see.
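The final step, turning a default probability into a familiar score, can be sketched as below. Real credit-scoring formulas are proprietary; this linear mapping onto a 300–850 range is only a stand-in to show the inverse relationship between risk and score.

```python
# Illustrative only: real credit-score formulas are proprietary.
# Assume a model has produced p_default, the probability of default,
# and map it linearly onto a familiar 300-850 range:
# higher default risk -> lower score.
def probability_to_score(p_default, low=300, high=850):
    return round(high - p_default * (high - low))

print(probability_to_score(0.05))  # low risk, high score
print(probability_to_score(0.60))  # high risk, low score
```

The key property is monotonicity: as the predicted probability of default rises, the score falls.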
Another very common type of algorithm is a search engine ranking. If you use Google, Bing, or any other search engine, you’ve used this, even if it was behind the scenes. A search engine ranking algorithm identifies the most relevant pages to your contextual search. I say contextual because there are a number of things that come into play. Every time you enter a new phrase into the search engine, it has to interpret the context.
So as an example, if I were to search for “husky pictures”, I would be expecting to see images of really adorable dogs. However, there are other types of huskies that I might see. I might see the UConn Huskies. I might see Husky brand. Search engines have to know that the context in which I’m searching is for a dog as opposed to one of these other options. So it’s not just that the pages have the keywords that I’ve used but that the context is properly interpreted by the algorithm. That tends to involve a number of different actual functions that comprise the total process – the total algorithm – that is a search engine ranking.
Another very common algorithm is a weather forecast. If you look at your phone every morning and check to see if you need to bring an umbrella, you can thank an algorithm for that. A weather forecast is a time series analysis. They’re also called forecasts, so that helps. The time series analysis looks to see what’s likely to happen at any given point in the future. The weather forecast uses information that we’ve seen from prior weather patterns, as well as current conditions, and blends them to identify what is likely to occur over the next several days. Beyond that, the forecast gets very uncertain and so we don’t necessarily project far into the future.
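A real weather model blends physics simulations with historical patterns, but the basic idea of a time-series forecast can be shown with something far simpler: predict the next value from the average of the last few observations. The temperatures below are hypothetical.

```python
# A minimal time-series sketch: forecast tomorrow's value as the mean of
# the last `window` observations (a moving-average forecast). Real weather
# models are vastly more complex, blending physics with past patterns.
def moving_average_forecast(history, window=3):
    recent = history[-window:]
    return sum(recent) / len(recent)

daily_highs = [61, 63, 65, 64, 66]  # hypothetical recent daily highs (F)
print(moving_average_forecast(daily_highs))  # mean of the last 3 days: 65.0
```

Notice that the further out you try to project, the more each forecast depends on earlier forecasts rather than real observations, which is exactly why confidence drops beyond a few days.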
So why bother worrying about the ethics of an algorithm? Well, an algorithm, as I mentioned before, is a computer program. Most often a computer program does the same type of thing over and over again. And so it’s important that when it does whatever that thing is that it does it fairly, it does it accurately, and it does it in a way that considers the context in which it’s going to be used.
As an example, let’s think back about the weather forecast. What if a meteorologist were developing a completely new forecasting algorithm – one that would at least ninety percent of the time be accurate? But the way that they figured that out was that they measured only how many days were sunny. And they said, “well, ninety percent of the time it’s sunny. So if I always predict that it will be sunny, ninety percent of the time I’ll get it right.”
Ninety percent accuracy can be seen as very high in some situations, but with weather, we want more precision. We want a more accurate forecast. If ten percent of the time you went outside, you didn’t have an umbrella because you thought it was going to be sunny and it started raining, you’d probably be pretty upset… as well as wet.
But what if it were even worse? Anomalies like hurricanes, for instance, are crucial to know about in advance. Those types of anomalies are exactly what forecasts should capture. But if the meteorologist only ever predicts the common case, then that one percent or less of the time that we do have a hurricane, it would be disastrous. No one would be prepared. So it’s important to understand the context in which your algorithm is going to operate. In this case, helping millions of people to be prepared for the weather ahead.
Equally, if we think back to the credit score, if we just say “no one can have access to credit”, we’re taking away an important safety net from a lot of people. Or if we say that everyone gets credit, then the banks are no longer going to be profitable because there are going to be too many people who are defaulting on their credit.
Algorithms require data scientists to balance precision and accuracy with fairness, context, and the ethical consideration of how the algorithm is going to impact the world. That’s why it’s so important to understand algorithms as a part of data science ethics.
Thank you so much for joining us today. I hope you’ve enjoyed this episode. If you have please like and subscribe. You can find the data science ethics podcast on iTunes, Stitcher, PocketCasts, or wherever you get your favorite podcasts. You can also find us on datascienceethics.com and join in the conversation there or on Facebook and Twitter at @DSEthics. See you next time.
This podcast is copyright Alexis Kassan. All rights reserved. Music for this podcast is by DJ Shahmoney. Find him on Soundcloud or YouTube as DJShahMoneyBeatz.
From an ethics perspective, what is the point of overreach in scoring the conduct of an individual? This podcast mentions credit scoring, which is limited to the financial behavior of consumers. However the Chinese government is implementing a “social” credit score that tracks non-financial activity and assigns a value to each citizen. A low score can preclude a person from purchasing a flight, buying property or enrolling their children in private school. I would proffer the opinion that such an algorithm is a breach of data science ethics, yet will likely continue to propagate in the information age.
Great point! As we alluded to in the show, we will be covering the Sesame Credit scoring system in China soon. One of the fascinating aspects of ethics is that its tenets do not apply universally. What we, in the US, would deem unethical may not be seen as problematic elsewhere in the world. Conversely, some of the ethics around data and its use that we find acceptable are prohibited in other countries. Stay tuned!
I thought the brainwaves of human consciousness could not be put into an algorithm. When I search, Wikipedia notes there are five brainwaves: Beta, Alpha, Theta, Delta, Gamma. Can you explain how algorithms can augment human consciousness using these brainwaves?
To be honest, I do not know. This is well outside of my field of research. That said, “augmenting human consciousness” would point to some sort of extrinsic influence. Barring implanting processors into the human body (not ready to tackle the ethics there quite yet), I do wonder if the so-called “Mozart Effect” might be a means of achieving the sort of augmentation to which you are alluding. While the increased effect of playing Mozart’s music versus someone else’s is up for debate, the concept of influencing human capabilities with patterns of sound has been somewhat proven. Perhaps some day…
Thank you, Lexy.
I would add this to the Mozart Effect: (sound with finely tuned chemistry).
Other than why human physical senses (alone) do not apprehend the ultimate constituents which all material objects (including human atomic and subatomic structure) are made from, it seems to me human consciousness (and brainwave construct) is the secondary concern for any developments and constraints of Super Intelligent Cognitive Machines…