Episode 9: Consider Context – Show Notes
Algorithms do not operate in a vacuum. They operate in the context of a specific business, industry, problem, group of people, time frame, and more.
Algorithms impact people and processes in the course of their use. The charge of an ethical data scientist is to develop solutions which minimize the negative impact, maximize the positive, and achieve the objective. In doing this, we must consider context around the question we are trying to answer and the broader ramifications of the way we answer it.
Additional Links on Context in Data Science
Code Dependent: Pros and Cons of the Algorithm Age Pew Internet article describing many instances where the context in which algorithms operate has serious impacts on society.
Sensitivity and Specificity Wikipedia article discussing statistical power and test sensitivity. Fair warning – this is pretty technical.
Episode Transcript
Marie: Welcome to the Data Science Ethics Podcast. I am Marie and I’m here with Lexy.
Today we are going to be talking about context. So Lexy, in terms of data science ethics, what defines context for an algorithm that somebody is designing?
Lexy: To my mind, context is really the implications or the setting in which our algorithm is going to be used. If you’re developing an algorithm, let’s say, for a marketing purpose versus a medical purpose, there’s a very different context. The implications of that algorithm are going to be different.
Marie, this is something that you touched on in the last episode when we were talking about OKCupid and their test of anchoring bias. Think about the difference between identifying a pair of shoes you might be interested in purchasing versus seeing how interested you might be in dating a specific individual. For most people, the choice of purchasing a pair of shoes is not a life-altering one. But the choice of dating a particular individual may be a life-altering one. It’s certainly some amount of time and effort that you’re expending on potentially chatting with this person and getting to know them, but it could be someone that you wind up with for quite a long time, and that has larger ramifications on your life.
More broadly, context is the amount of influence that a given algorithm will have on people in a larger setting and the number of people that it will touch.
Marie: Influence is another good way to think about it. The other word that keeps coming to mind, for me, is impact.
Lexy: Absolutely. In statistics, we address the amount of impact, or the sensitivity that you would have in an algorithm, through something called statistical power.
Marie: And so that might be similar to what other people think of in terms of statistical significance, or how accurate a model needs to be, right?
Lexy: Or the confidence level that you’re not getting a false reading. We’ve talked a little bit about false positives and false negatives. Power, technically, is the probability of finding true positives.
For example, if you were looking at a medical test, you want it to be fairly accurate, especially when the treatment that follows carries risks of its own. You want to make sure that those risks are only ever going to be a possibility for people to whom the benefits will be greater. And so in a statistical test in a medical experiment, you want a very high-power test, because you want to make sure that you’re only ever going to give this treatment to someone who truly needs it.
As an example, you might think about chemotherapy for cancer patients. Chemotherapy has some very severe impacts on a person’s body, and you wouldn’t want to unduly put someone through that if they didn’t actually have cancer. Making sure that the test you use to identify cancer in an individual is highly accurate, highly powerful, is very important to making sure that you’re not exposing them to more risks to their health than benefits they would gain.
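The test accuracy Lexy describes can be made concrete with a quick sketch. The counts below are entirely hypothetical, purely for illustration, not from any real study:

```python
# Hypothetical confusion-matrix counts for a diagnostic test.
true_positive = 90    # sick patients the test correctly flags
false_negative = 10   # sick patients the test misses
true_negative = 900   # healthy patients the test correctly clears
false_positive = 50   # healthy patients the test incorrectly flags

# Sensitivity -- closely tied to statistical power -- asks: of the
# people who truly have the disease, what fraction does the test catch?
sensitivity = true_positive / (true_positive + false_negative)

# Specificity asks: of the people who are truly healthy, what fraction
# does the test correctly clear?
specificity = true_negative / (true_negative + false_positive)

print(f"sensitivity: {sensitivity:.3f}")  # 0.900
print(f"specificity: {specificity:.3f}")  # 0.947
```

A low-sensitivity test would miss people who truly need treatment, while a low-specificity test would send healthy people on to a risky treatment, which is exactly the chemotherapy scenario above.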
Marie: The medical analogy, I think, is helpful in understanding context because we’ve talked about how algorithms can sometimes work with other algorithms. Going back to the cancer example, maybe the first test is more of a screening test to see whether someone is likely to have cancer, before you put them through a test that has higher power but where the test itself might be riskier. Then, once you’ve done that, the higher-power test can determine the right type of treatment. You want to make sure you’ve had the right information along the way to confirm that the treatment’s benefits really will outweigh its risks.
Lexy: Absolutely. The other factor to think about with the lower-power versus the higher-power test is that the lower-power tests are often used on a broader group of people. Almost anyone who goes to a general practitioner might get those types of tests. If that’s the case, the breadth of that particular test gives it a bigger context, with implications for a broader audience. Compare that to the narrower set of people, those who already had the broad test, who are now exposed to the more invasive or more powerful test to determine whether they have a specific type of cancer, or what that specific type of cancer might be susceptible to in terms of treatment, so that they can find the best treatment program. So the breadth of impact also has a lot to do with the context of an algorithm.
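The two-stage testing Marie and Lexy describe can be sketched with Bayes’ rule. All of the sensitivity, specificity, and prevalence numbers below are illustrative assumptions, not real medical figures:

```python
def positive_predictive_value(sens, spec, prev):
    """Of those who test positive, what fraction truly have the disease?"""
    tp = sens * prev            # rate of true positives in the population
    fp = (1 - spec) * (1 - prev)  # rate of false positives
    return tp / (tp + fp)

prevalence = 0.01  # assume 1% of the broad population has the disease

# Stage 1: broad, low-risk screening test with modest specificity.
ppv_screen = positive_predictive_value(sens=0.95, spec=0.90, prev=prevalence)

# Stage 2: the confirmatory test is applied only to screen-positives,
# whose "prevalence" is now the stage-1 PPV, not the population rate.
ppv_confirm = positive_predictive_value(sens=0.99, spec=0.98, prev=ppv_screen)

print(f"PPV after screen alone: {ppv_screen:.1%}")
print(f"PPV after confirmation: {ppv_confirm:.1%}")
```

Under these assumed numbers, a positive screen alone means well under a 10% chance of disease, while screen plus confirmation pushes that above 80%, which is why you’d want the second, more powerful test before starting a treatment like chemotherapy.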
In a lot of cases, the algorithms that we see on a daily basis have a very broad scope. Search algorithms, weather tracking patterns, traffic navigation – all of these things that, on a day-to-day basis in the modern world, we’re exposed to. Not only do they have to be precise, so that, for example, a navigation app doesn’t tell you to make a wrong turn down a one-way street, they’re also in a much more impactful context because so many people are using them.
Marie: So as somebody who is looking at their data science practice and the models that they’re creating, it sounds like the industry that they’re in helps provide context. The people that will be using the algorithm internally provide context. And the end users out in the world also provide context.
Lexy: There are multiple different layers. When you think about the way that the process has to be followed for data science, you get context all along the way. As a data scientist, you also have to be willing and able to consider the broader ramifications of the algorithm that you’re developing. You have to think about how it’s going to be used before coming up with the best solution, because there’s often more than one.
There are industries where compliance and regulation come into play very heavily. In finance especially, for example, you have to have thorough documentation of everything your algorithm is doing and every type of data that has been utilized within it. And you have to make sure that it is not going to unduly penalize one group over another.
We have a number of other areas where that occurs in the States such as in health care. Obviously, medical reviews, medical testing – there’s a tremendous amount of rigor that’s placed on any sort of medical test. There’s a lot of oversight. For example, if you’re doing a drug test it would be the FDA and so forth.
Other areas that would have that kind of requirement include education. You cannot use certain types of information in an algorithm being developed because, again, they don’t want it to introduce undue bias. And then you also have to be cautious about proxies. We’ve talked a little bit about proxies in the past. A proxy is where you’re not necessarily putting in, for example, the race of a person, but you’re including some other factor that hints at the race of that person. So you have to be cautious about how you utilize that data.
There’s regulation to try and help maintain the context and make sure that you’re supervised in how you’re building an algorithm. That’s really an enforcement of context.
Even in an unregulated industry, it helps to think about what will happen downstream when your algorithm is being used, versus how you’re creating it. What might you do differently to create something that’s more sustainable in the wild?
Marie: That leads into another piece of context, which is time. As you develop algorithms today – five months down the line, a year down the line – things could change. Regulations could change. The business could change. And that could change the context of the algorithm in ways that mean you might want to change the algorithm itself.
Lexy: Yeah. Even other areas of a business can impact it. For example, look at an algorithm in finance that identifies good candidates for a credit card. If that credit card is no longer offered, or if there are other new offerings out there, the people it identifies may no longer be the best candidates for that particular product. You may want to convert them to a different product.
There are other times where the fact that time has passed means that the data has changed. For example, if we built an algorithm back in 2008 in the financial industry, it probably would have no bearing on how things work today in 2018. At the time, the financial world was collapsing as mortgages collapsed. Now we’re in a much different position economically and so the data would be completely different underlying that algorithm. This goes back to the care and feeding process we talked about within the data science process. Making sure that you follow through and follow up with your algorithm.
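One common way to put the “care and feeding” idea into practice is to periodically compare incoming data against what the model was trained on. The population stability index (PSI) used below is a standard drift metric, but the episode doesn’t prescribe it; the bin values and the 0.25 threshold are illustrative assumptions:

```python
import math

def psi(expected, actual):
    """Population stability index between two binned distributions,
    each given as fractions of the population summing to 1."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Hypothetical model-score distributions, binned into fifths.
training_bins = [0.20, 0.20, 0.20, 0.20, 0.20]  # at build time
current_bins  = [0.05, 0.10, 0.20, 0.30, 0.35]  # years later

drift = psi(training_bins, current_bins)
# A common rule of thumb: PSI above 0.25 signals a major shift.
if drift > 0.25:
    print(f"PSI {drift:.2f}: underlying data has shifted; revisit the model")
```

A scheduled check like this is one way an algorithm built on 2008 financial data would have flagged, years later, that its underlying assumptions no longer held.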
Marie: Well, I think that gives people some additional points to think about when they consider the context of an algorithm that they’re working on. This has been the Data Science Ethics Podcast with Marie and Lexy.
Lexy: Thanks so much for joining us. Catch you next time.
Lexy: I hope you’ve enjoyed listening to this episode of the Data Science Ethics Podcast. If you have, please like and subscribe via your favorite podcast app. Join in the conversation at datascienceethics.com, or on Facebook and Twitter at @DSEthics, where we’re discussing model behavior. See you next time.