Show Notes on Incorporate Inclusivity
Data scientists develop algorithms that have broad reach across the population. Chances are that the data science team building these widely impactful models is not, itself, large enough to represent so big a swath of the population. How can a small, likely less diverse team acquire the wisdom of many?
In this episode, we discuss the need to incorporate inclusivity, seek outside perspectives, and collaborate with others to achieve a fairer result with fewer unforeseen negative ramifications.
Additional Links on Incorporate Inclusivity
Best Practices for Fostering Diversity & Inclusion in Data Science: a Berkeley Institute for Data Science article with a fascinating discussion of the differences between diversity, inclusivity, and equity. (Insider info: we’ll be covering equity soon!)
Brandeis and Hugo discuss people of color and under-represented groups in data science. DataFramed podcast episode from DataCamp regarding inclusivity in data science.
Addressing the imminent need for diversity in data science: a SiliconAngle dive into the need for a more inclusive and diverse data science practice. While it focuses more on employment and hiring than on input, there are some good stats on how this is seen in the industry.
Incorporate Inclusivity Episode Transcript
Welcome to the Data Science Ethics Podcast. My name is Lexy and I’m your host. This podcast is free and independent thanks to member contributions. You can help by signing up to support us at datascienceethics.com. For just $5 per month, you’ll get access to the members only podcast, Data Science Ethics in Pop Culture. At the $10 per month level, you will also be able to attend live chats and debates with Marie and I. Plus you’ll be helping us to deliver more and better content. Now on with the show.
Marie: Hello everybody and welcome to the Data Science Ethics Podcast. This is Marie Weber.
Lexy: And Lexy Kassan.
Marie: And today we are going to talk about the concept of incorporate inclusivity. So whenever you’re putting together a team, obviously the more diverse the team you can work with, the better the ideas that you can generate. And so how do we apply that to data science ethics? Lexy?
Lexy: There are a couple of ways. One is, obviously, to get as diverse a team as you can, and I say “as you can” because sometimes we’re constrained. That’s been very true in STEM fields, which for a long time now have been dominated primarily by men, and there are definitely racial gaps in there as well. A lot of teams simply are not big enough to have a wide spectrum of backgrounds and races and so forth, so do as much as you can within your team to try to get different perspectives. The other part of that is to talk with a broader audience, even if they’re not part of the team developing the algorithm. If you can possibly get to the end users of your algorithm, create a diversified pool of people that you can talk with, whether through a survey or a focus group or something along those lines, so that before you build your algorithm you can discuss some of your intentions and the intended outcomes with that group. You may find that they bring different perspectives on how something would influence them or how it might impact their lives versus what you’re thinking of.
Lexy: We had an episode on the Google gorilla problem. There are a number of other times when a very similar issue has occurred because algorithms were trained on too narrow a set of data, and the ramifications for those types of models can be very severe and very public. So getting more perspective, more data, a wider set of data is generally a good thing. Make sure that the data is representative of the people and that those people give you feedback, especially when you’re dealing with something that’s very public. That’s really what it is that we’re trying to do. We want to make sure that we’re inclusive of all races, all creeds, all genders, all sexualities, whatever it may be, because the way that the algorithm works in practice on all of these different types of individuals can be very different, and the terminology in it and the way that it’s used can be different too. So make sure that you have as broad a representation of the user base as possible so that you know how it’s going to affect them.
Marie: And I think that’s really interesting because you’ve touched on a few different areas about how to incorporate inclusivity. So just to recap: it’s who’s actually working on the model and on the team, then who is being tapped to provide input into the model, and then also who you are testing the model with as another way to get feedback.
Lexy: Absolutely. The other thing that I would say with all of that is that we use something within the organization that I’m in called StrengthsFinder, which is less about some of these protected classes and more about the intrinsic strengths of a given person and the way that they think and interact. We talk about creating teams that are diversified across strengths as well, so that you have a broader perspective on the ways that people think about things, not always just how they interact with something. So I would take it that one step further, as much as you can, and incorporate people, even within your own organization, who may not be data scientists but simply come from other areas of the organization and bring a different perspective. It helps to see those different perspectives, how they would interpret your model, and how they would use your model, because some of those uses may be adversarial in a way. Again, not intentionally adversarial; they may simply represent a usage of that algorithm in an unanticipated way that acts more like an adversary to you.
Marie: Another way to think about that is that by having more people test your algorithm, you can actually see how people would use it in the wild. You can envision how people will use, let’s say, a website. This goes back to my background in marketing, where you’d design a website and then do user testing to see how people actually use it. Even though you might have followed best practices for the navigation and how you set up the content, if you find that people are getting stuck in certain areas, that gives you feedback, saying, okay, maybe I need to be clearer here. Maybe instead of having this be a paragraph, I can replace it with some icons and less text, and people get the message faster and more clearly. That type of real-world user testing is very important, and it’s how a lot of sites have been able to evolve over time. The same thing can happen with a model that you’re building.
Lexy: One of the keys to incorporating more people into this and really getting true feedback, and this is going to sound a little bit odd at first, but bear with me, is to give less information at the start and see where they take it. So for example, if I had said, I’m building an algorithm so that we can send more focused emails to people, what does that say, not necessarily to you, Marie, because you work in this space, but to somebody else? They may take that as, oh, you’re just going to bombard people with messages, or they’re going to see the same thing over and over, and that’s a horrible user experience. Or they could say, that sounds like a great idea; maybe we’ll get fewer emails that are irrelevant. Perfect.
Lexy: If I were to say, I’m going to build a credit score that takes into account the number of bedrooms in someone’s house as part of their willingness to pay, somebody might come back to me, if I wasn’t necessarily thinking of it this way, and say, well, that’s all well and good, but what happens if they’re in a duplex? What happens if they’re living in a house that they don’t own? What happens if they have an open floor plan? What happens if they have three bedrooms but they also have a guest house? Who knows, but there are a bazillion scenarios out there.
Marie: Yeah, housing is definitely not standardized. Even though you might have a picture of what a house means to you in your head, the picture that somebody else has in their head can be completely different.
Lexy: Exactly. If you say something fairly generic to them, they might give you back all kinds of ideas for things that you need to incorporate into your algorithm, or reconsider about your algorithm, because you simply aren’t thinking of it from the same perspective, and those perspectives are valuable. They’re tremendously helpful in getting a broader view. So give less information at the start, and then as you train, be more open about what you’re including and why you’re including it. We’ve talked about training transparently: make sure that you’re indicating that you’ve brought in this feedback and made adjustments accordingly, and be transparent about what you’ve done with it, because then it may be more trusted. You may find that people acknowledge, yes, this does appear to be a valid means of going down this path. Maybe bedrooms are the way to go rather than, say, the home value. Either way, it gives them an insight into why you’ve made the decisions that you’ve made.
Lexy: In data science, everything is about decisions. We make decisions at every step within the data science process and in each step you need to be able to explain and defend what you’ve done and being able to defend it when you haven’t really considered all of the aspects of it is very difficult. So gather more feedback, get more perspectives, be able to see it from different angles, and then you can defend what you’ve done better.
Marie: And I think it’s really interesting what you said about exposing people to your model without defining for them what the model is trying to do, so you can get their unbiased feedback. I’ll also tie this into what you mentioned about StrengthsFinder, because StrengthsFinder is actually put out by the Gallup organization, and one of the first places that I worked after college was the Gallup organization, so that’s where I was introduced to it. The concept of people having different strengths was something I had been exposed to through different types of personality testing before, but StrengthsFinder is a great one to explore to understand your own strengths, the strengths of people on your team, and how to build a more diverse team.
Marie: Because Gallup is so well known for the surveying that they do, one of the approaches that you take when you do a survey is not influencing the person that you’re interviewing. You want to get their unbiased opinion. So the way that you construct the questions, and even the way that you construct the available answers you might want them to give, is really a methodology in and of itself. What you’re looking for is to make sure that the questions aren’t leading and that they’re unbiased, and one of the ways that you do that is by not defining things upfront and by making them either more open-ended questions or very structured questions. So the idea that you can apply that to your data science when you’re getting feedback is a really key concept for people to be able to apply to their careers.
Lexy: Absolutely. We talked about this a little bit in the Collect Carefully episode, where we were discussing the need to be cautious, especially if you’re doing surveys or focus groups or any sort of first-party research. Research and survey design are a science unto themselves, and you have to understand how something was asked in order to understand how it was answered. In this case, we’re hoping to get more open-form answers, observations, or whatever it is that you’re gathering back, and you want to be able to get all of that information from them. So leave it broad, leave it open for them to interpret, and then refine after that.
Marie: So hopefully we’ve given you a lot to think about in terms of incorporating inclusivity into your data science process. This was Marie Weber.
Lexy: And Lexy Kassan.
Marie: Thanks so much.
We hope you’ve enjoyed listening to this episode of the Data Science Ethics podcast. If you have, please like and subscribe via your favorite podcast app. Also, please consider supporting us for just $5 per month. You can help us deliver more and better content.
Join in the conversation at datascienceethics.com, or on Facebook and Twitter at @DSEthics where we’re discussing model behavior. See you next time.
This podcast is copyright Alexis Kassan. All rights reserved. Music for this podcast is by DJ Shahmoney. Find him on Soundcloud or YouTube as DJShahMoneyBeatz.