US GDPR Coming Soon?

Image Copyright: Maksim Kabakou

Episode 25 – US GDPR Show Notes

Data scientists in many industries rely on personal data to develop models and make decisions. They can gather data from third parties, the internet, internal sources, and more to amass a stockpile of information on which to base algorithms. That could change if the government implements a US GDPR variant.

Today we discuss a recent report from the Government Accountability Office explaining why now is a good time for a US GDPR.

Additional Links on US GDPR Recommendations

Congress Oversight Body Recommends GDPR-Style Privacy Laws – Engadget article summarizing the findings and reasons behind the Government Accountability Office’s recommendations

Internet Privacy: Additional Federal Authority Could Enhance Consumer Protection and Provide Flexibility – Full GAO report on internet privacy, with recommendations for legislation and accountability

Full Transcript

Welcome to the Data Science Ethics Podcast. My name is Lexy and I’m your host. This podcast is free and independent thanks to member contributions. You can help by signing up to support us at datascienceethics.com. For just $5 per month, you’ll get access to the members-only podcast, Data Science Ethics in Pop Culture. At the $10 per month level, you will also be able to attend live chats and debates with Marie and me. Plus, you’ll be helping us to deliver more and better content. Now on with the show.

Lexy:                Welcome to the Data Science Ethics Podcast. This is Lexy Kassan.

Marie:              And Marie Weber.

Lexy:                And today we’re going to talk about the Engadget article that summarizes a new report from the Government Accountability Office stating that the US should consider some GDPR-like privacy laws. This report came out in mid-February. It was in response to a number of industry professionals, as well as US senators, pushing for a review of our privacy regulations here in the United States. The Government Accountability Office came out with a report stating that the FTC should be in charge of overseeing data privacy in the US and that its power should be expanded to encompass a set of GDPR-like rules that would give it the ability to apply penalties as appropriate for violations. They cited Cambridge Analytica, the big Facebook scandal, as a major reason for that. They also talked about the rise in popularity of Internet of Things devices and other causes for concern around connected cars, data sharing, and a lot of other areas.

Marie:              For those who are listening who maybe don’t know that much about GDPR, that stands for the General Data Protection Regulation that was implemented in the European Union. There have also been some people looking at this regulation in the US, obviously, as we’re talking about in this article, but California has also passed its own version of GDPR, which is supposed to go into effect in January of 2020. So it has started some conversation here in the States, especially because companies here in the United States that have customers in the European Union are also covered under GDPR. So there has been some move toward this regulation. This would make it so the US can also regulate this within the United States and it’s not just being handled through the EU regulation.

Lexy:                To be clear, no legislation has actually been put through at the federal level. They’re just starting to talk about it. Right. These are just early discussion points. Yep. So what’s interesting to me about this is that we as data scientists collect a tremendous amount of data, some of it from third parties, some of it from devices and sensors and so forth, to use in our algorithm development and in the decisions we make in the course of a day: what is going to happen in the business, and what are we going to do with all of this information? Laws like this can really restrict what we’re able to access, but is that a sufficient trade-off? In the EU, they’ve said privacy is more important; figure out the algorithms downstream. In the US, we’ve said get the data, share the data, increase usage, have it available, and we’ll consider the privacy part, not necessarily second, but later. Now we’re really starting to see that push. One of the things this makes me think about is a prior episode of this podcast where we talked about the concerns that people have around AI in America specifically.

Marie:              So one of the things that we talked about in that episode was that people who are more involved in the data science community and the data science profession are more comfortable with AI than, for example, people who are less involved with it. I think that type of dichotomy is part of the reason why we’re starting to see these discussions come up in the federal government about how to regulate this. There is concern among the population about how data is being used. Instead of a profession that just collects data, figures out how to use it in the business, and considers the privacy policies down the road, this is going to push privacy policy further upstream in how we develop our processes and how we do our marketing, data collection, and things like that.

Lexy:                The other thing it makes me think about from that same article and discussion is that people were unsure of where their data is, how it’s collected, where it goes, how it’s shared, all of that. That’s sort of a bridge to what the GAO was talking about: even companies that are app developers, or car manufacturers with connected cars, are gathering location information, for example, and they’re selling that data; they’re streaming that data to other organizations so that those organizations can use it to target you at a given time when you’re in a particular location. Most people aren’t aware of these types of things, so when they’re asked, “Hey, are you concerned that your location data is being shared everywhere?” they often say, “Well, yeah, I didn’t even know that was happening. Why is this going on? Why don’t I have a say in where my data goes?” So I think this report and these articles are really starting to surface those concerns and to put some thought around what we should do about this: what people should know or be aware of before data gets shared rather than after the fact.

Marie:              And part of it is going to be educating consumers, and part of it is also going to be figuring out what data is really needed. That is something that has been included in the EU regulations. In the EU now, if a company no longer needs a client’s information, it should have a plan for how long it keeps that information, why it keeps it, and when it will no longer hold records of it. Or if somebody is using a service and wants to move to a competitor, they should be able to take their data and move it to the other service provider, so they’re not locked in just because of who controls their data. So it’s very interesting to think about people having more control over their data, being able to say who can still hold onto it or use it, and having more agency. It also means educating customers more. There are a lot of new products and services being developed, so that means a lot more education about how their data may be used or may affect the service or product. Maybe you get one experience if you don’t allow certain information to be shared or used, but a different user experience if you do allow your information to be tracked and used. Maybe that provides a better user experience, but you’re giving up some of your privacy to make it happen.

Lexy:                The part that I find trickiest in all of this, and I think some of this report really speaks to it, is that a lot of data is shared between companies completely behind the scenes, where a customer would never really see it. For example, they talked about connected cars, where automakers don’t always clarify their data sharing practices. If they’re indicating to another company that a vehicle was driven to a certain location, or that a customer is near a location, they might be sharing that with an organization completely unbeknownst to the consumer. This has actually happened, and it often happens now with location-based services on platforms like Google or Apple on your mobile device, where someone has purchased advertising and Google sends an ad to your Android device, for example, when you’re near their location. This happens to me every time I go around the corner and I’m right by the Jo-Ann Fabrics.

Lexy:                I get a location ping basically saying, “Hey, there’s a deal.” It’s not that I opted in to receive Jo-Ann Fabrics messaging. It’s that I have an Android device, it’s sharing my location, and Jo-Ann Fabrics has decided to use location-based services, so I then get an ad [inaudible], completely behind the scenes. Now, in the case of GDPR, we’ve already seen organizations deemed not to have, essentially, a need to know: there’s no legitimate interest from the consumer in having that company hold their data. This happens often with third-party providers. As an example, I’ve used a number of different providers over the course of my career to get appended data, or enrichment data, about customers. Say we have a customer database and we want to know the age of these customers. We might reach out to another provider who has ages; they match up the people and tell us who is what age.

Lexy:                There’s a tremendous amount more data than that. I’m not going to go into all of it, but those companies are really getting dinged under GDPR now because the consumer never specifically gave their information to that company. They never signed up for a service from that company. They signed up for a service from someone else, or their data was gathered as part of the census or what have you, and these companies have aggregated it together and are selling it to other organizations. It’s a tremendous business here in the States. It has been a business all over the world. It’s now becoming less of a business in the EU because of GDPR, and I think some of those techniques for gathering and enriching data are really going to get penalized if we put through a similar type of legislation here in the US.

Marie:              This article points out that this conversation is just starting, so I’m sure that as it goes through the process, there are going to be people from marketing fields, companies, and the data science field explaining why having access to this data is a benefit. But think about the idea of keeping your information private and not being tracked by all sorts of different companies: yes, you might have their app installed on your phone, but that doesn’t necessarily mean you wanted to give them everything. I think that’s going to become a very big discussion about where and how things should be regulated, and honestly it’s probably going to give good guidelines and guardrails for how products are developed in the future.

Lexy:                Absolutely, and for how algorithms and models are developed in the future. There are a lot of considerations when you start dealing with data from the EU as a data scientist, such as how you need to tag your models. If you’ve included anybody’s information from the EU, they have what’s called the right to be forgotten built into GDPR, which means that if they contact the company and say, “I want all of my information taken out of your system. Delete me from your system. I don’t want you to have my data anymore,” you have to take out not only all of their data, but you also have to remove them from, for example, a training population that you’ve built an algorithm on, and then retrain your model or do additional workarounds to make sure that their information comes out and that the model hasn’t learned something about them in processing, even if their data is no longer there.

Lexy:                Things like that, I think, are going to become more common practice over time, but it’s a very new regulation. It came into effect in the EU less than a year ago. We in the US have not had to deal with it as much unless we’re in a very big multinational organization. A lot of companies here are very reluctant to have a similar type of privacy law because a lot of change is going to need to happen. I think the lobbyists are going to have their work cut out for them as this continues through discussions. I will leave them to it. Yeah.

Marie:              Yeah. This is going to be another one of those where we’re not going to be able to answer the question on the podcast, but we’re happy to share it with you, and we will certainly be following it closely. Absolutely. So that has been another episode of the Data Science Ethics Podcast. Thank you so much for joining us. We’ll catch you next time. Bye.

We hope you’ve enjoyed listening to this episode of the Data Science Ethics Podcast. If you have, please like and subscribe via your favorite podcast app. Also, please consider supporting us for just $5 per month. You can help us deliver more and better content.

Join in the conversation at datascienceethics.com, or on Facebook and Twitter at @DSEthics where we’re discussing model behavior. See you next time.

This podcast is copyright Alexis Kassan. All rights reserved. Music for this podcast is by DJ Shahmoney. Find him on Soundcloud or YouTube as DJShahMoneyBeatz.