Episode 27: Predicting Mortality – Show Notes
Death comes for us all. Even for kings he comes. (Robert Bolt)
Researchers in the UK have been able to vastly improve the capabilities to predict an early death for middle-aged patients. New algorithms were developed to be 76% accurate, even finding different indicators that were more predictive.
But as we get closer to the capability to tell people that they have less time to live, the question becomes – should we?
Additional Links on Predicting Mortality
Using Artificial Intelligence to Predict Mortality Medical News Today article with a high level overview of the UK study.
AI is Good (Perhaps Too Good) at Predicting Who Will Die Prematurely LiveScience story covering the UK study and describing more of the details around model comparison.
AI Could Predict Death. But What If the Algorithm Is Biased? Wired article outlining concerns over the implications of trusting the UK algorithm if the data collected may be skewed.
Predicting Mortality Episode Transcript
Welcome to the Data Science Ethics Podcast. My name is Lexy and I’m your host. This podcast is free and independent thanks to member contributions. You can help by signing up to support us at datascienceethics.com. For just $5 per month, you’ll get access to the members-only podcast, Data Science Ethics in Pop Culture. At the $10 per month level, you will also be able to attend live chats and debates with Marie and me. Plus you’ll be helping us to deliver more and better content. Now on with the show.
Marie: Hello everybody and welcome to the Data Science Ethics Podcast. This is Marie Weber
Lexy: and Lexy Kassan
Marie: and today we are going to be talking about AI that is being used to predict mortality. So this is an article that Lexy actually found on LiveScience, and it’s talking about an AI that was developed based on data in the UK Biobank. They were looking at people from the ages of 40 to 69 years old, and the algorithm that they developed performed better than other techniques they had used before. So do you want to talk a little bit about those techniques and how they differ?
Lexy: Sure. There have been algorithms developed, and they’ve been out there for a few decades now, to try to predict, I’m going to say, survival. In this case it’s literal survival, but in a lot of cases it’s about when something will end: when is the likely end point of something? So for example, I’ve used this to predict a customer churning out of a contract. In this case it would be somebody passing away. The technique that was most commonly used was a Cox survival analysis, or Cox regression model. That’s one of the algorithms that they compared in this study. The other ones they used were a machine learning technique called a random forest and a deep learning technique that used a number of layers within a neural net that would then construct its own rules. As we’ve talked about on this podcast before, neural networks and deep learning techniques are a little bit more black box, in that you put in the layers that you want it to go through but you don’t necessarily know all of the rules it came up with. They were specifically using these to look for who of these 500,000 people that they were studying were more likely to die prematurely from chronic diseases.
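To make the Cox regression Lexy describes a bit more concrete, here is a minimal Python sketch of the quantity a Cox model maximizes, the partial log-likelihood, for a single risk factor. The five records and the smoker flag are invented for illustration and are not from the UK Biobank study. The function is hand-rolled and assumes no tied event times; a real analysis would normally rely on a dedicated survival analysis library.

```python
import math

# Hypothetical follow-up records: (years_observed, died, smoker).
# died = 1 means the death was observed; died = 0 means the person was
# still alive when follow-up ended (a "censored" record).
records = [
    (2, 1, 1),
    (3, 1, 1),
    (4, 0, 0),
    (5, 1, 0),
    (6, 0, 0),
]

def cox_partial_log_likelihood(beta, records):
    """Partial log-likelihood of a one-covariate Cox model (no tied times)."""
    total = 0.0
    for t_i, died_i, x_i in records:
        if not died_i:
            continue  # censored people only appear in other people's risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [x_j for t_j, _, x_j in records if t_j >= t_i]
        total += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return total

def hazard_ratio(beta):
    """Relative hazard of someone with x = 1 versus someone with x = 0."""
    return math.exp(beta)

no_effect = cox_partial_log_likelihood(0.0, records)
smoking_effect = cox_partial_log_likelihood(1.0, records)
print(no_effect, smoking_effect, hazard_ratio(1.0))
```

In this toy data both smokers die earliest, so a positive coefficient scores higher than a coefficient of zero. Fitting a Cox model means searching for the coefficient that makes this score as large as possible, and the hazard ratio is how that coefficient is usually reported.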
Lexy: The Cox model was about 44% accurate and tended to overpredict. The random forest was about 64% accurate and the deep learning model was about 76%. The other part that it talked about in the article was that the factors that these different algorithms came up with as being important varied a bit. All of them used age, gender, smoking history and a prior cancer diagnosis as their top factors. But then it started to get a little bit varied. The Cox technique used ethnicity and physical activity as its next most important factors. Random forest used body fat percentage, waist circumference and skin tone, which, while potentially linked to ethnicity and physical activity, are a little bit more indicative of that individual’s characteristics as opposed to more general characteristics of groups.
Marie: And, we can also say, those could be tied to health risks like the risk of developing skin cancer.
Lexy: Absolutely. A prior cancer diagnosis was part of that, so think about the different types of cancer that somebody could potentially get. That’s absolutely one of the factors. The deep learning model, on the other hand, saw job hazards, air pollution, alcohol consumption, and the taking of certain medications as its next highest factors. What fascinated me about this wasn’t so much that we could predict it, as cool as that is, but I had to wonder: should we tell people this stuff?
Marie: And what it makes me think of is, I know personally that I’m somebody that’s interested in health, and I’ve looked at different things like, okay, what are things that I can do to live longer? I once worked with somebody who said “triple digits, baby,” and they wanted to live to be a hundred years or older. So, you know, looking at things like your physical activity, what you eat, do you get enough movement during the day, even where you live. Those types of things can be factors for how long you live, so there are definitely people out there that are interested in this and who have an interest outside of it being part of their profession. I think those people are very interested in something like that. It’s like, oh, you can tell me that based on what I eat, I could potentially live longer, and I think there are people who do take those steps. I think there are other people, potentially, that are like, nope, I enjoy what I eat. For example, I know people that have had doctors prescribe them a diet that is maybe lower in salt or lower in fat, and they’re like, nope. Or lower in sugar. Right, exactly. They decided that, no, I’d rather continue to eat the things that are salty and bad for me because I don’t like it otherwise. Or they don’t put in the effort to make the change and see how it helps them feel.
Lexy: So maybe I’m just cynical, and that’s the reason that I thought: should we tell people that they’re more likely to die prematurely? And 76%, for what it’s worth, is a pretty good prediction in this case.
Marie: Yeah. So out of a hundred people, it could help give 76 people an opportunity to make a change that would help them live longer potentially.
Lexy: If all of those hundred people were the ones predicted to die prematurely, yes. Yeah. That said, the reason I think about it is, is it better to know or not to know? And this is one of those kind of paradoxical questions of what happens if you know you’re going to die tomorrow. Are you going to go out and do something horrible, or are you gonna take that time to do something great? Or in this case, I mean, obviously it’s not going to be tomorrow.
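The exchange above turns on what “76% accurate” actually counts. As a rough illustration, the Python sketch below uses invented outcome counts for 1,000 people (not figures from the UK study) to show how overall accuracy can differ from the share of flagged people who really do die prematurely. The variable names and numbers are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical outcome counts for 1,000 people; NOT taken from the UK study.
tp = 30    # flagged as high risk and died prematurely
fp = 220   # flagged as high risk but did not die prematurely
fn = 20    # not flagged but died prematurely
tn = 730   # not flagged and did not die prematurely

def accuracy(tp, fp, fn, tn):
    """Share of all predictions, flagged or not, that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Share of flagged people who actually died prematurely."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of premature deaths that the model flagged in advance."""
    return tp / (tp + fn)

print(f"accuracy:  {accuracy(tp, fp, fn, tn):.2f}")
print(f"precision: {precision(tp, fp):.2f}")
print(f"recall:    {recall(tp, fn):.2f}")
```

With these made-up counts the model is right 76% of the time overall, yet only 12% of the people it flags die prematurely, and it catches 60% of the premature deaths. Which of these numbers a headline figure refers to matters a great deal when deciding what to tell an individual patient.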
Marie: To clarify, this algorithm was predicting more or less long-term mortality. These are already people that were in their forties to sixties. Let’s say we’re just taking somebody who’s in their sixties; the average probably is 20 years that they have. So you’re trying to predict who is going to die earlier than that 20 years potentially, right? Yeah.
Lexy: So there is still a considerable amount of time. That said, what do you do if you know that your history of smoking for the last 40 years has contributed to the fact that you’re very likely to die prematurely? Do you try to quit now, or do you go, too late, guess I’m just stuck, I’m going to die prematurely, I’m still gonna smoke? To some degree, it makes me think of how people have acted in the face of a lot of medical evidence up to this point. I kind of wonder, if we’re able to more specifically pinpoint that a given individual is very likely to die prematurely, whether it takes away that unrealistic optimism bias of, I know that smoking causes cancer and premature death, but that won’t happen to me even though I’m a smoker. That’s an optimism bias, or unrealistic optimism bias. Does it take away that safety blanket and say, nope, it really is going to happen to you? You need to change your behavior. You specifically need to change your behavior.
Marie: I don’t know the intent of the people that developed this algorithm, but given that it was developed in the UK, and they have a form of health insurance in the UK that’s more accessible to everyone, I would hope that they are trying to reach those people where they could have an impact and give them opportunities to understand the risks, that they might die prematurely, and then hopefully be able to correct it if there are things within their control to change. If it’s air quality or work conditions, that might need the institutions to step in and help address it. That is not something that an individual can address that easily. But the hope would be, if I give you this information and I can show you that it can provide you with a different outcome, that you would take actions towards that better outcome for yourself.
Lexy: And when we talk about institutional changes here, these are things like regulations that would be put in place or changes to social programs for assistance. For example, for mobility, meaning geographic mobility, so people could move out of areas that are problematic to their health. Potentially having additional funding for things like smoking cessation programs or alcohol counseling or other types of things that address factors these models identified as major risk factors. It could be retraining for jobs, it could be all kinds of things. So those would be institutional changes that would come in.
Marie: Exactly. Another example would be even here in the US. In the seventies, I believe it was, there was a problem with more lead being in our environment because there was lead in gasoline. So there was an initiative to get unleaded gasoline and get that out of the environment.
Lexy: Exactly. So it was a regulation that was put in place that changed the way that we consumed the product and led to less lead. The other one that comes to mind, of course, is tobacco, where over time we’ve seen larger, more prominent warnings on tobacco products. There’s been stricter regulation around who’s able to purchase tobacco products. There’s been a lot more taxation to make it economically less feasible for people to purchase and consume tobacco products. Those are all institutional changes, and things like that could potentially be affected here. It’s not a scenario tester. It doesn’t say if you change your job right now from one where you’re put into hazardous situations to one where you’re not put into hazardous situations, you will gain five years. It doesn’t give you that level of specificity. It doesn’t say if you move to the country and out of the air pollution right now, you’ll get another three years. None of that. All it says is, based on everything that has happened till now, here’s where you’re at: you’re likely or unlikely to die prematurely. So without that more scenario-based information, it really leaves it to the physicians still to advise their patients to try to take action around their health and around the circumstances that are contributing to hazards to their health. But again, I feel like we’ve been down this road forever. Physicians have always been trying to tell people to take care of themselves and people have largely ignored it, unlike you, Marie.
Marie: Or eat healthier or like you were saying smoke less or exercise more or do more to relieve stress.
Lexy: Or don’t drink alcohol. But when you talk about…
Marie: Get enough sleep.
Lexy: Hush, you! That wasn’t in any of the models, I will have you know. There are some very large things that are much more systemic that these models are pointing to, like job hazards. Are you going to be able to immediately change your career path simply because a model told you that your job is putting you at risk? Or air pollution: how much can you realistically affect the air pollution around you? Not very much. You would have to move to get to a place that has better air quality in order to avoid that risk. And again, at that point, how much are people willing to do?
Marie: Or how much are people in different economic situations able to do? Very true. Like some people might not really have an option to pick up and move to the country, especially if they don’t have the means to start a business out there or if they can’t find a job in those areas. So there are implications where the data could be saying, you know, because you worked in XYZ industry for 20 years maybe, and maybe you’re not even doing that anymore, but because you did that in your past and you can’t change your past, right now this is going to potentially lead to premature death. Hopefully there would be things that they could recommend, because the human body is always healing and rebuilding. So to the extent that you can give it a chance to heal or recover from some of those things, you know, hopefully doctors would be able to point people in that direction. But if there are some things that are baked in at that point, it goes back to your question: is it better for people to know or not to know?
Lexy: Yeah. The other thing, and this is seriously shaky ground for ethics, is: is it a good thing for us to always try to increase life expectancy?
Marie: Yeah, that’s a good question, because if you play it out, there are definitely people that are working towards this as a goal of increasing life expectancy, so more people can live to a hundred, triple digits, or maybe 120. Some people are even starting to theorize that you could push human life expectancy to 200 or even 300. Probably not any of us that are in our professional careers right now, but there are some people that theorize that there are children being born today that could live to be well over a hundred. So consider the idea of doing that at an entire population scale: if we can get the whole population so they’re not just having a life expectancy of 80 but a hundred, how does that impact the whole system?
Lexy: There’s a lot of back and forth in it because if everyone’s living to a longer age and contributing to more overpopulation in certain areas and so forth, does that affect air pollution, which then affects premature death or does that affect other types? Or as we start to live longer and as we continue to automate a lot of processes, does it take people out of having hazardous jobs and now robots are doing those jobs, so now it’s less likely that they would die from a job hazard. All of these different things.
Marie: And I think there’s an important distinction between living longer and also having more productive years. I think ultimately the goal will be not for people just to live longer, but to increase the productive years that people have, so you don’t have extra burdens on society where everybody needs to be in an assisted living facility starting at the time that they’re 70, and that lasts for 50 years until they’re 120. Part of what we’re talking about here is not just related to the data science ethics, but also related to other layers. When we talk about these institutional changes, that’s society making a judgment call, saying that it’s important to help people live as long as possible and be as healthy as possible, making those changes, and showing that that’s a value that you want to move towards. That means, as we talk about this and we talk about anticipating adversaries, it’s also important to think about how this could potentially be used in the wrong hands, because you’d want to make sure that if governments or organizations were making decisions on this data, they were making those decisions in an ethical way.
Lexy: A couple of ways I think about anticipating adversaries here. One is at the patient level, meaning you hope that if you tell somebody they’re likely to die prematurely, they’re going to handle that in a responsible manner: they’re going to do more for their health, or they’re going to get their affairs in order if they don’t think that they’ll be able to escape the algorithm, escape their premature death. The flip side to that, though, is the people who would say, well, I’m going to die anyway, guess I’m gonna live it up however I want to live it up and do all kinds of irresponsible things. That is one form of adversarial behavior. Another would be having this algorithm in the wrong hands and not equally conveying information to patients. So for example, if we know this algorithm exists, but we only are able to talk to people who are wealthier, who are of a specific ethnic background and so forth, those are the people who are likely to then have the opportunity to take responsible actions and live longer, versus being able to equally convey this to everyone in the population so that they could all benefit from knowing whether they’re at increased risk.
Lexy: We need to think about this from multiple different perspectives as to how it could be mishandled. The other thing that that points to is getting physicians more comfortable with algorithms like this, understanding them, and being able to convey to their patients what the algorithm is telling them and what their patients could or should be doing to take charge of their own health and try to extend their life expectancy. Right now, there’s not been a lot of training for physicians out in the field, who have been practicing for maybe decades, on new algorithms and new modeling techniques and on these new types of research that are coming out.
Marie: It’s interesting to think about where this algorithm is right now, and if you train it well, the assumption would be that it would get more accurate in its predictions. So right now today, if somebody looks at it and it’s a 76% chance that it’s going to be right, does that impact someone’s behavior as much as when the accuracy is maybe up to 95%?
Lexy: There’s something to be said for a more accurate algorithm and indicating to someone that you’re 95% confident that they’re going to die prematurely versus 76% confident that they’re going to die prematurely. However, again, we have been down this road many times before in many different medical situations where we’ve conveyed to the population that certain substances, certain actions lead to outcomes that they don’t want, and they still do them regardless. So how confident do you have to be for people to get past that optimism bias? Is there ever a number that you could tell somebody where they could not dismiss it and it would cause them to change their action? If you tell them you’re 100% confident that they will die prematurely, they’re going to say, oh, well, I guess that’s that. It becomes dismissed. So does it matter how close you get to 100%?
Marie: I think it could potentially matter in terms of helping doctors communicate the value of the algorithm, especially with something like this and trying to communicate it to a larger population. If they can point to it as being more accurate, then I think that’s going to help certain segments of the population trust it more.
Lexy: One of the things I find really interesting about this algorithm is that most of the factors that it’s pointed to are behavioral. Very true. It’s keeping your body fat percentage down, keeping your diet in check, not drinking alcohol in excess, not using tobacco products, trying to stay in a healthy environment. A lot of the factors that it’s pointing to are things that we can try to change.
Marie: Even when we go back to the question that you had about how we communicate this to the people that are represented in this study, based on their data in the UK Biobank, they’re going to have to talk with people from a lot of different walks of life, and not just the people that have heard about the service and are knocking on their doctor’s door saying, hey, I want to see what my results were. This is going to include basically everybody in that age population of 40 to 69, where they’re working with them on probably, you know, an annual basis or whatever their appointment schedule is, to say, here’s information that we have that’s important for you to understand, and trying to get them to understand how to take action on it.
Lexy: To be clear though, this data predicted deaths within the six years of the longitudinal study that they had, from 2010 to 2016. Those people are already gone. So now the question is, what is the incremental prediction for the next six years, or however long, for the people who remain for whom they had data? And is it transferable to the next population of 40 to 69 that’s coming up? It’s looking at a pretty large population and saying, okay, we studied this 10 to 15 years ago. We started collecting data. How many of the people who are now coming up into that age range have the same factors or have the same issues to look at? One of the variables that the deep learning technique pointed to was the taking of certain medications, which makes me wonder which medications those were and if they’re still on the market, because there’s a very real possibility that some medications may have been provided at various times and then later it was found that they were, or whatever it might be, they were pulled from the market. Would those medications still be applicable to the next group of people, for instance? And if that was one of the major factors influencing the accuracy of the model, how accurate is the model going to be next time around?
Marie: And that goes back to train transparently. What are the factors that you’re putting into your algorithm?
Lexy: Exactly. Hence why the data science process is cyclical.
Marie: So cyclical.
Lexy: Always going back, always revisiting, always revising.
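Lexy’s question about whether the model transfers to the next group of 40-to-69-year-olds can be sketched as a simple re-evaluation on a later cohort. Everything in the Python example below is hypothetical: `toy_model` is a hand-written stand-in for a trained model, `on_drug_x` is an invented medication flag, and both cohorts are made-up records chosen so that the medication disappears from the later group.

```python
def accuracy(model, cohort):
    """Fraction of people in the cohort whose outcome the model predicts correctly."""
    correct = sum(1 for features, died in cohort if model(features) == died)
    return correct / len(cohort)

def toy_model(features):
    # Stand-in for a trained model: flags anyone who smokes or takes "drug_x".
    return features["smoker"] or features["on_drug_x"]

# Hypothetical cohort from the original study window: (features, died_prematurely).
study_cohort = [
    ({"smoker": True,  "on_drug_x": False}, True),
    ({"smoker": False, "on_drug_x": True},  True),
    ({"smoker": False, "on_drug_x": True},  True),
    ({"smoker": False, "on_drug_x": False}, False),
    ({"smoker": True,  "on_drug_x": False}, False),
]

# Hypothetical later cohort: drug_x has been withdrawn, so nobody takes it,
# and the premature deaths now come from causes the model never learned.
later_cohort = [
    ({"smoker": True,  "on_drug_x": False}, True),
    ({"smoker": False, "on_drug_x": False}, True),
    ({"smoker": False, "on_drug_x": False}, True),
    ({"smoker": False, "on_drug_x": False}, False),
    ({"smoker": True,  "on_drug_x": False}, False),
]

print("study cohort accuracy:", accuracy(toy_model, study_cohort))
print("later cohort accuracy:", accuracy(toy_model, later_cohort))
```

In this invented example the accuracy drops once the medication signal is gone, which is the kind of check that revisiting and revising a model on fresh data is meant to catch.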
Marie: All right, well, thanks everybody for joining us for this quick take on the algorithm that was developed based on the UK Biobank study. This is Marie Weber
Lexy: and Lexy Kassan
Marie: Talk to you next time.
Lexy: Thanks so much.
We hope you’ve enjoyed listening to this episode of the Data Science Ethics podcast. If you have, please like and subscribe via your favorite podcast App. Also, please consider supporting us for just $5 per month. You can help us deliver more and better content.
Join in the conversation at datascienceethics.com, or on Facebook and Twitter at @DSEthics where we’re discussing model behavior. See you next time.
This podcast is copyright Alexis Kassan. All rights reserved. Music for this podcast is by DJ Shahmoney. Find him on Soundcloud or YouTube as DJShahMoneyBeatz.