While the hype has inevitably died down in recent years as the inherent challenges present in the healthcare sector have reared their heads, the desire to see data and AI used to transform the way we receive healthcare remains high.
Research from the University of Michigan reminds us that such an approach is only as strong as the data behind it, and that this could freeze out minority patients and lead to growing health inequalities. This is because the data that is used to train AI systems is either not representative of the diverse population or reflects what is already unequal care.
Health inequalities remain such a pressing issue that a team of Cambridge researchers recently urged the UK government not to abandon efforts to reduce the health inequalities across the country that were made all too visible during the Covid pandemic.
For instance, research from Imperial College London found that disadvantaged and marginalized people faced even worse health inequalities than they ordinarily would as a result of the choices hospitals made in response to the pandemic. Restricting non-urgent services to free up resources for Covid-19 had a big impact in areas such as sexual health, gynecology, and pediatrics.
What’s more, the 44% fall in attendance at emergency departments during March 2020 disproportionately affected vulnerable people, who often use them for routine care as they struggle to access general practice and community services.
The Michigan research shows how embedded biases against minority populations can make matters worse. The researchers found that biases that already exist in society can mean minority groups are not well represented in the projects that help to train AI-based systems. This is reflected in a previous study, which shows that rates of consent in the Michigan Genomics Initiative differ considerably between members of various minority groups.
This bias can take on a cyclical dimension, with recruitment often inadequate to begin with, which results in poorer engagement and a perception that patients from minority groups are not interested in research, and so the cycle goes, strengthening with each iteration. This pattern can also be observed in medical AI.
AI systems are therefore trained on data that reflects decades of deeply entrenched bias in the kind of care people receive, on top of the poorly representative data mentioned earlier, which means they are learning from extremely skewed data. The system then uses these biased patterns to predict and recommend, so its predictions and recommendations are likely to be skewed as well. This is further compounded by the likely negative response to those recommendations from patients, and so the cycle continues.
The researchers hope their work will foster a better understanding of how health inequalities and exclusion can become self-reinforcing, especially when we start working with research data and healthcare-related AI systems. While there are various tools and approaches for trying to rectify biased datasets, these are no guarantee, and officials need to be constantly mindful of the potential for biased outcomes.
It’s also important for officials to understand how this cycle is self-reinforcing, as AI systems typically continue to learn from the data that is generated, so if they continue to generate biased outcomes then that will be fed back into the system’s ongoing learning. Indeed, the Michigan researchers suggest that even if the physician themselves is unbiased, if they’re working with technology that was trained using biased data, they’re quite likely to make biased decisions themselves.
To resolve this problem, the researchers believe it’s crucial that officials fully understand the complex reinforcing dynamics at work. An intervention that focuses purely on one part of the system, or on one particular bias, is unlikely to be effective.
A recent Yale study provides a degree of optimism that solutions might be found. The approach ensures that sensitive data is included when training algorithms, but then masked when actually being used. They believe the approach maintains the accuracy of the system while reducing discrimination in it.
The approach works in two phases. The first of these uses training data to help the algorithm learn how particular attributes are linked to each outcome. The algorithm is then given information about any new cases and attempts to predict what will happen based on similarities with previous cases.
The researchers explain that because removing sensitive information from the training data can result in latent discrimination, they had to think of a different approach to reduce bias in the system. One approach they considered was to boost the scores of people from disadvantaged groups, but this resulted in two people who are identical other than their race or gender receiving different scores, which typically produced a backlash.
The approach they eventually settled on was referred to as “train then mask”. It involved the system being given all of the information about past cases during the training phase, including any sensitive information. This meant that the algorithm wasn’t wrongly giving undue importance to factors that were unrelated to the outcome but could act as a proxy for more sensitive features.
They then hid the sensitive features in the second stage, so that all new cases would be given the same value for these features. This would force the system to look beyond both race itself and any proxies for race when it compared individuals.
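The two stages described above can be illustrated with a minimal sketch. This is not the Yale researchers' implementation: it assumes a plain logistic regression as the model, synthetic data, and hypothetical feature names (need, proxy, group), and it assumes the masking constant is 0. It only shows the mechanics of training with the sensitive feature present and then fixing it to one value for every new case.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=200, lr=0.1):
    """Fit a logistic regression by batch gradient descent.

    Every row keeps the sensitive feature as its last column, so the
    model can attribute its effect to that column during training.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_masked(weights, bias, x, mask_value):
    """Score a new case with the sensitive feature replaced by a constant."""
    masked = list(x[:-1]) + [mask_value]
    return sigmoid(bias + sum(w * v for w, v in zip(weights, masked)))


# Synthetic, purely illustrative training data (all names are hypothetical).
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    group = rng.randint(0, 1)                 # sensitive attribute
    need = rng.random()                       # clinical need score
    proxy = 0.8 * group + 0.2 * rng.random()  # feature correlated with group
    # Historical outcome: driven by need, but biased against group 1.
    p = sigmoid(4 * need - 2 - 1.0 * group)
    labels.append(1 if rng.random() < p else 0)
    rows.append([need, proxy, group])

# Stage 1: train on everything, including the sensitive feature.
weights, bias = train(rows, labels)

# Stage 2: mask. Every new case gets the same value for the sensitive feature.
MASK = 0.0  # assumed constant; the choice of value is a design decision
a = predict_masked(weights, bias, [0.7, 0.5, 0], MASK)
b = predict_masked(weights, bias, [0.7, 0.5, 1], MASK)
print(a == b)
```

Because both new cases are scored with the same masked value, two people who differ only in the sensitive attribute receive the same score by construction. Whether accuracy is preserved depends on the data and model, which this toy example does not attempt to demonstrate.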
The system was able to produce results that were as accurate as an unconstrained algorithm, that is, one that had not been adjusted to try and reduce unfairness. The researchers also believe that the approach helps to reduce what they refer to as “double unfairness”, where someone from a minority group performs better than those from the majority group on certain metrics but the discrimination they face lumps them in with the majority.
Finding health inequalities
Of course, while health AI has various problems to overcome in terms of fixing health inequalities, it can also be used to understand where gaps exist. This was the conclusion of research from the University of Washington, which explored how data can help to better identify gaps in healthcare provision in rural areas.
They examined the health data situation in four states in the northwest United States, with the ultimate aim of helping communities better use data to tackle health inequality.
“Rural communities in Washington, Oregon, Idaho, and Alaska face high poverty and are home to large populations of Alaska Native, Native American, Latino, and other residents who are often marginalized and impacted by health disparities,” the researchers say.