Preventing Medical Data Biases


Right now the public narrative around data sharing is confused and suspicious, with more “data brokers” being railed against in the news every day. It is hard enough to tell who can access the data from the apps on your mobile phone, let alone your medical records.

Most patients do not understand where their medical data is stored and who has access to it. This becomes further complicated when we consider differences in culture, socioeconomic level, and education between different countries.

Many citizens are in favor of sharing their health data and realize that it can significantly benefit society.

Services like Ciitizen let patients upload and share their data for free to help themselves, their family members, and others like them. But using data uploaded by patients poses some problems for research -- it introduces selection bias, yields very fragmented datasets, and leaves out much of the population who could still benefit from the development of medical artificial intelligence (AI) algorithms.

Bill Gates was on to something when he said that we always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten years. It’s unlikely that radiologists will be totally out of a job in 2021, but by 2029 I’d be willing to bet that AI will be capable of handling or at least helping with a significant portion of their daily reads. 

That means that many of the algorithms that will arise in the next ten years are being conceived of and worked on now. They are being developed using whatever data companies and researchers can access, which isn’t always high quality or representative of people from all backgrounds, geographies, and socioeconomic statuses. Nor is it always collected with patient consent.

In a worst-case scenario, this could result in algorithmic bias, as we’ve already seen happen in other industries.

If we insist that individual patients consent to and share their medical data themselves, advancements in healthcare that have the potential to help a lot of people may not reach them for decades (or, more likely, may never be developed at all). Medical AI has the potential to cut healthcare costs, improve treatment by decreasing medical errors, and increase access to quality diagnostics in places that don’t have enough specialists to serve their entire populations. Developing algorithms using population-scale medical data without ever compromising the identity of individuals is the most logical solution. That’s what Segmed is doing.

We take every precaution to make sure the data developers have access to is fully anonymized: we remove identifying information, scramble identifying features when necessary, and remove rare outliers from our training datasets. We are actively working with our hospital partners, researchers, and advisors to investigate cutting-edge solutions for training algorithms so that data never has to leave hospital facilities and stands the smallest possible chance of being leaked or re-identified.
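To make the kinds of de-identification steps described above concrete, here is a minimal sketch in Python. The field names, threshold, and helper functions are illustrative assumptions for this example, not Segmed’s actual pipeline: real de-identification of medical records covers many more identifiers and edge cases.

```python
from collections import Counter

# Illustrative set of direct identifiers to strip (hypothetical field names).
PHI_FIELDS = {"patient_name", "patient_id", "birth_date", "address"}

def deidentify(record: dict) -> dict:
    """Return a copy of a study record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}

def drop_rare_outliers(records: list, key: str, min_count: int = 5) -> list:
    """Remove records whose value for `key` is so rare that it alone
    could make the patient re-identifiable in the dataset."""
    counts = Counter(r[key] for r in records)
    return [r for r in records if counts[r[key]] >= min_count]
```

For example, `deidentify({"patient_name": "A", "patient_id": 1, "diagnosis": "flu"})` yields `{"diagnosis": "flu"}`, and `drop_rare_outliers` would filter out a record whose diagnosis appears fewer than `min_count` times in the training set.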

If we don’t aggregate training data intentionally and pay attention to the distributions of source populations, we run the risk of widening chasms between the haves and have-nots. However, if we do manage to succeed in this mission, perhaps we will have made meaningful progress towards a world where healthcare works for everyone.

Overall, in a society where a significant portion of healthcare advancements are driven by industry, Segmed represents a force for accelerating the entire field of medical AI. We believe it will have a tremendously positive impact on millions of patients and clinicians all over the world, especially with care and attention given to issues like patient security and diversity of data. We strive to make these advancements possible ethically, responsibly, and transparently.

Cailin Hardell