1,000 Citations and Counting: Fueling Segmed’s Mission to Make Medical Imaging Data Accessible to Innovators

Author: Martin Willemink

1,000 Reasons to Celebrate

Every so often in research, a project lands at just the right moment, addressing a challenge many people were quietly struggling with. For us at Segmed, that project was our 2020 paper “Preparing Medical Imaging Data for Machine Learning,” published in Radiology.

We didn’t set out to write a 'landmark paper'. We were simply trying to make sense of the chaos we were seeing firsthand: the inconsistent, fragmented, and inaccessible world of medical imaging data for AI.

Now, five years later, that paper has been cited more than 1,000 times, a milestone we’re proud of not just for the number but for what it represents for healthcare AI and for Segmed.

Why This Paper Exists and Why Segmed Does Too

At the time, AI in medical imaging was gaining momentum. Many AI startups were being founded, and larger companies were starting to develop AI models as well. But almost nobody was talking about the elephant in the room: preparing imaging data for AI is hard. Clinical imaging data is stored in secure silos, for good reason. That works well for clinical care but makes it challenging for AI developers to access the data. De-identification of imaging data is complex, every institution runs different IT systems, and annotations vary wildly. Metadata was often missing or incompatible, and privacy concerns made data sharing nearly impossible.

We wrote this paper because we needed a clear, practical guide for how to actually get imaging data AI-ready.

And that exact problem is what led us to found Segmed.

The mission behind Segmed has always been simple:

Make quality, standardized, de-identified medical imaging data accessible to healthcare innovators, while protecting patient privacy and respecting data rights.

Our mission may not be sexy, but this paper (and Segmed’s significant growth!) has shown how important it is. The paper was the first expression of that vision. It set out the foundational workflows, best practices, and safeguards we still rely on and continue to evolve today.

What the Paper Covers

The paper outlines the fundamental steps for preparing imaging data for AI algorithm development, explains current limitations of data curation, and explores new approaches to the problem of data availability. Topics include:

Accessing and querying data
De-identification
Data storage
Resampling medical images
Labeling and ground truth definition
Dataset types and sizes
Data sources
Federated learning

We also highlighted the common mistakes and challenges teams encounter when working with real-world imaging data, things we’ve spent years addressing through Segmed’s platform.

Why It Took Off

Because it addressed a universal problem. AI teams everywhere were running into the same frustrations: inconsistent data, annotation gaps, privacy barriers, and poor model generalization.

This paper gave them a starting point: a practical, field-tested set of recommendations they could build from.

It’s since been cited by AI healthcare startups, university research groups, hospital systems, regulatory submissions, and even the World Health Organization. And it still serves as a go-to reference for anyone working to develop AI responsibly in clinical imaging.

How It Connects to Segmed’s Mission Today

At Segmed, this milestone feels personal. It validates not just a publication, but the core belief we built this company on:

Better data makes better AI. And better AI improves healthcare for everyone.

Since the paper’s publication, we’ve expanded on that mission, offering AI developers access to curated, de-identified imaging datasets and clinical metadata through our platform, which is built on the very workflows we first described in the paper.

As healthcare technology continues to evolve, our commitment remains the same: clean, reliable, ethically sourced imaging data for healthcare innovation. Segmed has contributed to more than 35 FDA clearances, multiple foundation models, and fit-for-purpose real-world evidence research projects.

Thank You

To the AI researchers, clinicians, developers, and innovators who’ve read, cited, and built upon this work: thank you. Science moves forward because of communities like this.

And of course, thank you to the authors of the paper: Martin Willemink, Adam Koszek, Cailin Hardell, Jie Wu, Dominik Fleischmann, Hugh Harvey, Les Folio, Ronald Summers, Daniel Rubin, and Matt Lungren.

Here’s to 1,000 citations, countless lives touched through better AI, and the work we still have ahead.

Curious? Read the original paper, “Preparing Medical Imaging Data for Machine Learning,” in Radiology.

And if you’re building AI for healthcare, let’s talk.

--

Appendix: Key Papers Citing Segmed’s Research

Warman P, Warman A, Warman R, Degnan A, Blickman J, Smith D, McHale P, Coburn Z, McCormick S, Chowdhary V, Dash D, Sangal R, Vadhan J, Bueso T, Windisch T, Neves G. Using an artificial intelligence software improves emergency medicine physician intracranial haemorrhage detection to radiologist levels. Emerg Med J. 2024 Apr 22;41(5):298-303. doi: 10.1136/emermed-2023-213158. PMID: 38233106. https://emj.bmj.com/content/41/5/298
Ryu WS, Song JW, Lim JS, Lee JH, Sunwoo L, Kim D, Kim DE, Bae HJ, Lee M, Kim BJ. Segmentation of Leukoaraiosis on Noncontrast Head CT Using CT-MRI Paired Data Without Human Annotation. Brain Behav. 2025 Jun;15(6):e70602. doi: 10.1002/brb3.70602. PMID: 40495458; PMCID: PMC12152255. https://onlinelibrary.wiley.com/doi/10.1002/brb3.70602
Chung JH, Chelala L, Pugashetti JV, Wang JM, Adegunsoye A, Matyga AW, Keith L, Ludwig K, Zafari S, Ghodrati S, Ghasemiesfe A, Guo H, Soo E, Lyen S, Sayer C, Hatt C, Oldham JM. A Deep Learning-Based Radiomic Classifier for Usual Interstitial Pneumonia. Chest. 2024 Feb;165(2):371-380. doi: 10.1016/j.chest.2023.10.012. Epub 2023 Oct 14. PMID: 37844797; PMCID: PMC11026174. https://journal.chestnet.org/article/S0012-3692(23)05569-1/abstract
Park D, Lee N, Lim S. Attention-based Interpretable Deep Learning with Radiomic Features for Pulmonary Nodule Classification. Medical Imaging with Deep Learning - Short Papers, 2025. https://openreview.net/forum?id=TU0z4ABShQ
Zambrano Chaves JM, Huang SC, Xu Y, Xu H, Usuyama N, Zhang S, Wang F, Xie Y, Khademi M, Yang Z, Awadalla H, Gong J, Hu H, Yang J, Li C, Gao J, Gu Y, Wong C, Wei M, Naumann T, Chen M, Lungren MP, Chaudhari A, Yeung-Levy S, Langlotz CP, Wang S, Poon H. A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings. Nat Commun. 2025 Apr 1;16(1):3108. doi: 10.1038/s41467-025-58344-x. PMID: 40169573; PMCID: PMC11962106. https://www.nature.com/articles/s41467-025-58344-x
Kang DW, Kim M, Park GH, Kim YS, Han MK, Lee M, Kim D, Ryu WS, Jeong HG. Deep learning-assisted detection of intracranial hemorrhage: validation and impact on reader performance. Neuroradiology. 2025 Mar 21. doi: 10.1007/s00234-025-03560-x. Epub ahead of print. PMID: 40116947. https://link.springer.com/article/10.1007/s00234-025-03560-x
Park D, Kang J, Park C. Multi-center Validation of Pulmonary Nodule Classification Model for Lung Cancer Screening. Preprints, Computer Science and Mathematics, Artificial Intelligence and Machine Learning. doi: 10.20944/preprints202504.0056v1. https://www.preprints.org/manuscript/202504.0056/v1