Regulatory and Privacy Challenges for Real-World Imaging Data-Driven Foundation Models

Author:

Segmed Experts

Introduction

Development of foundation models in healthcare has accelerated in recent years, driven largely by the goal of optimizing clinical workflows. Foundation models are large AI models trained on broad datasets that can be adapted to many downstream tasks; large language models (LLMs) are the most familiar example, and many foundation models are generative. Applied to decision support systems, foundation models can streamline clinical workflows, support diagnosis, and enable personalized care.

Foundation models and other recent advances in artificial intelligence (AI) are having a substantial impact on healthcare, supporting more personalized and efficient patient management. Because these technologies require large amounts of data, adopting real-world data has become essential. With newer sources such as real-world imaging data (RWiD) becoming available at scale, healthcare innovation is accelerating.

RWiD-driven foundation models

RWiD-driven foundation models can process and analyze radiology images such as CT scans and X-rays. Some are trained on multimodal data and can therefore integrate other sources of clinical data. They can handle or assist with multiple imaging tasks without task-specific training for each application. Capabilities of these large-scale models include improving image interpretation and automating report generation.

However, introducing foundation models into the clinical environment raises substantial regulatory and privacy challenges. RWiD-based healthcare foundation models must comply with the applicable regulatory and privacy rules to avoid legal exposure.

Failure to follow privacy guidelines risks leaking protected health information (PHI), which may violate laws such as HIPAA and GDPR. Failure to follow regulatory guidelines on data quality means disparate and inconsistent data may be used to train and develop foundation models, making the resulting models inefficient and unreliable. Compliance is therefore essential for developing high-quality foundation models, especially those intended for healthcare.

Regulatory and privacy challenges for RWiD-based foundation models

1. Regulatory issues: standardization and quality of data

Ensuring high data quality and consistency is a serious challenge when using RWiD to build foundation models. RWiD often comes from different healthcare environments, each with its own way of collecting, categorizing, and reporting data. This leads to significant differences in metadata quality, diagnostic criteria, and the datasets themselves, which makes it difficult to meet the strict requirements of regulators such as the FDA and EMA.

To meet regulatory requirements, foundation models must be trained on consistent and complete data. This means the data must be accurately labeled and standardized across healthcare settings. RWiD used to train and develop these models must therefore be both standardized and of high quality, so that the models produce transparent, consistent, and reproducible results. Completeness also means drawing on multiple data sources and sourcing data from a range of geographic regions to avoid training bias.

Standardization and data quality are not just technical hurdles but vital steps toward regulatory approval. Failure to meet these standards will cause delays and may block the deployment of foundation models in healthcare. Bringing RWiD into line with these standards is vital to advancing effective and reliable healthcare foundation models.
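As a rough illustration of what standardizing labels and checking completeness across sites can look like, here is a minimal Python sketch. The field names (`modality`, `body_part`, `site_region`), the alias table, and the helper functions are all hypothetical and chosen only for this example; real requirements depend on the project and the regulator.

```python
from collections import Counter

# Hypothetical required metadata fields (an assumption for this sketch).
REQUIRED_FIELDS = ("modality", "body_part", "site_region")

# Hypothetical mapping from site-specific spellings to one canonical label.
MODALITY_ALIASES = {
    "ct": "CT",
    "computed tomography": "CT",
    "xr": "XR",
    "x-ray": "XR",
    "xray": "XR",
}


def standardize_record(record):
    """Return a copy of the record with a canonical modality label."""
    cleaned = dict(record)
    raw = str(cleaned.get("modality") or "").strip().lower()
    cleaned["modality"] = MODALITY_ALIASES.get(raw)  # None if unrecognized
    return cleaned


def missing_fields(record):
    """List required fields that are absent, empty, or unrecognized."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def audit(records):
    """Standardize records and separate complete ones from incomplete ones."""
    complete, incomplete = [], []
    for rec in records:
        cleaned = standardize_record(rec)
        gaps = missing_fields(cleaned)
        if gaps:
            incomplete.append((cleaned, gaps))
        else:
            complete.append(cleaned)
    return complete, incomplete


def region_counts(records):
    """Count records per geographic region to help spot imbalance."""
    return Counter(r["site_region"] for r in records)


# Made-up example records from three imaginary sites.
records = [
    {"modality": "computed tomography", "body_part": "chest", "site_region": "EU"},
    {"modality": "X-Ray", "body_part": "chest", "site_region": "US"},
    {"modality": "ct", "body_part": "", "site_region": "US"},
    {"modality": "ultrasnd", "body_part": "abdomen", "site_region": "APAC"},
]
complete, incomplete = audit(records)
print(len(complete), "complete;", len(incomplete), "need review")
print(region_counts(complete))
```

Records with gaps are set aside for review rather than silently dropped, and the per-region counts give a quick view of whether the usable data is skewed toward one geography.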

2. Privacy issues: de-identifying and anonymizing data

Safeguarding sensitive healthcare data such as protected health information (PHI) is central to privacy compliance. Privacy requirements constrain how RWiD can be used in foundation models such as LLMs. The General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) govern how healthcare information is handled. These laws aim to keep patient information confidential and protect sensitive clinical details linked to patients. They also set strict rules for acquiring, storing, and transferring patient information so that its handling meets ethical and legal standards.

To ensure secure and responsible data handling, anonymization or de-identification of healthcare data is essential. Although redacting personally identifiable information (PII) reduces privacy risk, re-identification remains possible, especially when the data is linked with other sources.

These privacy concerns underscore how important it is to put robust data anonymization and de-identification procedures in place. Developing effective and credible foundation models requires striking a balance between preserving the clinical information the models need and complying with privacy laws.
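To make the re-identification point concrete, here is a minimal Python sketch of metadata de-identification followed by a simple group-size check. It is an illustration under stated assumptions, not a compliant pipeline and not a description of any particular vendor's method: the field names, the key, and the helper functions are hypothetical, and it covers only tabular metadata, not image pixels or DICOM headers.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical field names (assumptions for this sketch).
DIRECT_IDENTIFIERS = {"patient_name", "address", "phone"}
QUASI_IDENTIFIERS = ("age_band", "sex", "zip3")


def pseudonymize_id(patient_id, secret_key):
    """Replace a patient ID with a keyed hash so records stay linkable
    without exposing the original ID."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Drop direct identifiers and pseudonymize the patient ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize_id(record["patient_id"], secret_key)
    return cleaned


def smallest_group_size(records):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values. A small value signals higher re-identification
    risk when the data is linked with other sources."""
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(groups.values(), default=0)


# Made-up records; the key is a placeholder and must not be reused.
key = b"example-key-not-for-production"
raw = [
    {"patient_id": "P001", "patient_name": "A. Example", "phone": "555-0100",
     "age_band": "60-69", "sex": "F", "zip3": "941"},
    {"patient_id": "P002", "patient_name": "B. Example",
     "age_band": "60-69", "sex": "F", "zip3": "941"},
    {"patient_id": "P003", "patient_name": "C. Example",
     "age_band": "30-39", "sex": "M", "zip3": "100"},
]
deidentified = [deidentify(r, key) for r in raw]
k = smallest_group_size(deidentified)
print("smallest quasi-identifier group:", k)
```

In this toy data the third record is the only one with its combination of age band, sex, and partial ZIP code, so it stays unique even after names and phone numbers are removed. That is the kind of residual risk that redaction alone does not address.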

Navigating the regulatory and privacy challenges

At Segmed, we recognize the importance of meeting regulatory and privacy requirements while ensuring the ethical use of healthcare information. We provide access to over 100 million regulatory-grade, de-identified, multimodal imaging datasets that meet the highest regulatory requirements. We have put in place secure data de-identification systems to protect patient information while maximizing the utility of RWiD in foundation model training.

We collaborate with our partners to ensure that our datasets are ethically sourced, unbiased, and representative of diverse patient populations. Because they strictly adhere to regulatory guidelines, our RWiD datasets help train and fine-tune foundation models. These RWiD-driven models provide enhanced diagnostic accuracy, generalizability across populations, and improved robustness across varying clinical settings.

Connect with us to learn how our high-quality, regulatory-grade, de-identified imaging datasets can support the training and fine-tuning of healthcare foundation models.