Federated Learning, Explained

Segmed Team

Although federated learning (FL) is an increasingly popular form of machine learning, it is still a young field. The term was coined in 2016, when it was introduced in a paper from researchers at Google called “Communication-Efficient Learning of Deep Networks from Decentralized Data.”

As the name suggests, federated learning’s advantage over other forms of machine learning is its ability to work with decentralized data. Keeping data decentralized matters most in fields such as healthcare, where patient medical data is sensitive and moving it risks privacy breaches or violations. Federated learning can also help train models on rare cases, because it draws on large amounts of varied medical data from different sources. With enough of this diverse data, algorithms can recognize more cases and make more accurate predictions.

Federated learning works by reversing the usual arrangement: the computation is brought to the data rather than the data being brought to the computation. With this arrangement, all of the data stays inside the facility. As Open Data Science put it, federated learning “is an approach that downloads the current model and computes an updated model at the device itself using local data.” For hospitals specifically, this helps guard against breaches of individual patients’ protected health information (PHI).

This is because “none of the training data is ever transmitted between devices or between parties, only the updates are.” FL removes the step of transferring data between devices or parties, which in turn allows for better security. This differs from the typical machine learning workflow, which takes de-identified data from the hospital and transfers it to a central storage system (such as the Segmed database). In short, conventional machine learning brings the data to a central database, while federated learning leaves the data where it is and brings the algorithm to it.
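The download, local update, and aggregation cycle described above can be sketched in a few lines. This is a minimal illustration in the spirit of federated averaging, not Segmed's implementation: the function names, the toy linear model, the synthetic site data, and the learning-rate and round counts are all assumptions made for the example.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one site's private data.

    Only the updated weights are returned; `data` stays inside this call.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y          # prediction error on one example
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine site models, weighting each by its number of local examples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_site(n, seed):
    """Hypothetical site data drawn from y = 3x + 1 with small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3 * x + 1 + rng.gauss(0, 0.05)))
    return data

# Three hypothetical sites of different sizes; their data is never pooled.
sites = [make_site(40, 1), make_site(80, 2), make_site(20, 3)]

global_model = (0.0, 0.0)
for _ in range(50):
    # Each site starts from the current global model and trains locally.
    updates = [local_update(global_model, data) for data in sites]
    # The coordinator sees only model weights and example counts.
    global_model = federated_average(updates, [len(d) for d in sites])

print(global_model)
```

Note that the coordinator only ever touches `updates` and the example counts; the `(x, y)` pairs are read exclusively inside `local_update`, which is the property the quoted description relies on.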

In addition to protecting PHI through better security and privacy, FL allows for training more accurate and less biased algorithms. This follows from the richer data that comes from drawing on many organizations in different geographical locations. The wider the range of locations, the more representative the training data, and the better the resulting models tend to generalize.

FL is also a type of machine learning built on collaboration. More collaboration generally means access to larger amounts of high-quality data. When private data never leaves a facility and no personal data is shared between companies, organizations are more willing to collaborate, which supports overall growth.

Although federated learning has many benefits, there are still some limitations to its full implementation. First, scalability is limited because image annotation and labeling must follow a standardized system and terminology. Because the medical data cannot leave the data provider’s site, it cannot be relabeled centrally, so every site must conform to a single shared convention.

This is a huge challenge in healthcare, where the data is so heterogeneous. Another limitation of FL is that “substantial computation resources may need to be replicated and placed within each facility” and “preprocessing and organizing the data for ingestion by the algorithm is challenging.” Additionally, although only gradient information is shared with algorithm developers, some sensitive information may still be recoverable from those gradients. This leakage must be addressed before FL can be considered fully private. A variety of techniques are being researched to prevent data reconstruction, including differential privacy.
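One common way differential privacy is applied to shared updates is to clip each update to a maximum norm and then add random noise before it leaves the site. The sketch below shows only that clip-and-noise step, under stated assumptions: the function names and the `max_norm` and `noise_multiplier` values are hypothetical, and a real deployment would also need privacy accounting to turn these settings into a formal guarantee.

```python
import math
import random

def clip_by_l2_norm(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]

def privatize(update, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip an update, then add Gaussian noise scaled to the clipping bound.

    Clipping limits how much any single update can reveal; the noise
    masks what remains. Both parameter values here are illustrative.
    """
    rng = rng or random.Random()
    clipped = clip_by_l2_norm(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

# A hypothetical gradient computed at one site.
gradient = [3.0, 4.0]
shared = privatize(gradient, max_norm=1.0, noise_multiplier=0.5)
print(shared)
```

The trade-off is the usual one: more noise makes reconstruction harder but slows or degrades training, so the two parameters have to be tuned against model accuracy.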

Segmed uses FL technology to maximize security and minimize the risks that come with sharing medical data. To date, Segmed has set up a local server with one data partner and plans to set one up at every future Segmed data partner. Once the servers are set up in a hospital or other data site, we run a federated search (a universal search) that includes data from various facilities. This federated setup lets us collaborate with and access many more data sources than would normally be accessible, while preemptively preventing privacy breaches.

This first server setup has been successful, and it will provide the organizational framework for the rest of the local server implementations. Through federated learning, we aim to change how data sharing is done and, in turn, help change the future of healthcare.
