Ensembling Experts for Improved Accuracy and Privacy in Predictive Models for Healthcare

Research Grant
2022-2025

Summary: Machine learning models for prediction in medicine are mostly based on time series of clinical measurements. Such models can be improved by adding patient-specific features such as pre-existing medical conditions, or personalized features extracted from unstructured data such as anamnesis texts. To satisfy the data hunger of neural network models, the standard approach is to learn a unified prediction model on data pooled from several patients. Instead, we propose to learn predictive models on data that were labeled by experts for individual patients, and to combine the patient-specific models by ensembling techniques. The advantages of ensembles include efficiency due to their parallel nature and improved predictive accuracy, which follows from the relationship between the generalization error of an ensemble and the correlation of its component models with each other. Since individual networks are often highly correlated, generalization accuracy can be optimized by searching for orthogonal models. Ensemble methods furthermore permit a simple mechanism to protect the privacy of personalized component models: applying noise perturbation in the ensemble combination process. Clearly, there is a tradeoff between optimizing ensemble parameters for improved prediction accuracy and strong protection of privacy. The goal of the proposed project is to find the optimal tradeoff between accuracy and privacy, both from a theoretical perspective and in experimental applications to predictive models for sepsis and for diagnosis tasks in psychiatry.
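
The noise perturbation idea in the summary can be illustrated with a small sketch. It is not the project's method: the summary does not specify a noise distribution or a combination rule, so the Laplace noise, the plain averaging, the function names `laplace_noise` and `noisy_ensemble_predict`, and the four member probabilities are all assumptions made for illustration. The sketch only shows the direction of the tradeoff: a larger noise scale hides more about the individual component models but moves the ensemble output further from the noise-free prediction.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution (assumed noise model)."""
    if scale == 0:
        return 0.0
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_ensemble_predict(member_probs, noise_scale, rng):
    """Average the members' predicted probabilities, perturb the average, clip to [0, 1]."""
    mean = sum(member_probs) / len(member_probs)
    noisy = mean + laplace_noise(noise_scale, rng)
    return min(1.0, max(0.0, noisy))


def mean_abs_deviation(member_probs, noise_scale, rng, trials=2000):
    """Average distance between the noisy and the noise-free ensemble output."""
    clean = sum(member_probs) / len(member_probs)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_ensemble_predict(member_probs, noise_scale, rng) - clean)
    return total / trials


rng = random.Random(0)
# Made-up outputs of four hypothetical patient-specific models for one input.
member_probs = [0.81, 0.64, 0.92, 0.70]

for scale in (0.0, 0.05, 0.2):
    deviation = mean_abs_deviation(member_probs, scale, rng)
    print(f"noise scale {scale:.2f}: mean absolute deviation {deviation:.3f}")
```

With a noise scale of zero the function reduces to ordinary ensemble averaging. How a given noise scale translates into a formal privacy guarantee depends on the sensitivity of the combination rule and is outside the scope of this sketch.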