Using ML to Diagnose Chronic Kidney Disease in a Bias-Free, Privacy-Protecting, Physician-Friendly Manner

By Allison Proffitt

August 9, 2023 | Carenostics is a healthcare AI startup founded with the mission of addressing the underdiagnosis, undertreatment, and health inequities of chronic disease using AI and ML on existing healthcare data. Their work on chronic kidney disease (CKD)—developed in close connection with clinicians—attracted the judges’ attention in the 2023 Bio-IT World Innovative Practices Awards program. Carenostics’ collaboration with Hackensack Meridian Health was named one of five winning projects for 2023.

Chronic kidney disease (CKD)—the gradual loss of kidney function over time—affects one in ten adults globally (700M–1B total), costs the global health system over $1T, and has emerged as one of the leading causes of mortality worldwide.

Unfortunately, many patients at risk of developing severe disease are not being sufficiently treated because they don’t know they are sick. CKD comprises all stages of kidney damage measured by how well the kidneys can filter waste and extra fluid out of a person's blood: from very mild damage in stage 1 to complete kidney failure requiring dialysis or transplant in stage 5 (ESKD, end stage kidney disease). In the US, 90% of individuals at risk of developing severe disease do not know they have CKD, and even 40% of people with severely reduced kidney function are unaware of their disease.

The racial disparities are even more disturbing. While the prevalence of early stages of CKD is similar across different racial/ethnic and socioeconomic groups, the prevalence of ESKD is greater for minorities than their non-Hispanic white peers. In the United States, Black individuals develop ESKD at a rate three-fold higher than white individuals. Although the causes for these health disparities are varied and embedded across the care continuum, significantly poorer diagnosis rates of disadvantaged populations are a major driver of disparate CKD outcomes.

Staging Challenges

Diagnosing and treating patients at Stage 1 or 2 CKD can be difficult. Blood tests at these concentrations are imprecise and carry a high risk of false positives. Comparative effectiveness studies of treatments for Stage 1-2 CKD have also not shown significant benefits, possibly because very long follow-up times are needed to detect effects.

By Stage 3 disease, patients have typically lost roughly 50-75% of their kidney function—even though they may show no symptoms. Roughly 6.3% of Americans (16-20M adults) have Stage 3 CKD and 80% are unaware. However, there are well-defined therapeutic guidelines known to slow CKD progression, and damage can be reliably measured by inexpensive blood (eGFR; estimated Glomerular Filtration Rate) and urine (UACR; Urine Albumin-to-Creatinine Ratio) tests.

Current guidelines recommend against universal screening for CKD with eGFR and UACR tests, primarily due to the high risk of false positives. Guidelines do recommend screening in asymptomatic individuals with significant risk factors for CKD, but the criteria for selecting such high-risk patients are loosely defined and hard to automate reliably.

To tackle these issues, Carenostics has developed machine learning methods to learn robust and accurate models from electronic health records (EHRs) to predict occult Stage 3 CKD and enable early clinical interventions. These models are built using leading bias-adjusted machine learning approaches to counteract the underlying biases in healthcare data, and novel privacy-preserving machine learning approaches Carenostics has developed for cross-institutional learning. The solution, called CKDx, analyzes existing EHR data at a provider institution, identifies patients who are likely to have undiagnosed CKD, and flags these patients to primary care physicians (PCPs) at the point of care for confirmatory testing.

Iterative Innovation

In the Spring of 2022, Carenostics partnered with Hackensack Meridian Health (HMH), the largest hospital system in the state of New Jersey with 17 hospitals and 6 million patient records, to refine and roll out these models prospectively at the point of care. Carenostics worked in partnership with a team of clinical and technical leaders at HMH (including the Chief Information and Digital Engagement Officer, Executive Director of Quality, Chief of Nephrology, Chief Diversity and Equity Officer, and Chief Medical Informatics Officer) to ensure that the CKDx solution is identifying the correct patients, seamlessly integrated into the existing clinical workflow, and activating clinicians to act on the CKD population.

Together, Carenostics and HMH have focused on four primary innovations: developing high-performance ML models to identify undiagnosed CKD patients from EHR data, privacy-preserving ML approaches, bias-adjusted ML methodology, and a clinician-friendly intuitive interface. The first two (a high-performance model to identify undiagnosed CKD patients, and patient privacy protection) have been tested and shared publicly. Work on the other two (bias-adjusted ML and the clinician interface) is ongoing.

Identifying Undiagnosed Patients and Preserving Privacy

Undiagnosed CKD patients generally fall into two groups. In their entry, Carenostics defines “Cohort A” as patients with clinical evidence of CKD (e.g., abnormal eGFR test results in the past) but no diagnosis code. “Cohort B” is defined as patients with occult CKD—patients who have never had any abnormal kidney function readings in the past but would show abnormal values if tested today. “While ‘Cohort A’ can be found with simple guideline-based querying of EHR data (a service that several risk-coding organizations and CKD startups provide today), Carenostics uses AI to identify novel risk factors and patterns of disease to find ‘Cohort B’ patients as well,” the company writes. “It is estimated that there are 3x as many patients in Cohort B as Cohort A. Carenostics retrospective experiments demonstrate that they find >95% of Cohort A patients, and >40% of patients in Cohort B.”
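To make the "simple guideline-based querying" for Cohort A concrete, here is a minimal sketch. It is not Carenostics' query. It assumes a simplified rule: at least two eGFR results below 60 mL/min/1.73 m² that are 90 or more days apart, and no ICD-10 N18 (chronic kidney disease) code on the record. The record layout, the function name `is_cohort_a`, and the sample patients are hypothetical.

```python
from datetime import date

EGFR_THRESHOLD = 60.0    # mL/min/1.73 m^2; lower values suggest Stage 3 or worse
MIN_DAYS_APART = 90      # assumed chronicity window for this sketch
CKD_CODE_PREFIX = "N18"  # ICD-10 family for chronic kidney disease


def is_cohort_a(egfr_results, diagnosis_codes):
    """Return True for low eGFR on record over time but no CKD diagnosis code.

    egfr_results: list of (date, value) tuples
    diagnosis_codes: iterable of ICD-10 code strings
    """
    if any(code.startswith(CKD_CODE_PREFIX) for code in diagnosis_codes):
        return False  # already diagnosed, so not "undiagnosed"
    low_dates = sorted(d for d, value in egfr_results if value < EGFR_THRESHOLD)
    if len(low_dates) < 2:
        return False
    # Simplification: only the first and last low readings are compared.
    return (low_dates[-1] - low_dates[0]).days >= MIN_DAYS_APART


# Hypothetical records for illustration only.
patients = {
    "p1": {"egfr": [(date(2022, 1, 10), 52.0), (date(2022, 6, 1), 55.0)], "codes": ["E11.9"]},
    "p2": {"egfr": [(date(2022, 1, 10), 52.0), (date(2022, 6, 1), 48.0)], "codes": ["N18.3"]},
    "p3": {"egfr": [(date(2022, 3, 1), 88.0)], "codes": ["I10"]},
}
cohort_a = [pid for pid, rec in patients.items() if is_cohort_a(rec["egfr"], rec["codes"])]
```

A rule like this cannot find Cohort B by construction, since those patients have no abnormal readings to query, which is why the company turns to learned models for that group.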

The Carenostics ML models are integrated directly into the HMH EHR, extracting relevant features from a patient’s health record that may be predictive of CKD. These data feed into an ML experimentation framework that runs thousands of permutations of different ML approaches (including logistic regression, decision trees, ensembles—both random forests and xgboost—and neural networks) with a variety of selected parameters to find the highest performance model. “At each of these steps, we work closely with clinicians to ensure that we are extracting clinically relevant features and the patterns identified by our models are clinically valid. The experimentation framework also leverages a temporal learning approach that protects against drift in healthcare data that could lead to our model becoming obsolete,” the authors add.
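The shape of such an experimentation loop can be sketched in a few lines. This is an illustration under stated assumptions, not the Carenostics framework: a one-feature decision stump stands in for the model families named above, the "grid" is just feature and direction, and drift is guarded against by validating on a later time period than the training data (one common reading of a temporal approach; the entry does not describe the actual mechanism). All names and data are hypothetical.

```python
from itertools import product


def predict(row, feature, direction, threshold):
    """'high' flags values >= threshold; 'low' flags values <= threshold."""
    if direction == "high":
        return row[feature] >= threshold
    return row[feature] <= threshold


def accuracy(rows, feature, direction, threshold):
    correct = sum(predict(r, feature, direction, threshold) == bool(r["label"]) for r in rows)
    return correct / len(rows)


def fit_stump(rows, feature, direction):
    """Pick the threshold on `feature` with the best training accuracy."""
    best_threshold, best_acc = None, -1.0
    for threshold in sorted({r[feature] for r in rows}):
        acc = accuracy(rows, feature, direction, threshold)
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
    return best_threshold


def run_experiments(rows, cutoff_year, features, directions=("high", "low")):
    # Temporal split: fit on earlier encounters, validate on later ones.
    train = [r for r in rows if r["year"] < cutoff_year]
    valid = [r for r in rows if r["year"] >= cutoff_year]
    results = []
    for feature, direction in product(features, directions):
        threshold = fit_stump(train, feature, direction)
        results.append({
            "feature": feature,
            "direction": direction,
            "threshold": threshold,
            "valid_acc": accuracy(valid, feature, direction, threshold),
        })
    return max(results, key=lambda r: r["valid_acc"])


# Hypothetical toy records: label 1 means CKD confirmed on later testing.
rows = [
    {"year": 2019, "egfr": 95, "age": 40, "label": 0},
    {"year": 2019, "egfr": 55, "age": 70, "label": 1},
    {"year": 2020, "egfr": 80, "age": 65, "label": 0},
    {"year": 2020, "egfr": 50, "age": 50, "label": 1},
    {"year": 2021, "egfr": 54, "age": 45, "label": 1},
    {"year": 2021, "egfr": 90, "age": 72, "label": 0},
    {"year": 2021, "egfr": 70, "age": 60, "label": 0},
    {"year": 2021, "egfr": 45, "age": 68, "label": 1},
]
best = run_experiments(rows, cutoff_year=2021, features=["egfr", "age"])
```

In a real framework each grid entry would be a full model with its own hyperparameters, and the selection metric would likely be something other than raw accuracy, but the loop structure (enumerate, fit on the past, score on the future, keep the best) stays the same.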

Patient data privacy is another area in which Carenostics has worked to innovate after rejecting the two most common methods for machine learning from sensitive data. Federated learning (FL) allows distributed sites to collaboratively train a joint ML model without directly disclosing their sensitive data, by periodically sharing model parameters instead. However, an attacker can make inferences about local data from model parameters and model updates, violating privacy constraints. Differential privacy (DP) approaches provide a rigorous and measurable privacy guarantee that can be achieved by perturbing model parameters appropriately, but that perturbation can reduce model quality, resulting in a trade-off between privacy and model performance.
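The two baseline techniques can be illustrated with a short sketch. It assumes the common recipe of averaging site updates for FL, and of clipping each update and adding Gaussian noise for DP. The function names, the clipping norm, and the noise multiplier are hypothetical choices, and no formal privacy budget is computed here.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize(update, max_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * max_norm."""
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clip(update, max_norm)]


def federated_average(updates):
    """Coordinate-wise mean of the site updates (the FL aggregation step)."""
    return [sum(vals) / len(vals) for vals in zip(*updates)]


rng = random.Random(0)
site_updates = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical parameter updates from two sites

# FL shares parameters; here they are clipped but not noised.
clean = federated_average([clip(u, 1.0) for u in site_updates])

# FL with DP-style perturbation: same aggregation, noisier result.
noisy = federated_average([privatize(u, 1.0, 1.0, rng) for u in site_updates])
```

The gap between `clean` and `noisy` is the trade-off described above: more noise makes it harder to infer anything about a site's data from its update, but also moves the aggregate away from what the data actually supports.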

Instead, Carenostics has developed a patent-pending distributed co-training approach, where institutions train local models and exchange predictions on a shared unlabeled dataset, instead of sharing model parameters. By forming a consensus from shared predictions, one obtains pseudo-labels for the shared unlabeled dataset that can be used for local training. Iterating this process improves the consensus, and thereby the quality of pseudo-labels, effectively nudging local models to come to an agreement. The approach—AIMHI (AI Models for Healthcare Improvement)—achieves the same model performance as FL but protects privacy to a high level. Membership inference is significantly less likely compared to both vanilla federated learning and federated learning with differential privacy.
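The loop described above (train locally, share predictions on a common unlabeled set, form a consensus, retrain on the pseudo-labels, repeat) can be sketched as follows. This is not the patent-pending AIMHI implementation. It assumes a toy nearest-centroid classifier at each site, majority voting as the consensus rule, and synthetic two-dimensional data; all names are hypothetical.

```python
import random


def train_centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids[cls] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def predict(centroids, point):
    """Return the class whose centroid is closest to the point."""
    return min(centroids, key=lambda cls: sum((a - b) ** 2 for a, b in zip(point, centroids[cls])))


def consensus(votes):
    """Majority vote over binary predictions; a tie yields None (no pseudo-label)."""
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0


def co_train(site_data, shared_unlabeled, rounds=3):
    models = [train_centroids(X, y) for X, y in site_data]
    for _ in range(rounds):
        # Sites exchange only predictions on the shared set, never parameters or records.
        all_preds = [[predict(m, p) for p in shared_unlabeled] for m in models]
        pseudo = [consensus(votes) for votes in zip(*all_preds)]
        agreed = [(p, y) for p, y in zip(shared_unlabeled, pseudo) if y is not None]
        # Each site retrains on its private data plus the pseudo-labelled shared points.
        models = []
        for X, y in site_data:
            X_aug = list(X) + [p for p, _ in agreed]
            y_aug = list(y) + [label for _, label in agreed]
            models.append(train_centroids(X_aug, y_aug))
    return models


rng = random.Random(0)


def sample(center, n):
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(n)]


# Three hypothetical sites, each with private labelled data for both classes.
sites = [(sample(0.0, 10) + sample(3.0, 10), [0] * 10 + [1] * 10) for _ in range(3)]
shared = sample(0.0, 20) + sample(3.0, 20)  # shared unlabeled dataset
models = co_train(sites, shared)
```

The design point is what crosses the institutional boundary: only label predictions on data that every site already has, which gives an attacker much less to work with than a stream of parameter updates.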

In their entry, the company reported an empirical evaluation of model quality and privacy on the CIFAR10 benchmark dataset that indicates high model quality and a substantial improvement in privacy. Both AIMHI and FL converge quickly, within 3,000 epochs, to the optimal model performance (AIMHI converges slightly faster), but the privacy vulnerability for FL is already very high after the first communication round and increases to 0.94 after 1,000 rounds (a privacy vulnerability of 1.0 corresponds to no privacy and full disclosure). For AIMHI, the vulnerability remains very low even after multiple rounds of attacks, with values of 0.51 to 0.56 under different attack scenarios (0.5 corresponds to complete privacy). Adding differential privacy (DP) to FL does reduce privacy vulnerability from 0.94 to 0.85 (still markedly worse than for AIMHI), but model performance drops from the optimal 0.61 to 0.42, rendering DP impractical for learning high-performing clinical models.
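A score that runs from 0.5 (complete privacy) to 1.0 (full disclosure) is what one would get from the accuracy of a membership inference attack on a balanced set of training members and non-members. The entry does not say which exact metric was used, so the sketch below is an assumption: a simple loss-threshold attack in which the attacker guesses "member" whenever the model's loss on a record is below a threshold. The function names and loss values are hypothetical.

```python
def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of correct guesses when 'member' is guessed for loss < threshold."""
    correct = sum(loss < threshold for loss in member_losses)
    correct += sum(loss >= threshold for loss in nonmember_losses)
    return correct / (len(member_losses) + len(nonmember_losses))


def vulnerability(member_losses, nonmember_losses):
    """Best attack accuracy over candidate thresholds (balanced sets assumed).

    Choosing the threshold on the same data is optimistic for the attacker,
    which is acceptable for a worst-case illustration.
    """
    candidates = sorted(set(member_losses) | set(nonmember_losses))
    candidates.append(candidates[-1] + 1.0)  # a threshold above every loss
    return max(attack_accuracy(member_losses, nonmember_losses, t) for t in candidates)


# A model that memorised its training data: members have much lower loss.
leaky = vulnerability([0.01, 0.02, 0.05, 0.03], [0.9, 1.2, 0.7, 1.5])

# A model whose losses look the same for members and non-members.
private = vulnerability([0.5, 0.6, 0.7, 0.8], [0.5, 0.6, 0.7, 0.8])
```

Here `leaky` is 1.0 and `private` is 0.5, the two ends of the scale quoted above; the reported 0.94 for FL and 0.51 to 0.56 for AIMHI would sit near those respective ends.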

Carenostics and HMH are in the process of testing the accuracy and privacy performance of AIMHI on HMH’s EHR data by dividing the 6M EHRs into regional groups. Several other large health institutions have indicated interest in working with Carenostics and HMH to roll out the AIMHI approach, pending the results of evaluation on HMH data.

Bias-Adjusted ML and Effective Clinician Interface

Research on bias-adjusting ML and the clinician interface is ongoing.

‘FairML’ refers to research around defining bias in ML models, enumerating a variety of metrics that can be used to measure model bias, detecting instances of it through audit tools, and developing methods for reducing (or mitigating the impact of) bias in ML models. Carenostics focused on several methods to target bias at each stage of analysis. Before beginning data processing, the team sought to feature-engineer new outcome proxies (for example, to remove the assumption that patients without screens follow the same distribution as patients who are screened, a particularly harmful assumption for disadvantaged populations) and train models with different time horizons. The team also incorporated social determinants of health (SDoH) data that HMH is collecting (including REAL and SOGI data) into the analysis where available.

During processing, the system created variables to capture missingness and developed additional models for patients where missing data is systemic. Carenostics worked closely with HMH clinical, diversity, and community experts to pressure-test these analyses. Initial comparisons of Carenostics results with traditional methods of dealing with missing data (imputation) show a clear improvement in performance on underrepresented populations, the team reported. The group trained and selected models using fairness-based metrics including proportional recall and other adjusted evaluation metrics to ensure more equitable outcomes. They compared their performance with other approaches using fairness rather than just performance metrics, and report that their model offers improved equity and has the potential to improve both performance and fairness, when compared to current clinical practice.
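Two of the ideas in this paragraph lend themselves to a small illustration: explicit missingness variables, and recall measured per group rather than overall. The sketch below is hypothetical throughout. The article does not define "proportional recall", so per-group recall and the gap between groups are used here as a stand-in, and the feature names, group labels, and predictions are invented for the example.

```python
def add_missingness_flags(record, features):
    """Return a copy of the record with a 0/1 '<feature>_missing' flag per feature."""
    out = dict(record)
    for name in features:
        out[f"{name}_missing"] = int(record.get(name) is None)
    return out


def recall_by_group(y_true, y_pred, groups):
    """Recall (sensitivity) computed separately for each group label."""
    hits, positives = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] = positives.get(group, 0) + 1
            hits[group] = hits.get(group, 0) + int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


# A record with one lab value present, one null, and one never ordered.
flagged = add_missingness_flags({"egfr": 72.0, "uacr": None}, ["egfr", "uacr", "hba1c"])

# Hypothetical labels and predictions for two patient groups.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

recalls = recall_by_group(y_true, y_pred, groups)
recall_gap = max(recalls.values()) - min(recalls.values())
```

Keeping the missingness flag as its own variable lets a model learn from the fact that a test was never ordered, instead of hiding that fact behind an imputed value. Selecting models on a quantity like `recall_gap`, alongside overall performance, is one way to favor models that find true cases at similar rates across groups.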

Finally, the integrated CKDx solution at HMH is an EHR-native, one-click interface that has been developed with physician executives, nephrologists, and PCPs to ensure seamless integration into the existing workflow. When a clinician sees a patient, the doctor is prompted with all the information necessary based on the patient’s profile (e.g., lab orders & documentation for undiagnosed CKD patients, medications & education for untreated CKD patients). The information is also presented to the PCP with the risk factors associated with that prediction (e.g., past eGFR results, history of comorbidities, time since last screen) so that the clinician is satisfied that the flagged patient has the clinical indicators of CKD. The combination of the explainable AI predictions, the native interface, and the one-click workflow has been met with significant excitement from the PCPs where CKDx is deployed today, and Carenostics is continuously iterating to maximize PCP activation from the interface.