Using Machine Learning to Predict Risk of Infection in Newly Diagnosed CLL
Posted: Thursday, April 15, 2021
A group of researchers in Denmark developed a predictive model to identify patients with chronic lymphocytic leukemia (CLL) who may be at risk of infection or of requiring CLL treatment within 2 years of diagnosis. The tool, called the CLL Treatment-Infection Model (CLL-TIM), was described in detail in Nature Communications by Carsten U. Niemann, MD, PhD, of Copenhagen University Hospital, Denmark, and colleagues.
“To address concerns regarding the use of complex machine learning algorithms in the clinic, for each patient with CLL, CLL-TIM provides explainable predictions through uncertainty estimates and personalized risk factors,” concluded the authors.
Researchers developed an explainable machine learning model by utilizing data from 4,149 patients in Denmark diagnosed with CLL between 2004 and 2017. The model's composite outcome was the combined event of an infection or CLL treatment within 2 years from the prediction point (time point zero). Baseline variables were collected for each patient, including age, gender, Binet stage, family history of CLL, Eastern Cooperative Oncology Group performance status, β-2 microglobulin levels, CD38 positivity, and IGHV mutation. Routine laboratory testing, microbiology/blood culture findings, pathology reports, and diagnostic codes were also used to create the model. The model takes advantage of 85 original variables in patient histories that translate to 228 engineered features.
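To make the composite outcome concrete, the following is a minimal Python sketch of how such a label could be derived for one patient. It is not the authors' code: the function name, the 730-day approximation of "2 years," and the choice to count only events strictly after the prediction point are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

# Assumption: "within 2 years" is approximated as 730 days.
HORIZON = timedelta(days=730)


def composite_outcome(
    prediction_point: date,
    infection_dates: Iterable[date],
    treatment_date: Optional[date],
) -> bool:
    """Return True if an infection or CLL treatment falls within the
    horizon after the prediction point (time point zero)."""
    events = list(infection_dates)
    if treatment_date is not None:
        events.append(treatment_date)
    # Assumption: only events after the prediction point count.
    window_end = prediction_point + HORIZON
    return any(prediction_point < d <= window_end for d in events)
```

Under these assumptions, a patient with an infection 17 months after the prediction point would be labeled positive, whereas a patient whose only event is treatment 3 years later would be labeled negative.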
The researchers compared CLL-TIM with the current gold-standard CLL prognostic model, the CLL-International Prognostic Index (CLL-IPI). According to the researchers, CLL-TIM outperformed CLL-IPI in both precision and recall for predicting infection or treatment. The model found that β-2 microglobulin levels were associated with an increased risk of infection prior to CLL treatment. In contrast, Binet stage A and unmutated IGHV seemed to be associated with CLL treatment occurring before an infection. Several other factors, including maximum leukocyte level, number of days between infections, age, and platelet levels, were associated with the composite outcome.
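For readers unfamiliar with the two metrics used in the comparison, precision is the share of patients flagged as high risk who actually experienced the outcome, and recall is the share of patients with the outcome whom the model flagged. The short Python sketch below computes both from binary labels; the function name is hypothetical, and it does not reproduce any figures from the study.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = outcome occurred)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # flagged, outcome occurred
    fp = sum(1 for t, p in pairs if not t and p)    # flagged, no outcome
    fn = sum(1 for t, p in pairs if t and not p)    # missed outcome
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A model can score well on one metric and poorly on the other, which is why the comparison reports both.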
Disclosure: For a full list of author disclosures, visit nature.com.