Physicians Versus Machine-Learning Algorithms in Classifying Pigmented Skin Lesions
Posted: Thursday, June 20, 2019
According to a simulation study by Philipp Tschandl, PhD, of the Medical University of Vienna, and colleagues, machine-learning algorithms may be capable of diagnosing pigmented skin lesions more accurately than human readers. Whereas other diagnostic studies sought to distinguish between melanoma and benign nevi alone, this study included images from seven disease categories, including a large number of benign lesions. The results were published in The Lancet Oncology.
“Although machine-learning algorithms outperformed human experts in nearly every aspect, higher accuracy in a diagnostic study with digital images does not necessarily mean better clinical performance or patient management,” cautioned the authors.
In this Web-based, international, diagnostic study, a total of 511 physician readers, comprising board-certified dermatologists (55.4%), dermatology residents (23.1%), and general practitioners (16.2%), completed the test set study. Their diagnoses were compared with those generated by 139 algorithms from 77 machine-learning labs. Both human readers and machine-learning algorithms were trained with a set of representative dermatoscopic images and were tasked with diagnosing 30 randomly selected images from a test set of 1,511 images.
Overall, machine-learning algorithms achieved an average of 2.01 more correct diagnoses than did human readers (P = .0001) and an average of 0.79 more than expert readers (physicians with 10 or more years of experience). Human readers were also less accurate than the top three machine-learning algorithms, which averaged 25.85 correct answers versus 17.91 for human readers. The difference between human and algorithm-generated diagnoses was greatest in image sets with random and more benign cases and smallest in malignant batches (P < .0001 for all).
“Physicians usually examine the entire patient and not just single lesions. When humans make a diagnosis, they also take additional information into account… which was not provided in this study,” said Dr. Tschandl in an institutional press release. “In [the] future, it is probable that automated classifiers will be used under human guidance, rather than alone.”
Disclosure: The study authors’ disclosure information may be found at thelancet.com.