Researchers from City St George’s, University of London, Moorfields Eye Hospital NHS Foundation Trust, Kingston University and Homerton Healthcare NHS Trust say that the platform removes any bias that can arise from companies’ interest in deploying their AI software in clinical settings, putting all companies on a level playing field.
Currently, the NHS selects AI algorithms based on cost-effectiveness and their ability to match human performance. However, broader challenges remain, particularly the need for robust digital infrastructure and more rigorous testing of commercial algorithms.
In a study published in The Lancet Digital Health, researchers trialled the independent platform to compare commercial AI algorithms designed to detect diabetic eye disease. These algorithms work by identifying signs of blood vessel damage at the back of the eye.
Eight AI algorithms were ‘plugged in’ to the platform and run on 1.2 million images of the back of the eye from the North East London Diabetic Eye Screening Programme, one of the largest diabetic eye screening programmes and one of the most diverse in terms of ethnicity, age, deprivation level and spectrum of diabetic eye disease.
The performance of the eight algorithms was compared with the grading of the same images by up to three human graders following the standard protocol currently used in the NHS. The vendors’ algorithms had no access to the human grading data, and the companies were excluded from the data ‘safe haven’ where their algorithms analysed the images.
In total, 202,886 screening visits were evaluated, covering 1.2 million images from people of white (32%), Black (17%) and South Asian (39%) ethnicity. The AI systems took between 240 milliseconds and 45 seconds to analyse all of a patient’s images, compared with up to 20 minutes for a trained human grader.
Across the algorithms, accuracy in identifying diabetic eye disease potentially in need of clinical intervention ranged from 83.7% to 98.7%. Importantly, accuracy ranged from 96.7% to 99.8% for moderate-to-severe diabetic eye disease and from 95.8% to 99.5% for the most advanced (proliferative) sight-threatening diabetic eye disease.
The platform also measured the rate at which each algorithm incorrectly flagged healthy cases as having diabetic eye disease, another critical measure of accuracy. The results showed that the algorithms performed consistently well across different ethnic groups, the first time this has been assessed.
Professor Alicja Rudnicka from the School of Health and Medical Sciences at City St George’s, University of London, who led the study, said: ‘Our revolutionary platform delivers the world’s first fair, equitable and transparent evaluation of AI systems to detect sight-threatening diabetic eye disease.
‘This depth of AI scrutiny is far higher than that ever given to human performance. We’ve shown that these AI systems are safe for use in the NHS by using enormous data sets, and most importantly, showing that they work well across different ethnicities and age groups.’
Co-principal investigator Adnan Tufail from Moorfields Eye Hospital said: ‘There are more than 4 million patients with diabetes in the UK who need regular eye checks. This groundbreaking study sets a new benchmark by rigorously testing AI systems to detect sight-threatening diabetic eye disease before potential mass rollout. The approach we have developed paves the way for safer, smarter AI adoption across many healthcare applications.’
Prof Rudnicka added: ‘This work paves the way to expand the use of our platform from a local to national level.
‘Our vision is to deliver centralised AI infrastructure that hosts approved algorithms, enabling all screening centres to upload retinal images securely for analysis. The AI-generated results would be returned to the centre and integrated directly into the patient’s electronic health record. This approach eliminates the need for duplicating infrastructure across multiple sites, reducing setup costs and ensuring consistent, equitable service delivery nationwide.’
The researchers say their platform benefits everyone. It gives companies the opportunity to get independent feedback for improving their technology and lets NHS trusts select the AI tools that work best for them. By making highly repetitive tasks more efficient, it frees the people who do the screening to focus on higher-risk disease and on newer types of retinal scans. Ultimately, patients will also benefit from much faster diagnosis and optimal care.
The transparent approach could become the blueprint for evaluating AI tools across other chronic diseases, helping to build public trust and accelerate safe, equitable AI adoption in healthcare.
