Focus Quality Evaluation of Microscope Images with Deep Learning

Presentation Form
Contributed Talk
Corresponding Email
[email protected]
Artificial Intelligence
Mr Zhongwang Li (1), Dr Keith Siew (1), Dr Stephen Walsh (1), Prof Simon Walker-Samuel (1)
1. University College London

Focusing, Machine Learning, Renal Medicine

Abstract text


Acquiring a large quantity of high-quality image data is a foundational step in training any accurate image model. This often necessitates automated acquisition (e.g. with whole slide scanning microscopes), which can occasionally produce out-of-focus images or regions; including these in the final dataset may degrade the training outcomes. However, finding such data among a large number of images is time-consuming and laborious, and the naked eye cannot precisely quantify how far an image is out of focus. A model that measures the degree of blurring in an image is therefore desirable. It would allow researchers to remove the confounding data from the dataset rapidly and efficiently during the workflow, improving both the quality of the dataset and the training results.

We therefore propose a Convolutional Neural Network (CNN) model, trained on scanned 8-bit RGB brightfield images of human kidney biopsies, that takes an image as input and outputs a value quantifying the degree of focus of that image.


Archived kidney needle core biopsies from UCLH and the Royal Free Hospital were used to create our dataset. The biopsies were scanned at 20x magnification using the AXIO Scan Z.1 automated batch scanning microscope. We acquired multi-slice focus stacks (1-micron Z-steps) of each slide. The z-slice with the largest standard deviation was selected as the sharpest, and was therefore assumed to be the most in-focus image. From this z-slice, we selected areas that were clearly in focus and free of tile stitching misalignments as the original images, which then underwent ten levels of artificial blurring to produce the training data. The final dataset used for training comprises 33,691 images in 8-bit RGB PNG format.
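The slice selection and blurring steps can be illustrated with a minimal NumPy sketch. It assumes the standard deviation is taken over pixel intensities of each z-slice, and it uses a simple box filter as a stand-in for the artificial blurring, since the blur kernel is not specified here. The function names, the synthetic stack and the blur radii are all hypothetical.

```python
import numpy as np


def sharpest_slice_index(stack: np.ndarray) -> int:
    """Index of the z-slice with the largest pixel standard deviation.

    `stack` has shape (Z, H, W) or (Z, H, W, C).
    """
    flat = stack.reshape(stack.shape[0], -1).astype(np.float64)
    return int(np.argmax(flat.std(axis=1)))


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Blur a 2-D image with a (2*radius+1)-wide box filter.

    The box kernel is an assumption made for illustration only.
    """
    if radius == 0:
        return image.astype(np.float64)
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Separable filter: average along rows, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


# Synthetic stand-in for a focus stack: slice 2 is unblurred,
# and slices further from it are blurred more strongly.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
stack = np.stack([box_blur(sharp, abs(z - 2)) for z in range(5)])
best = sharpest_slice_index(stack)

# Ten hypothetical blur levels derived from the selected slice.
blur_levels = [box_blur(stack[best], r) for r in range(1, 11)]
```

Blurring averages neighbouring pixels and so lowers the intensity spread, which is why the largest standard deviation is a plausible proxy for the sharpest slice.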

The CNN model has two major parts. The first is InceptionV3, a multi-scale feature encoder that converts the 320x320x3 input image into a 2048-element feature vector. The second is a six-layer fully connected network that maps this vector to a single value representing the in-focus level. The closer this value is to zero, the more accurately the image is focused.
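The data flow through the second part can be sketched at the level of array shapes. This is not the trained model: the InceptionV3 encoder is replaced by a random 2048-element vector per image, the weights are random, and the hidden layer widths and ReLU activation are assumptions, since only the number of layers is given above.

```python
import numpy as np

FEATURE_DIM = 2048  # length of the encoder output stated in the text
# Hypothetical widths: six fully connected layers ending in one scalar.
LAYER_WIDTHS = [1024, 512, 256, 64, 16, 1]


def init_head(rng, in_dim=FEATURE_DIM, widths=LAYER_WIDTHS):
    """Create randomly initialised (weight, bias) pairs for each layer."""
    params = []
    for out_dim in widths:
        w = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
        b = np.zeros(out_dim)
        params.append((w, b))
        in_dim = out_dim
    return params


def head_forward(params, features):
    """Map a batch of feature vectors (N, 2048) to N scalar focus values."""
    x = features
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers (assumed)
    return x[:, 0]


rng = np.random.default_rng(0)
params = init_head(rng)
features = rng.normal(size=(4, FEATURE_DIM))  # stand-in for encoder output
scores = head_forward(params, features)       # one value per image
```

The final layer has no activation, so the output is an unbounded real number, which suits a regression target where zero denotes an in-focus image.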

The model was trained with a learning rate of 1x10^-5 and a batch size of 200 for 500 epochs.
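For a rough sense of the training length, the number of weight updates can be estimated from these settings. The sketch assumes that all 33,691 images are used in every epoch and that the final, smaller batch is kept; neither detail is stated above.

```python
import math

n_train = 33_691   # training images stated in the text
batch_size = 200
epochs = 500

# One update per batch; the last partial batch is assumed to be kept.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 169 84500
```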

Results & Discussion 

The model performs well on the validation set of 3,730 images. A linear fit of the validation predictions gives y = 0.59 + 1.157x, which is close to the ideal case, y = x. The R-squared value of 0.9904 indicates that the model reliably distinguishes different in-focus levels. Within the normal focusing error range of -4 µm to 4 µm (the range typical of manual or automated focusing error or bias), the model's error is ±1 µm, which is sufficient for general focusing tasks. This performance is acceptable, but there is still room for improvement. In future work, we will use a larger number of images as the training set and increase the model's size so that it can learn additional features.
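The fit statistics above can be computed along the following lines. The sketch assumes x is the true focus offset and y the model prediction, which is not stated explicitly, and it runs on synthetic data generated around the reported line rather than on the actual validation predictions. The noise level and the range of x are hypothetical.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R-squared."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic data only: points scattered around the reported fit.
rng = np.random.default_rng(0)
true_offset = np.linspace(-10.0, 10.0, 201)
predicted = 0.59 + 1.157 * true_offset + rng.normal(0.0, 0.3, true_offset.size)

slope, intercept, r2 = linear_fit_r2(true_offset, predicted)
```

R-squared measures how well a straight line describes the predictions, not how close that line is to y = x; agreement with the ideal case is judged from the slope and intercept.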


We have proposed a CNN model that infers the absolute in-focus level of an image, helping researchers to determine whether an acquired dataset contains unfocused, unclear or blurred images.