Towards FAIR AI: Providing open access to high-quality annotated biological images on the BioImage Archive

Abstract number: 127

Presentation Form: Contributed Talk

DOI: 10.22443/rms.mmc2023.127

Corresponding Email: [email protected]

Session

Reproducibility of Data Analysis at Scale

Authors

Teresa Zulueta-Coarasa (1), Matthew Hartley (1)

Affiliations

1. European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI

Keywords

AI, microscopy, imaging, FAIR, open

Abstract text

Modern Artificial Intelligence (AI) techniques have brought rapid progress to the analysis of microscopy images. However, the development, reproducibility, and reuse of AI models rely on having access to useful, annotated datasets. Consequently, developing ways to provide open access to AI datasets adhering to the FAIR principles (Findability, Accessibility, Interoperability and Reusability) would benefit both AI developers and the bioimaging community.

The BioImage Archive (BIA) is EMBL-EBI’s data resource for open storage and distribution of biological images. As part of the Horizon Europe grant AI4Life, the BIA seeks to improve its support for image annotations as part of AI-ready datasets and to develop annotation standards for the community. To this end, we held a virtual workshop with 45 community experts, including data generators, annotators, curators, AI researchers and software developers. The outcomes of the workshop were a series of recommendations on three emerging topics.

First, the workshop participants provided guidelines on metadata standards for AI image and annotation data. Such metadata needs to encompass the licence the dataset is released under, the provenance of the annotations, the AI models that have been trained with the dataset, and spatial, biological, and imaging information. Second, regarding the best file formats and structures for sharing the datasets, there was widespread agreement on the use of Next-Generation File Format specifications from the Open Microscopy Environment, such as OME-Zarr. Finally, the participants suggested the best ways in which the BIA can present, share, and encourage the submission of AI datasets. These included the ability to browse data and the creation of an API to allow direct data query, transformation, and model training. The participants agreed that we need to incentivise the submission of annotated datasets to the BIA. Suggested strategies include recognising datasets of exceptional quality and organising training courses to encourage the adoption of the workshop recommendations. We believe that dissemination and adoption of these guidelines among the community will help accelerate the development of AI methods for the analysis of biological images.
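
As an illustration only, the metadata categories recommended by the workshop (licence, annotation provenance, models trained on the dataset, and spatial, biological, and imaging information) could be captured in a simple record. The sketch below is a minimal example under stated assumptions: the class name, field names, and example values are hypothetical and do not represent a BIA schema or any published standard.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List
import json


@dataclass
class AnnotationDatasetMetadata:
    """Hypothetical record for the metadata categories named in the abstract.

    Field names are illustrative assumptions, not an official schema.
    """

    licence: str = ""                      # licence the dataset is released under
    annotation_provenance: str = ""        # who or what produced the annotations
    trained_models: List[str] = field(default_factory=list)  # models trained on the data
    spatial: Dict[str, str] = field(default_factory=dict)     # e.g. pixel size, dimensions
    biological: Dict[str, str] = field(default_factory=dict)  # e.g. organism, sample type
    imaging: Dict[str, str] = field(default_factory=dict)     # e.g. imaging modality

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty.

        `trained_models` is skipped because a new dataset may not have
        been used to train any model yet.
        """
        return [
            f.name
            for f in fields(self)
            if f.name != "trained_models" and not getattr(self, f.name)
        ]


if __name__ == "__main__":
    # Example values are made up for illustration.
    record = AnnotationDatasetMetadata(
        licence="CC0",
        annotation_provenance="manual annotation by an expert",
        spatial={"pixel_size": "example value"},
        biological={"organism": "example organism"},
        imaging={"modality": "example modality"},
    )
    print(json.dumps(asdict(record), indent=2))
    print("Missing:", record.missing_fields())
```

A check such as `missing_fields()` is one way a submission tool might flag incomplete records before upload; which fields are mandatory would be decided by the community standard, not by this sketch.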
