Towards FAIR AI: Providing open access to high-quality annotated biological images on the BioImage Archive
- Abstract number
- 127
- Presentation Form
- Contributed Talk
- DOI
- 10.22443/rms.mmc2023.127
- Corresponding Email
- [email protected]
- Session
- Reproducibility of Data Analysis at Scale
- Authors
- Teresa Zulueta-Coarasa (1), Matthew Hartley (1)
- Affiliations
- 1. European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI
- Keywords
- AI, microscopy, imaging, FAIR, open
- Abstract text
Modern Artificial Intelligence (AI) techniques have brought rapid progress to the analysis of microscopy images. However, the development, reproducibility, and reuse of AI models rely on access to useful, annotated datasets. Consequently, developing ways to provide open access to AI datasets that adhere to the FAIR principles (Findability, Accessibility, Interoperability and Reusability) would benefit both AI developers and the bioimaging community.
The BioImage Archive (BIA) is EMBL-EBI’s data resource for open storage and distribution of biological images. Under the Horizon Europe grant AI4Life, the BIA seeks to improve its support for image annotations as part of AI-ready datasets and to develop annotation standards for the community. To this end, we held a virtual workshop with 45 community experts, including data generators, annotators, curators, AI researchers and software developers. The workshop produced a series of recommendations on three emerging topics.
The workshop participants provided guidelines on metadata standards for AI image and annotation data. Such metadata should cover the licence under which the dataset is released, the provenance of the annotations, the AI models that have been trained with the dataset, and spatial, biological, and imaging information. Regarding the best file formats and structures for sharing datasets, there was widespread agreement on the use of Next-Generation File Format specifications from the Open Microscopy Environment, such as OME-Zarr. Finally, the participants suggested ways in which the BIA can present, share, and encourage the submission of AI datasets. These included the ability to browse data and the creation of an API to allow direct data querying, transformation, and model training. The participants agreed that we need to incentivise the submission of annotated datasets to the BIA. Suggested strategies include recognising datasets of exceptional quality and organising training courses to encourage the adoption of the workshop recommendations. We believe that the dissemination and adoption of these guidelines within the community will help accelerate the development of AI methods for the analysis of biological images.
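As an illustration only, the metadata categories recommended above could be captured in a small record like the following Python sketch. The class name, field names, and example values are all hypothetical; they are not the BIA's schema or any published standard, and simply mirror the categories listed in the abstract (licence, annotation provenance, trained models, and spatial, biological, and imaging information).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AnnotationDatasetMetadata:
    """Hypothetical record mirroring the metadata categories named in the abstract."""

    licence: str                      # licence the dataset is released under
    annotation_provenance: str        # how, and by whom, annotations were made
    trained_models: List[str] = field(default_factory=list)       # models trained with the dataset
    spatial_info: Dict[str, str] = field(default_factory=dict)    # e.g. pixel size
    biological_info: Dict[str, str] = field(default_factory=dict) # e.g. organism
    imaging_info: Dict[str, str] = field(default_factory=dict)    # e.g. modality

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; none of these come from a real submission.
record = AnnotationDatasetMetadata(
    licence="CC0",
    annotation_provenance="manual annotation by an expert",
    spatial_info={"pixel_size_um": "0.1"},
    biological_info={"organism": "Drosophila melanogaster"},
    imaging_info={"modality": "confocal microscopy"},
)

print(json.dumps(asdict(record), indent=2))
print("Missing:", record.missing_fields())
```

A check such as `missing_fields()` is one simple way a submission tool might flag incomplete records before deposit; whether and how any such validation is performed is an assumption of this sketch.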