Endless Forams: >34,000 Modern Planktonic Foraminiferal Images for Taxonomic Training and Automated Species Recognition Using Convolutional Neural Networks

Allison Y. Hsiang*, Anieke Brombacher, Marina C. Rillo, Maryline J. Mleneck-Vautravers, Stephen Conn, Sian Lordsmith, Anna Jentzen, Michael J. Henehan, Brett Metcalfe, Isabel S. Fenton, Bridget S. Wade, Lyndsey Fox, Julie Meilland, Catherine V. Davis, Ulrike Baranowski, Jeroen Groeneveld, Kirsty M. Edgar, Aurore Movellan, Tracy Aze, Harry J. Dowsett, C. Giles Miller, Nelson Rios, Pincelli M. Hull

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Planktonic foraminiferal species identification is central to many paleoceanographic studies, from selecting species for geochemical research to elucidating the biotic dynamics of microfossil communities relevant to physical oceanographic processes and interconnected phenomena such as climate change. However, few resources exist to train students in the difficult task of discerning amongst closely related species, resulting in diverging taxonomic schools that differ in species concepts and boundaries. This problem is exacerbated by the limited number of taxonomic experts. Here we document our initial progress toward removing these confounding and/or rate-limiting factors by generating the first extensive image library of modern planktonic foraminifera, providing digital taxonomic training tools and resources, and automating species-level taxonomic identification of planktonic foraminifera via machine learning using convolutional neural networks. Experts identified 34,640 images of modern (extant) planktonic foraminifera to the species level. These images are served as species exemplars through the online portal Endless Forams (endlessforams.org) and a taxonomic training portal hosted on the citizen science platform Zooniverse (zooniverse.org/projects/ahsiang/endless-forams/). A supervised machine learning classifier was then trained with ~27,000 images of these identified planktonic foraminifera. The best-performing model provided the correct species name for an image in the validation set 87.4% of the time and included the correct name in its top three guesses 97.7% of the time. Together, these resources provide a rigorous set of training tools in modern planktonic foraminiferal taxonomy and a means of rapidly generating assemblage data via machine learning in future studies for applications such as paleotemperature reconstruction.
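
The two accuracy figures quoted above are a top-1 and a top-3 score. As a minimal sketch of how such scores are typically computed, assuming the classifier outputs one score per candidate species for each image, the following Python uses a hypothetical helper, top_k_accuracy, and made-up toy scores. It is not the authors' evaluation code and does not reproduce their data or results.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of images whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labelled image")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices from highest to lowest score, then keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 4 images over 4 hypothetical species classes.
toy_scores = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is the top guess
    [0.20, 0.50, 0.25, 0.05],  # true class 2 is the second guess
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is the fourth guess
    [0.10, 0.20, 0.60, 0.10],  # true class 2 is the top guess
]
toy_labels = [0, 2, 3, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy can never be lower than top-1 accuracy on the same images, because every top-1 hit is also a top-3 hit.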

Original language: English
Pages (from-to): 1157-1177
Number of pages: 21
Journal: Paleoceanography and Paleoclimatology
Volume: 34
Issue number: 7
Early online date: 23 Jun 2019
Publication status: Published - Jul 2019

Funding

We would like to thank David Rossman and Kaylea Nelson for IT support and ongoing Yale Research computing support; Stephen Bach and Steve Yadlowsky for their advice on hardware recommendations for the Beella machine and discussion of machine learning methods; Miroslav Valan for their discussions, advice, and comments on optimizing CNN performance and providing code examples; Ben Taylor, Janet Burke, Luke Strotz, and Jennifer Fehrenbacher for their contributions to the Zooniverse platform; Tyler Schon for website development and support for the Endless Forams database (endlessforams.org); and David Smith for his support on globally updating some of the Buckley Collection records. We also thank Liz Sikes, Jennifer Hertzberg, and one anonymous reviewer for their comments and suggestions, which improved the manuscript. H. J. D. appreciates the continued support of the U.S. Geological Survey Land Change Science Program. P. M. H. was supported by the American Chemical Society Petroleum Research Fund (PRF no. 55837-DNI8). S. C. and S. L. were funded by Natural Environment Research Council grant #NE/L006405/1. B. S. W. was supported by Natural Environment Research Council grant #NE/P019013/1.

Funders and funder numbers
U.S. Geological Survey Land Change Science Program
American Chemical Society Petroleum Research Fund: 55837-DNI8
Natural Environment Research Council: NE/P019013/1, NE/L006405/1

    Keywords

    • convolutional neural networks
    • global community macroecology
    • marine microfossils
    • planktonic foraminifera
    • species identification
    • supervised machine learning
