Endless Forams: >34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks

Allison Y. Hsiang, Anieke Brombacher, Marina Costa Rillo, Maryline J. Mleneck-Vautravers, Stephen Conn, Sian Lordsmith, Anna Jentzen, Michael J. Henehan, Brett Metcalfe, Isabel Fenton, Bridget S. Wade, Lyndsey Fox, Julie Meilland, Catherine V. Davis, Ulrike Baranowski, Jeroen Groeneveld, Kirsty M. Edgar, Aurore Movellan, Tracy Aze, Harry J. Dowsett, C. Giles Miller, Nelson Rios, Pincelli M. Hull

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Planktonic foraminiferal species identification is central to many paleoceanographic studies, from selecting species for geochemical research to elucidating the biotic dynamics of microfossil communities relevant to physical oceanographic processes and interconnected phenomena such as climate change. However, few resources exist to train students in the difficult task of discerning amongst closely related species, resulting in diverging taxonomic schools that differ in species concepts and boundaries. This problem is exacerbated by the limited number of taxonomic experts. Here, we document our initial progress towards removing these confounding and/or rate-limiting factors by generating the first extensive image library of modern planktonic foraminifera, providing digital taxonomic training tools and resources, and automating species-level taxonomic identification of planktonic foraminifera via machine learning using convolutional neural networks. Experts identified 34,640 images of modern (extant) planktonic foraminifera to the species level. These images are served as species exemplars through the online portal Endless Forams (endlessforams.org) and a taxonomic training portal hosted on the citizen science platform Zooniverse (zooniverse.org/projects/ahsiang/endless-forams/). A supervised machine learning classifier was then trained with ~27,000 images of these identified planktonic foraminifera. The best-performing model provided the correct species name for an image in the validation set 87.4% of the time, and included the correct name in its top three guesses 97.7% of the time. Together, these resources provide a rigorous set of training tools in modern planktonic foraminiferal taxonomy and a means of rapidly generating assemblage data via machine learning in future studies for applications such as paleotemperature reconstruction and salinity indicator counting.
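The two accuracy figures in the abstract are top-1 and top-3 accuracy. The sketch below shows one way such a top-k metric could be computed from per-class scores. It is an illustration only, not the authors' code: the function name `top_k_accuracy`, the toy scores, and the three-class setup are all assumptions, and the numbers are not drawn from the Endless Forams data.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this image.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if true_label in ranked:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for four images over three made-up classes.
toy_scores = [
    [0.7, 0.2, 0.1],  # true class 0: best guess is correct
    [0.3, 0.5, 0.2],  # true class 0: best guess is wrong, second guess is correct
    [0.1, 0.3, 0.6],  # true class 2: best guess is correct
    [0.5, 0.4, 0.1],  # true class 2: correct only as the third guess
]
toy_labels = [0, 0, 2, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

With k=1 this reduces to ordinary classification accuracy; raising k can only keep the value the same or increase it, which is why the reported top-three figure exceeds the top-one figure.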
Original language: English
Journal: Paleoceanography and Paleoclimatology
DOI: 10.1029/2019PA003612
Publication status: Published - 23 Jun 2019


Bibliographical note

doi: 10.1029/2019PA003612

Keywords

  • Planktonic foraminifera
  • global community macroecology
  • supervised machine learning
  • convolutional neural networks
  • marine microfossils
  • species identification

Cite this

Hsiang, Allison Y. ; Brombacher, Anieke ; Rillo, Marina Costa ; Mleneck-Vautravers, Maryline J. ; Conn, Stephen ; Lordsmith, Sian ; Jentzen, Anna ; Henehan, Michael J. ; Metcalfe, Brett ; Fenton, Isabel ; Wade, Bridget S. ; Fox, Lyndsey ; Meilland, Julie ; Davis, Catherine V. ; Baranowski, Ulrike ; Groeneveld, Jeroen ; Edgar, Kirsty M. ; Movellan, Aurore ; Aze, Tracy ; Dowsett, Harry J. ; Miller, C. Giles ; Rios, Nelson ; Hull, Pincelli M. / Endless Forams: >34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks. In: Paleoceanography and Paleoclimatology. 2019.
@article{237287a514054536b91ce079f8fe82c1,
title = "Endless Forams: >34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks",
abstract = "Planktonic foraminiferal species identification is central to many paleoceanographic studies, from selecting species for geochemical research to elucidating the biotic dynamics of microfossil communities relevant to physical oceanographic processes and interconnected phenomena such as climate change. However, few resources exist to train students in the difficult task of discerning amongst closely related species, resulting in diverging taxonomic schools that differ in species concepts and boundaries. This problem is exacerbated by the limited number of taxonomic experts. Here, we document our initial progress towards removing these confounding and/or rate-limiting factors by generating the first extensive image library of modern planktonic foraminifera, providing digital taxonomic training tools and resources, and automating species-level taxonomic identification of planktonic foraminifera via machine learning using convolutional neural networks. Experts identified 34,640 images of modern (extant) planktonic foraminifera to the species level. These images are served as species exemplars through the online portal Endless Forams (endlessforams.org) and a taxonomic training portal hosted on the citizen science platform Zooniverse (zooniverse.org/projects/ahsiang/endless-forams/). A supervised machine learning classifier was then trained with ~27,000 images of these identified planktonic foraminifera. The best-performing model provided the correct species name for an image in the validation set 87.4{\%} of the time, and included the correct name in its top three guesses 97.7{\%} of the time. Together, these resources provide a rigorous set of training tools in modern planktonic foraminiferal taxonomy and a means of rapidly generating assemblage data via machine learning in future studies for applications such as paleotemperature reconstruction and salinity indicator counting.",
keywords = "Planktonic foraminifera, global community macroecology, supervised machine learning, convolutional neural networks, marine microfossils, species identification",
author = "Hsiang, {Allison Y.} and Anieke Brombacher and Rillo, {Marina Costa} and Mleneck-Vautravers, {Maryline J.} and Stephen Conn and Sian Lordsmith and Anna Jentzen and Henehan, {Michael J.} and Brett Metcalfe and Isabel Fenton and Wade, {Bridget S.} and Lyndsey Fox and Julie Meilland and Davis, {Catherine V.} and Ulrike Baranowski and Jeroen Groeneveld and Edgar, {Kirsty M.} and Aurore Movellan and Tracy Aze and Dowsett, {Harry J.} and Miller, {C. Giles} and Nelson Rios and Hull, {Pincelli M.}",
note = "doi: 10.1029/2019PA003612",
year = "2019",
month = "6",
day = "23",
doi = "10.1029/2019PA003612",
language = "English",
journal = "Paleoceanography and Paleoclimatology",
issn = "2572-4517",
publisher = "John Wiley {\&} Sons, Ltd",
}

