plantR: An R package and workflow for managing species records from biological collections

Renato A.F. de Lima*, Andrea Sánchez-Tapia, Sara R. Mortara, Hans ter Steege, Marinez F. de Siqueira

*Corresponding author for this work

    Research output: Contribution to Journal › Article › Academic › peer-review

    Abstract

    Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardisation of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open-source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large datasets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (a) download records from different data repositories, (b) standardise typical fields associated with species records, (c) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (d) summarise and export records, including the construction of species lists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new tools and resources related to data standardisation and validation, the greatest strength of plantR is to provide a comprehensive and user-friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.

    Original language: English
    Pages (from-to): 332-339
    Number of pages: 8
    Journal: Methods in Ecology and Evolution
    Volume: 14
    Issue number: 2
    Early online date: 2 Dec 2021
    Publication status: Published - Feb 2023

    Bibliographical note

    Funding Information:
    This package was supported by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 795114. M.F.d.S., A.S.-T. and S.R.M. were supported by the Coordination for the Improvement of Higher Education Personnel—CAPES (process 88887.145924/2017-00), the PNPD/CAPES program and the PCI program of the ‘Instituto Nacional da Mata Atlântica’ (INMA), respectively. We thank Sidnei Souza from CRIA for his help with the web API. We also thank CNCFlora and the TreeCo database for providing localities used to construct the gazetteer, and Vinícius C. Souza (ESALQ/USP) who helped to curate the list of plant taxonomists. We are also thankful to the Harvard University Herbarium, Brazilian Herbaria Network and the American Society of Plant Taxonomists who were the main sources consulted to build the current list of taxonomists. Finally, we are thankful for the recognition from the 2021 GBIF Ebbe Nielsen Challenge.


    Publisher Copyright:
    © 2021 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.


    Keywords

    • biodiversity
    • data cleaning
    • data download
    • duplicate records
    • gazetteer
    • GBIF
    • herbarium
    • taxonomic validation
