Matches and compares two name lists based on the taxonomic resolution of plant taxa names listed in the "Leipzig Catalogue of Vascular Plants" (LCVP).

lcvp_match(
  splist1,
  splist2,
  max_distance = 0.2,
  genus_fuzzy = FALSE,
  grammar_check = FALSE,
  include_all = TRUE,
  identify_dups = TRUE
)

Arguments

splist1

A character vector specifying the reference input taxa to be matched. Each element includes the genus and specific epithet and, optionally, the infraspecific rank, infraspecific name, and author name. Only valid characters are allowed (see base::validEnc).

splist2

A character vector specifying the input taxa to match against splist1. Each element includes the genus and specific epithet and, optionally, the infraspecific rank, infraspecific name, and author name. Only valid characters are allowed (see base::validEnc).
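As a minimal sketch of the encoding requirement, both inputs can be screened with base R's validEnc before calling lcvp_match. The two vectors below are made-up inputs for illustration, and whether lcvp_match runs an equivalent check internally is not stated here.

```r
# Hypothetical inputs; any character vectors of taxon names would do.
splist1 <- c("Hibiscus vitifolius", "Adansonia digitata")
splist2 <- c("Adansonia digitata", "Hibiscus abelmoschus")

# validEnc() returns one logical per element: TRUE when the string is
# valid in its declared encoding.
bad1 <- !validEnc(splist1)
bad2 <- !validEnc(splist2)

# Report offending positions, if any, before attempting a match.
if (any(bad1) || any(bad2)) {
  message("Invalid strings at positions: splist1 ",
          toString(which(bad1)), "; splist2 ", toString(which(bad2)))
}
```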

max_distance

The maximum string distance allowed for a match when comparing the submitted name with the closest name matches in the LCVP. The distance used is a generalized Levenshtein distance, which counts the total number of insertions, deletions, and substitutions needed to turn one name into the other. It can be expressed as an integer or as a fraction of the length of the binomial name. For example, for a name of length 10, max_distance = 0.1 allows only one change (insertion, deletion, or substitution), while max_distance = 2 allows two changes.
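The arithmetic behind max_distance can be sketched with base R. The helper allowed_edits below is hypothetical and written only for illustration; it assumes a fraction is rounded up to the next whole number of edits, which may differ from what lcvp_match does internally. utils::adist computes a generalized Levenshtein distance between two strings.

```r
# Hypothetical helper, not part of lcvplants: translate max_distance
# into a number of allowed edits for a given name.
allowed_edits <- function(name, max_distance) {
  if (max_distance < 1) {
    # Fraction of the name length (assumed to be rounded up)
    ceiling(max_distance * nchar(name))
  } else {
    # Already an absolute number of edits
    max_distance
  }
}

submitted <- "Abies albo"   # misspelled input, 10 characters
candidate <- "Abies alba"   # candidate name, 10 characters

# Number of insertions, deletions, and substitutions between the two
edits <- drop(utils::adist(submitted, candidate))

# Would the candidate be accepted under each setting?
edits <= allowed_edits(submitted, 0.1)  # fraction: one change allowed
edits <= allowed_edits(submitted, 2)    # integer: two changes allowed
```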

genus_fuzzy

If TRUE, the fuzzy match algorithm based on max_distance will also be applied to the genus (note that this may considerably increase computational time). If FALSE, fuzzy match will only apply to the epithet.

grammar_check

If TRUE, the algorithm will try to fix common Latin grammar mistakes.

include_all

If TRUE (default), all species in both splist1 and splist2 are included. If FALSE, species found only in splist2 are excluded.

identify_dups

If TRUE (default), a column indicating the position of duplicated LCVP output names is added to the resulting data.frame.

Value

A data.frame with the following columns:

  • Species.List.1: List of taxon names provided by the user in splist1.

  • Species.List.2: List of taxon names provided by the user in splist2.

  • global.Id: The fixed species id of the input taxon in the Leipzig Catalogue of Vascular Plants (LCVP).

  • Input.Genus: A character vector. The input genus of the corresponding vascular plant species name listed in LCVP.

  • Input.Epitheton: A character vector. The input epitheton of the corresponding vascular plant species name listed in LCVP.

  • Rank: A character vector. The taxonomic rank ("species", subspecies: "subsp.", variety: "var.", subvariety: "subvar.", "forma", or subforma: "subf.") of the corresponding vascular plant species name listed in LCVP.

  • Input.Subspecies.Epitheton: A character vector. If the indicated rank is below species, the subspecies epitheton input of the corresponding vascular plant species name listed in LCVP. If the rank is "species", the input is "nil".

  • Input.Authors: A character vector. The taxonomic authority input of the corresponding vascular plant species name listed in LCVP.

  • Status: A character vector. Describes whether a taxon is classified as ‘valid’, ‘synonym’, ‘unresolved’, ‘external’ or ‘blanks’. The ‘unresolved’ status means that the plant name could be either valid or a synonym, but the information available does not allow a definitive decision. ‘External’ is an extra status that lists names outside the scope of this publication but useful to keep on this updated list. ‘Blanks’ means that the respective name exists in the bibliography, but it is clear neither where it came from nor whether it is valid, a synonym, or unresolved. (See the main text of Freiberg et al. for more details.)

  • globalId.of.Output.Taxon: The fixed species id of the output taxon in LCVP.

  • Output.Taxon: A character vector. The list of the accepted plant taxa names according to the LCVP.

  • Family: A character vector. The corresponding family name of the Input.Taxon, staying empty if the Status is unresolved.

  • Order: A character vector. The corresponding order name of the Input.Taxon, staying empty if the Status is unresolved.

  • Literature: A character vector. The bibliography used.

  • Comments: A character vector. Further taxonomic comments.

  • Match.Position.2to1: Positions in splist2 of the names in splist1. Can be used to reorder splist2 to match splist1.

  • Duplicated.Output.Position: If identify_dups = TRUE, indicates the position of duplicated names in the Output.Taxon column. Duplicates may occur when two input names are now synonyms. The value is NA if the species name has no duplicate.

See LCVP::tab_lcvp for more details.

If include_all = TRUE, all species are included: rows are ordered according to splist1, followed by the non-matched names in splist2. If include_all = FALSE, non-matched names in splist2 are not included.
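The row ordering and the role of Match.Position.2to1 can be sketched with base R. This is an illustration under a simplifying assumption: names are compared by exact string equality using match, whereas lcvp_match compares names after resolving them against the LCVP. The species vectors are made-up inputs.

```r
# Illustration only: exact matching with base R, not the lcvplants internals.
splist1 <- c("Abies alba", "Picea abies", "Quercus robur")
splist2 <- c("Quercus robur", "Abies alba", "Fagus sylvatica")

# Position in splist2 of each name in splist1 (NA when absent)
pos_2to1 <- match(splist1, splist2)

# Reorder splist2 so that it lines up with splist1
aligned <- splist2[pos_2to1]

# Names found only in splist2: appended after the splist1 rows when
# include_all = TRUE, left out when include_all = FALSE
only_in_2 <- setdiff(splist2, splist1)

data.frame(Species.List.1 = splist1, Species.List.2 = aligned)
```

Indexing splist2 by the position vector is the same operation as `splist2[matchLists$Match.Position.2to1]` in the Examples section.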

References

Freiberg, M., Winter, M., Gentile, A. et al. LCVP, The Leipzig catalogue of vascular plants, a new taxonomic reference list for all known vascular plants. Sci Data 7, 416 (2020). https://doi.org/10.1038/s41597-020-00702-z

Author

Bruno Vilela & Alexander Ziska

Examples

# Ensure that LCVP package is available before running the example.
# If it is not, see the `lcvplants` package vignette for details
# on installing the required data package.
if (requireNamespace("LCVP", quietly = TRUE)) { # Do not run this

# Generate two lists of species names
splist1 <- sample(apply(LCVP::tab_lcvp[2:10, 2:3], 1, paste, collapse = " "))
splist2 <- sample(apply(LCVP::tab_lcvp[11:3, 2:3], 1, paste, collapse = " "))

# Including all species in both lists
lcvp_match(splist1, splist2, include_all = TRUE)

# Including only the species in the first list
matchLists <- lcvp_match(splist1, splist2, include_all = FALSE)
## This can be used to quickly reorder splist2 to match splist1
splist2[matchLists$Match.Position.2to1]

}