R/lcvp_fuzzy_search.R
lcvp_fuzzy_search.Rd
Same as lcvp_search, but it returns all matches from a fuzzy search of plant taxa names listed in the "Leipzig Catalogue of Vascular Plants" (LCVP).
lcvp_fuzzy_search(
  splist,
  max_distance = 0.2,
  genus_fuzzy = FALSE,
  status = c("accepted", "synonym", "unresolved", "external"),
  bind_result = TRUE,
  keep_closest = TRUE,
  progress_bar = FALSE
)
splist: A character vector specifying the input taxon, each element including genus and specific epithet and, potentially, infraspecific rank, infraspecific name and author name. Only valid characters are allowed (see validEnc).
max_distance: The maximum string distance allowed for a match when comparing the submitted name with the closest name matches in the LCVP. The distance used is a generalized Levenshtein distance: the total number of insertions, deletions, and substitutions needed to turn one name into the other. It can be expressed as an integer or as a fraction of the length of the binomial name. For example, for a name of length 10, max_distance = 0.1 allows only one change (insertion, deletion, or substitution), while max_distance = 2 allows two changes.
genus_fuzzy: If TRUE, the fuzzy match algorithm based on max_distance will also be applied to the genus (note that this may considerably increase computational time). If FALSE, fuzzy matching will only apply to the epithet.
status: A character vector indicating which taxa statuses should be included in the results: "accepted", "synonym", "unresolved", "external". The "unresolved" rank means that the status of the plant name could be either valid or synonym, but the information available does not allow a definitive decision. "external" is an extra rank that lists names outside the scope of this publication but useful to keep on this updated list.
bind_result: If TRUE, the function will return one data.frame (default). If FALSE, the function will return a list of separate data.frames, one for each input group.
keep_closest: If TRUE, the function will return only the closest names within the max_distance specified. If FALSE, it will return all names within the specified distance.
progress_bar: If TRUE, a progress bar will be printed.
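The two ways of expressing max_distance can be sketched with base R alone. The helper allowed_edits() below is hypothetical and not part of lcvplants. It assumes that a value below 1 is scaled by the number of characters in the name and rounded up, which is the convention documented for agrep(); the rounding used inside the package may differ.

```r
# Hypothetical helper (not part of lcvplants): convert max_distance into a
# number of allowed edits. ASSUMPTION: values below 1 are treated as a
# fraction of the name length and rounded up, as agrep() documents.
allowed_edits <- function(name, max_distance) {
  if (max_distance < 1) {
    ceiling(max_distance * nchar(name))
  } else {
    max_distance
  }
}

submitted <- "Hibiscus vitifolia"
candidate <- "Hibiscus vitifolius"

# Generalized Levenshtein distance between the two names (base R, utils).
name_distance <- utils::adist(submitted, candidate)[1, 1]

# Edit budget under a fractional and under an integer max_distance.
budget_fraction <- allowed_edits(submitted, 0.2)
budget_integer <- allowed_edits(submitted, 2)

# The candidate counts as a match when its distance is within the budget.
name_distance <= budget_fraction
name_distance <= budget_integer
```

Expressing the threshold as a fraction lets the tolerance grow with the length of the name, whereas an integer applies the same budget to short and long names alike.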
A data.frame or a list of data.frames (if bind_result = FALSE) with the following columns:
global.Id: The fixed species id of the input taxon in the Leipzig Catalogue of Vascular Plants (LCVP).
Input.Genus: A character vector. The input genus of the corresponding vascular plant species name listed in LCVP.
Input.Epitheton: A character vector. The input epitheton of the corresponding vascular plant species name listed in LCVP.
Rank: A character vector. The taxonomic rank ("species", subspecies: "subsp.", variety: "var.", subvariety: "subvar.", "forma", or subforma: "subf.") of the corresponding vascular plant species name listed in LCVP.
Input.Subspecies.Epitheton: A character vector. If the indicated rank is below species, the subspecies epitheton input of the corresponding vascular plant species name listed in LCVP. If the rank is "species", the input is "nil".
Input.Authors: A character vector. The taxonomic authority input of the corresponding vascular plant species name listed in LCVP.
Status: A character vector. Describes whether a taxon is classified as 'valid', 'synonym', 'unresolved', 'external' or 'blanks'. The 'unresolved' rank means that the status of the plant name could be either valid or synonym, but the information available does not allow a definitive decision. 'External' is an extra rank which lists names outside the scope of this publication but useful to keep on this updated list. 'Blanks' means that the respective name exists in the bibliography, but it is clear neither where it came from nor whether it is valid, a synonym, or unresolved (see the main text of Freiberg et al. for more details).
globalId.of.Output.Taxon: The fixed species id of the output taxon in LCVP.
Output.Taxon: A character vector. The list of the accepted plant taxa names according to the LCVP.
Family: A character vector. The corresponding family name of the Input.Taxon, left empty if the Status is unresolved.
Order: A character vector. The corresponding order name of the Input.Taxon, left empty if the Status is unresolved.
Literature: A character vector. The bibliography used.
Comments: A character vector. Further taxonomic comments.
Name.Distance: The approximate string distance between the Search and matched Input.Taxon names. See utils::adist for more details.
See LCVP::tab_lcvp for more details.
If no match is found for a given species, the function returns NA in the LCVP table columns for that species. If no match is found for any species, the function returns NULL and issues a warning message.
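Because the result can be NULL or contain NA rows, downstream code may want to check for both cases. The sketch below is a minimal illustration on mock data rather than real LCVP output. The helper unmatched_names() is hypothetical, and it assumes that the result carries a Search column with the submitted name and that Output.Taxon is NA for unmatched names.

```r
# Hypothetical helper (not part of lcvplants): list the submitted names that
# found no match. ASSUMPTIONS: `result` has a "Search" column holding the
# submitted name, and "Output.Taxon" is NA when no match was found.
unmatched_names <- function(splist, result) {
  if (is.null(result)) {
    # NULL means that none of the submitted names matched.
    return(splist)
  }
  result$Search[is.na(result$Output.Taxon)]
}

# Mock result with placeholder values, standing in for real output.
mock_result <- data.frame(
  Search = c("Aaa bbb", "Ccc ddd"),
  Output.Taxon = c("Placeholder accepted name", NA),
  stringsAsFactors = FALSE
)

unmatched_some <- unmatched_names(mock_result$Search, mock_result)
unmatched_all <- unmatched_names(c("Aaa bbb", "Ccc ddd"), NULL)
```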
The algorithm will look for all the names within the given maximum distance defined in max_distance. It can return all best matches (keep_closest = TRUE), or all the matches within the distance (keep_closest = FALSE).
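The difference between the two keep_closest settings can be illustrated with base R on a toy reference list. This is only a sketch of the idea, not the package's implementation: the reference strings are made up for illustration and are not LCVP entries, and the distance is computed on the whole name with utils::adist.

```r
# Toy reference list (illustrative strings only, NOT real LCVP entries).
reference <- c(
  "Hibiscus vitifolius",
  "Hibiscus vitifolium",
  "Hibiscus vitifoliatus",
  "Artemisia vulgaris"
)
submitted <- "Hibiscus vitifolia"
max_edits <- 6

# Edit distance from the submitted name to every reference name.
distances <- as.vector(utils::adist(submitted, reference))
names(distances) <- reference

# Analogue of keep_closest = FALSE: every name within the allowed distance.
within_distance <- distances[distances <= max_edits]

# Analogue of keep_closest = TRUE: only the names tied for the smallest distance.
closest <- within_distance[within_distance == min(within_distance)]

names(within_distance)
names(closest)
```

Note that the closest set can still contain more than one name when several candidates are tied at the minimum distance.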
Note that only binomial names with valid characters are allowed in this function. Searches based on genus, family, order or author names should use the function lcvp_group_search.
Freiberg, M., Winter, M., Gentile, A. et al. LCVP, The Leipzig catalogue of vascular plants, a new taxonomic reference list for all known vascular plants. Sci Data 7, 416 (2020). https://doi.org/10.1038/s41597-020-00702-z
# Ensure that LCVP package is available before running the example.
# If it is not, see the `lcvplants` package vignette for details
# on installing the required data package.
if (requireNamespace("LCVP", quietly = TRUE)) { # Do not run this
  # Returns a data.frame
  lcvp_fuzzy_search(c("Hibiscus vitifolia", "Artemisia vulgaris"))

  # Returns a list of data.frames
  lcvp_fuzzy_search(c("Hibiscus vitifolia", "Artemisia vulgaris"),
                    bind_result = FALSE)

  # Returns all accepted names within a max_distance of 6.
  lcvp_fuzzy_search("Hibiscus vitifolia", status = "accepted",
                    keep_closest = FALSE, max_distance = 6)
}