vignettes/taxonomic_resolution_using_lcplants.Rmd
taxonomic_resolution_using_lcplants.Rmd
When comparing or merging any list of plant species, for instance for compiling regional species lists, merging occurrence lists with trait data or phylogenies the taxonomic names must be matched to avoid artificial inflation due to synonyms, data loss or erroneous matches due to homonyms.
The Leipzig Catalogue of Vascular Plants (LCVP) is a novel global taxonomic backbone, updating The Plant List, comprising more than 1,300,000 names and 350,000 accepted taxa names. We described the LCVP in detail in the related scientific publication (NOT YET PUBLISHED).
A package for large-scale taxonomic harmonization of plant names by fuzzy matching and synonymy resolution against the Leipzig Catalogue of Vascular Plants as taxonomic backbone. Submission of single names or list of species names is possible (for lists with more than 5,000 species computation may take some time).
You can install lcvplants
from github using the devtools package (you may need to install devtools first). To use lcvplants
you also need the data of the LCVP
package, which you can install in the same way.
library(devtools) devtools::install_github("idiv-biodiversity/lcvplants") devtools::install_github("idiv-biodiversity/LCVP") library(lcvplants)
All you need to run lcvplants is a species name or a list of species names for which you need the status or synonymy. The genus name and epitheton should not contain special characters, the name of the authorities might contain special characters.
The main function of lcvplants is LCVP
. The function taxes your names(s) as input argument and will match it against the Leipzig plant Catalogue. LCVP uses fuzzy matching, so that it can identify the accepted name of misspelled species as well. The LCVP
function has some options to customize the fuzzy matching and the output, which are described below. You can get help on all arguments using ?LCVP
. To run a basic analyses, all you need to do is to call LCVP
on your species name. By default, LCVP
uses direct matches only; see below if you want to use fuzzy matching.
LCVP("Hibiscus vitifolius")
## serial path, for searching a single taxon
## Search based on genus and epithet name
## Time difference of 5.072484 secs
## ---------------------------------------------------------
## - End of Leipzig Catalogue of Plants (LCVP) search -
## ---------------------------------------------------------
## ID Submitted_Name Order Family Genus Species Infrasp
## 1 1 Hibiscus vitifolius Malvales Malvaceae Hibiscus vitifolius species
## 2 1 Hibiscus vitifolius Malvales Malvaceae Hibiscus vitifolius species
## Infraspecies Authors Status LCVP_Accepted_Taxon PL_Comparison
## 1 L. accepted Hibiscus vitifolius L. identical
## 2 Mill. synonym Hibiscus cannabinus L. identical
## PL_Alternative Score Insertion Deletion Substitution
## 1 matched 0 0 0
## 2 matched 0 0 0
The name may (and ideally should) also include the authorities.
LCVP("Hibiscus abelmoschus var. betulifolius Mast.")
LCVP
also works on vectors of names. With more than 5,000 names, computation may take some time.
LCVP(c("Hibiscus abelmoschus var. betulifolius Mast.", "Hibiscus abutiloides Willd.", "Hibiscus aculeatus", "Hibiscus acuminatus"))
In case you have a data.frame
, for instance including a cloumn with the species names (“species”), you can use LCVP
as follows:
dat <- data.frame(species = c("Hibiscus abelmoschus var. betulifolius Mast.", "Hibiscus abutiloides Willd.", "Hibiscus aculeatus", "Hibiscus acuminatus"), additional.colum1 = runif(4, min = -180, max = 180), additional.colum2 = runif(4, min = -90, max = 90)) LCVP(dat$species)
You can also pick a .csv file from your hard disk.
LCVP("list")
The output value of LCVP
is a data.frame
with the matching information for your species names. This includes the status of the submitted name (“valid”, “synonym”, “invalid”), in case of synonyms the matched accepted name in LCVP, a comparison to the status of the matched name in The Plant List and information on the fuzzy matching (e.g. in cases your name was misspelled, information by how match so).
Output column name | Explanation |
---|---|
ID | A unique identifier of the matched name |
Submitted_Name | The submitted name |
Order | The matched taxonomic order for the submitted name |
Family | The matched taxonomic family |
Genus | The matched genus |
Species | The matched epitheton |
Infrasp | The infra-specific rank, in case the matched name contains a variety or subspecies name |
Infraspecies | The infra-specific name if applicable |
Authors | The authorities of the matched name |
LCVP_Accepted_Taxon | The accepted name of the matched name in LCVP |
PL_Comparison | The status of a comparison to The Plant List, i.e. is the accepted name the same in TPL? |
PL_Alternative | In case the accepted name is different in TPL, the accepted name in TPL |
Score | Summary of the fuzzy matching |
Insertion | In case the matching was not perfect, the number of inserted characters in the matched name |
Deletion | In case the fuzzy match was not perfect, the number of deleted characters |
Substitution | In case the fuzzy match was not perfect, the number of substituted characters |
There are several arguments to customize the matching and output of LCVP
. See ?LCVP
for a description.
In case you expect misspellings in your names list (very likely, at least for longer lists), you might want to use fuzzy matching for the names resolution. If fuzzy matching is activated, LCVP
will also match names that differ in spelling from the submitted name. You can use the max_distance
argument to define in how many positions the matched names are allowed to differ from the submitted name. This is especially relevant if your names include authorities, since they often differ in spelling (e.g., if spaces are used or not). One or two might be reasonable values here. Increases in max_distance
will increase the computation time for the name resolution. In any case, it is advisable to check the results of fuzzy matching since there might be accepted names that only differ by few characters.
# no fuzzy matching does not find misspelled names fuzz <- LCVP("Hibiscus vitifolios")
## Time difference of 26.65401 secs
fuzz$Score
## [1] "Epithet name not found"
# fuzzy matching does find it fuzz <- LCVP("Hibiscus vitifolios", max_distance = 1)
## Time difference of 34.06915 secs
fuzz$Score
## [1] "misspelling: Epithet" "misspelling: Epithet"
#Also works for larger distances fuzz <- LCVP("Hibiscus vitifulios", max_distance = 2)
## Time difference of 39.58281 secs
fuzz$Score
## [1] "misspelling: Epithet" "misspelling: Epithet"
# But results become less reliable with larger distances fuzz <- LCVP("Hibiscus acetosulla", max_distance = 5)
## Time difference of 41.49436 secs
fuzz
## ID Submitted_Name Order Family Genus Species Infrasp
## 1 1 Hibiscus acetosulla Malvales Malvaceae Hibiscus acetosella species
## 2 1 Hibiscus acetosulla Malvales Malvaceae Hibiscus acicularis species
## Infraspecies Authors Status LCVP_Accepted_Taxon
## 1 Welw. ex Hiern accepted Hibiscus acetosella Welw. ex Hiern
## 2 Standl. accepted Hibiscus acicularis Standl.
## PL_Comparison PL_Alternative Score Insertion Deletion
## 1 identical misspelling: Epithet 0 0
## 2 identical misspelling: Epithet 3 3
## Substitution
## 1 1
## 2 2
By default, LCVP
will only use fuzzy matching for species epitheta, infra-specific names and the authorities. If you want to include the genus name, use the genus_search
argument.
# no fuzzy matching does not find misspelled names fuzz <- LCVP("Hubiscus vitifolius")
## Time difference of 0.07080913 secs
fuzz$Score
## [1] "Genus name not found"
# fuzzy matching does find it fuzz <- LCVP("Hubiscus vitifolius", max_distance = 1, genus_search = TRUE)
## Time difference of 33.82916 secs
fuzz$Score
## [1] "misspelling: Genus" "misspelling: Genus"
The name resolution can take a while to compute, especially for long lists and high matching distances (can be up to 30 minutes). To speed up the process, you can use multiple cores if available using the max.cores
argument. By default, LCVP
will use all available cores but one.
By default, LCVP
only returns accepted names, but you can use the LCVP
function to obtain all synonyms for your names of interest using the status
argument.
LCVP("Hibiscus vitifolius", status = FALSE)
## serial path, for searching a single taxon
## Search based on genus and epithet name
## Time difference of 0.3191488 secs
## ---------------------------------------------------------
## - End of Leipzig Catalogue of Plants (LCVP) search -
## ---------------------------------------------------------
## ID Submitted_Name Order Family Genus Species Infrasp
## 1 1 Hibiscus vitifolius Malvales Malvaceae Hibiscus vitifolius species
## 2 1 Hibiscus vitifolius Malvales Malvaceae Hibiscus vitifolius species
## Infraspecies Authors Status LCVP_Accepted_Taxon PL_Comparison
## 1 L. accepted Hibiscus vitifolius L. identical
## 2 Mill. synonym Hibiscus cannabinus L. identical
## PL_Alternative Score Insertion Deletion Substitution
## 1 matched 0 0 0
## 2 matched 0 0 0
You can also use the LCVP
function to get a list of all species names within a genus in the Leipzig Catalogue of Vascular Plants.
LCVP("Hibiscus", genus_tab = TRUE)
Similarly, you can use the LCVP
function to obtain all infra-specific namers of a species
LCVP("Hibiscus vitifolius", infraspecies_tab = TRUE)