Background

Sampling effort is a critical measure of the reliability of presence/absence projections from species occurrence data. However, datasets from publicly available databases based on biological collections are often compiled from various sources over long time periods, so information on sampling effort is usually unavailable. Physical accessibility of a region has been identified as a major predictor of sampling effort, but this effect may vary among datasets. The SampBias package quantifies the effect of different anthropogenic structures (roads, airports, cities) on sampling in any given dataset, based on geographic gazetteers. You can find a description of the methods here and tutorials on the use of SampBias here. NOTE: SampBias is only available as a beta version.

Objectives

After this exercise you will be able to
* quantify the effect of accessibility on the sampling pattern in a species occurrence dataset
* have an idea of the political factors that bias data collection

Exercise

Helpful functions for answering each question are given in brackets. If you want to get a feeling for the functionality of SampBias without using R, you can find a GUI app here.

Load the distribution data. (read.csv)
Run sampbias with the default settings. (SamplingBias)
Look at the run summary and visualize the results. How informative are the results? (par, plot)
Explore ?SamplingBias and try to change the relevant arguments to improve the results. Summarize and visualize again.
Explore the relation of socio-economic factors in your group of interest. (https://bio-dem.surge.sh/)

Possible questions for your research project

How biased is your collection dataset by accessibility?
Which infrastructure is most biasing?

Library setup

You will need the following R libraries for this exercise; just copy the code chunk into your R console to load them. You might need to install some of them separately.

library(tidyverse)
library(sampbias)
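If a `library()` call fails, it can help to check first which packages are missing. The following is a minimal sketch using base R only; `missing_packages` is a hypothetical helper written for this exercise, not part of any package.

```r
# Return the names of packages that are not installed.
# `missing_packages` is a hypothetical helper, not part of any package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "sampbias")
to_install <- missing_packages(needed)
print(to_install)

# tidyverse is on CRAN and can be installed with install.packages("tidyverse").
# The beta version of sampbias may have to be installed from its
# development repository instead (assumption; check the package page).
```

An empty result means both libraries are available and the chunk above should load without errors.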

Tutorial

1. Load the example distribution data. (read_csv)

# Read the records and add lower-case copies of the coordinate columns
occ <- read_csv("inst/occurrence_records_clean.csv") %>%
    mutate(decimallongitude = decimalLongitude,
           decimallatitude = decimalLatitude)
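Even if the file name suggests cleaned records, a quick sanity check of the coordinates before the analysis can save time. The sketch below uses a made-up toy data frame with the same column names as `occ`; `valid_coords` is a hypothetical helper, and the values are for illustration only.

```r
# Toy data frame standing in for the occurrence table (made-up values).
occ_toy <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp2"),
  decimallongitude = c(12.5, 200, 13.1, NA),
  decimallatitude = c(-4.2, 10, 95, 3.3)
)

# TRUE for rows whose coordinates are present and within valid ranges.
valid_coords <- function(df) {
  !is.na(df$decimallongitude) & !is.na(df$decimallatitude) &
    abs(df$decimallongitude) <= 180 & abs(df$decimallatitude) <= 90
}

ok <- valid_coords(occ_toy)
occ_checked <- occ_toy[ok, ]
nrow(occ_checked)
```

The same function could be applied to `occ` to see how many records, if any, would be dropped.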

2. Run sampbias with the default settings. (SamplingBias)

bias.out <- SamplingBias(x = occ, res = 1)
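To get an intuition for what "accessibility" means here, the distance from each record to the nearest feature of a gazetteer can be computed by hand. This is only an illustrative sketch with made-up records and "cities"; it is not how the package computes distances internally, and `haversine_km` is a hypothetical helper.

```r
# Great-circle (haversine) distance in km between points given in degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Made-up records and "cities" for illustration only.
rec <- data.frame(lon = c(0, 2, 5), lat = c(0, 1, 5))
cities <- data.frame(lon = c(0, 5), lat = c(1, 5))

# Distance from every record to its nearest city.
nearest_km <- vapply(seq_len(nrow(rec)), function(i) {
  min(haversine_km(rec$lon[i], rec$lat[i], cities$lon, cities$lat))
}, numeric(1))

round(nearest_km)
```

If most records sit within a short distance of a feature while the study area extends much further, accessibility is a plausible driver of the sampling pattern.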

3. Look at the run summary and visualize the results. How informative are the results?

# summarize results
summary(bias.out)

# Visualize
plot(bias.out)

4. Explore `?SamplingBias` and try to change the relevant arguments to improve the results. Summarize and visualize again.

bias.det <- SamplingBias(x = occ, res = 0.1)

# summarize results
summary(bias.det)

# Visualize, arranging the plots in a 3 x 2 grid
par(mfrow = c(3, 2))
plot(bias.det)
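Why a finer resolution can change the picture is easy to see on a toy example. The sketch below assumes that `res` sets the grid cell size in decimal degrees. It counts how many cells contain at least one record at two resolutions. The coordinates are made up, and `occupied_cells` is a hypothetical helper, not a sampbias function.

```r
# Count the grid cells that contain at least one record at resolution `res`
# (in decimal degrees). `occupied_cells` is a hypothetical helper.
occupied_cells <- function(lon, lat, res) {
  cell_id <- paste(floor(lon / res), floor(lat / res))
  length(unique(cell_id))
}

# Made-up coordinates, all within one 1-degree cell.
lon <- c(10.12, 10.17, 10.55, 10.58, 10.93)
lat <- c(45.21, 45.24, 45.62, 45.67, 45.88)

occupied_cells(lon, lat, res = 1)
occupied_cells(lon, lat, res = 0.1)
```

At one degree all five records fall into a single cell, whereas at 0.1 degrees they separate into several cells, so fine-scale clustering only becomes visible at the finer resolution. The trade-off is a longer run time on real data.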

Quantifying sampling bias