The GeoSSE model jointly infers area-specific speciation rates, dispersal rates and local extinction rates (Goldberg et al. 2011). The mcmc-SSE
program implements a Bayesian algorithm to infer these rates.
The analysis requires a tree file, a table with the geographic ranges (species with no geographic data or not present in the tree will be automatically removed).
library("diversitree")
library("picante")
You will find the mcmc-SSE-lib file in the “example_file” dropbox folder.
source("my_path/mcmc-SSE-lib.R")
Since the GoeSSE model only allows for 2 areas the input data might need to be modified by merging some of the areas.
Define which areas should be merged
data_SPGC <- read.csv("example_data/bombacoideae_biome_classification_details.csv",
row.names = 1)
columns_areaA = c(1, 3)
columns_areaB = c(2, 4, 5)
geosse_areas <- comb_areas(data_SPGC, columns_areaA, columns_areaB)
write.table(geosse_areas, file = "example_data/traitGeoSSE_owndataconverted.txt",
sep = "\t", row.names = F, quote = F)
Set working directory:
setwd("my_path/example_files/")
Define tree file (NEXUS format, can contain multiple trees)
tree_file = "example_data/bromelioieae_consensus.tre"
Define geographic range file (tab-separated text file). Geographic ranges are defined as 0 (present in area A), 1 (present in area B), 2 (present in area AB)
range_file = "example_data/traitGeoSSE.txt"
State-specific taxon sampling (A, B, AB)
sampling_fractions = c(0.7, 0.8, 1)
Define output file name
mcmc.log = "bromeliad_geosse.log"
Run the analysis
run_mcmc_SSE(tree_file, range_file, model = "geosse", outfile = mcmc.log, iterations = 10000,
rho = sampling_fractions)
Read output file
post = read.table(mcmc.log, header = T)
Plot trace of sampled likelihoods
plot(post$likelihood, type = "l")
Plot speciation, dispersal, and extinction rates per area:
par(mfrow = c(1, 3))
boxplot(post[, c("sA", "sB", "sAB")], col = "blue", main = "speciation rates")
boxplot(post[, c("dA", "dB")], col = "gray", main = "dispersal rates")
boxplot(post[, c("xA", "xB")], col = "red", main = "extinction rates")
Are dispersals rates significantly different?
delta_dispersal = post$dA - post$dB
hist(delta_dispersal)
Calculate posterior probability of dAB > dBA:
length(delta_dispersal[delta_dispersal > 0])/length(delta_dispersal)
Are extinction rates significantly different?
delta_extinction = post$xA - post$xB
hist(delta_extinction)
Calculate the posterior probability of xA > xB:
length(delta_extinction[delta_extinction > 0])/length(delta_extinction)
If provided with a tree file that includes multiple trees the MCMC can run sequentially on different trees to account for phylogenetic uncertainty. The burnin
defines how many initial MCMC iterations should be discarded (i.e. not saved in the log file) for each tree.
tree_file = "bromelioieae_100.trees"
run_mcmc_SSE(tree_file, trait_file, model = "geosse", outfile = "bromeliad_geosse100.log",
iterations = 10000, nTREES = 5, burnin = 100)