Joint Bayesian estimation of speciation, dispersal and extinction rates: the GeoSSE model

The GeoSSE model jointly infers area-specific speciation rates, dispersal rates and local extinction rates (Goldberg et al. 2011). The mcmc-SSE program implements a Bayesian algorithm to infer these rates.

The analysis requires a tree file, a table with the geographic ranges (species with no geographic data or not present in the tree will be automatically removed).

Load dependent libraries:

library("diversitree")
library("picante")

Load mcmc-SSE library

You will find the mcmc-SSE-lib file in the “example_file” dropbox folder.

source("my_path/mcmc-SSE-lib.R")

Prep data

Since the GoeSSE model only allows for 2 areas the input data might need to be modified by merging some of the areas.

Define which areas should be merged

data_SPGC <- read.csv("example_data/bombacoideae_biome_classification_details.csv", 
    row.names = 1)
columns_areaA = c(1, 3)
columns_areaB = c(2, 4, 5)

geosse_areas <- comb_areas(data_SPGC, columns_areaA, columns_areaB)
write.table(geosse_areas, file = "example_data/traitGeoSSE_owndataconverted.txt", 
    sep = "\t", row.names = F, quote = F)

Setup an analysis

Set working directory:

setwd("my_path/example_files/")

Define tree file (NEXUS format, can contain multiple trees)

tree_file = "example_data/bromelioieae_consensus.tre"

Define geographic range file (tab-separated text file). Geographic ranges are defined as 0 (present in area A), 1 (present in area B), 2 (present in area AB)

range_file = "example_data/traitGeoSSE.txt"

State-specific taxon sampling (A, B, AB)

sampling_fractions = c(0.7, 0.8, 1)

Define output file name

mcmc.log = "bromeliad_geosse.log"

Run the analysis

run_mcmc_SSE(tree_file, range_file, model = "geosse", outfile = mcmc.log, iterations = 10000, 
    rho = sampling_fractions)

Process and plot output

Read output file

post = read.table(mcmc.log, header = T)

Plot trace of sampled likelihoods

plot(post$likelihood, type = "l")

Plot speciation, dispersal, and extinction rates per area:

par(mfrow = c(1, 3))
boxplot(post[, c("sA", "sB", "sAB")], col = "blue", main = "speciation rates")
boxplot(post[, c("dA", "dB")], col = "gray", main = "dispersal rates")
boxplot(post[, c("xA", "xB")], col = "red", main = "extinction rates")

Are dispersals rates significantly different?

delta_dispersal = post$dA - post$dB
hist(delta_dispersal)

Calculate posterior probability of dAB > dBA:

length(delta_dispersal[delta_dispersal > 0])/length(delta_dispersal)

Are extinction rates significantly different?

delta_extinction = post$xA - post$xB
hist(delta_extinction)

Calculate the posterior probability of xA > xB:

length(delta_extinction[delta_extinction > 0])/length(delta_extinction)

Run on multiple trees

If provided with a tree file that includes multiple trees the MCMC can run sequentially on different trees to account for phylogenetic uncertainty. The burnin defines how many initial MCMC iterations should be discarded (i.e. not saved in the log file) for each tree.

tree_file = "bromelioieae_100.trees"
run_mcmc_SSE(tree_file, trait_file, model = "geosse", outfile = "bromeliad_geosse100.log", 
    iterations = 10000, nTREES = 5, burnin = 100)