Quantcast
Channel: Phylogenetic Tools for Comparative Biology
Viewing all articles
Browse latest Browse all 802

A bounded continuous character evolution model with absorbing boundaries for phytools

$
0
0

Followers of this blog should know by now that I’m fascinated by applications of the discretized diffusion approximation described in Boucher & Démery (2016), for instance to fit a multi-state threshold model, a semi-threshold model, or a state-dependent or hidden-state model of multi-rate continuous character evolution.

Really one of the simplest extensions of Boucher & Démery’s bounded model, however, might be a bounded model with absorbing rather than reflective bounds.

This is just an evolutionary process in which whenever stochastic diffusion reaches one of the boundary values of the trait, it just sticks there.

As far as I can tell, the only way to simulate data with this property is forward-in-time, but this is the best way to get a good sense of what I’m talking about, so here goes.

First, here’s my simulating function:

sim_absorbing<-function(tree,a=0,sig2=1,
  bounds=c(-Inf,Inf),nsteps=1000,
  ...){
  if(hasArg(plot)) plot<-list(...)$plot
  else plot=FALSE
  if(hasArg(reflective)) reflective<-list(...)$reflective
  else reflective<-FALSE
  tree<-reorder(tree,"cladewise")
  ll<-max(nodeHeights(tree))
  tt<-map.to.singleton(make.era.map(tree,
    limits=seq(0,ll,length.out=nsteps+1)))
  delta_x<-rnorm(n=nrow(tt$edge),sd=sqrt(sig2*tt$edge.length))
  X<-matrix(NA,nrow(tt$edge),2)
  ROOT<-Ntip(tt)+1
  X[which(tt$edge[,1]==ROOT),1]<-a
  for(i in 1:nrow(tt$edge)){
    X[i,2]<-X[i,1]+delta_x[i]
    if(!reflective){
      if(X[i,2]<=bounds[1]) X[i,2]<--Inf
      else if(X[i,2]>=bounds[2]) X[i,2]<-Inf
    } else {
      while(X[i,2]<=bounds[1]||X[i,2]>=bounds[2]){
        if(X[i,2]<=bounds[1]) X[i,2]<-2*bounds[1]-X[i,2]
        if(X[i,2]>=bounds[2]) X[i,2]<-2*bounds[2]-X[i,2]
      }
    }
    jj<-which(tt$edge[,1]==tt$edge[i,2])
    if(length(jj)>0) X[jj,1]<-X[i,2]
  }
  ii<-which(X==-Inf)
  if(length(ii)>0)  X[ii]<-bounds[1]
  ii<-which(X==Inf)
  if(length(ii)>0) X[ii]<-bounds[2]
  if(plot){
    LIMS<-vector()
    LIMS[1]<-if(bounds[1]==-Inf) min(X) else bounds[1]
    LIMS[2]<-if(bounds[2]==Inf) max(X) else bounds[2]
    hh<-hcl.colors(n=201)
    edge_col<-hh[floor(200*((X[,1]-LIMS[1])/diff(LIMS)))+1]
    par(mfrow=c(2,1))
    par(mar=c(0.1,4.1,2.1,1.1))
    plot(tt,edge.color=edge_col,show.tip.label=FALSE,
      edge.width=3)
    par(mar=c(5.1,4.1,0.1,1.1))
    plot(NA,xlim=c(0,ll),ylim=range(X),
      xlab="time",ylab="phenotype",bty="n",
      las=1)
    H<-nodeHeights(tt)
    for(i in 1:nrow(tt$edge))
      lines(H[i,],X[i,],col=edge_col[i],lwd=3)
  }
  setNames(sapply(X=1:Ntip(tt),
    FUN=function(i,x,e) X[which(e[,2]==i),2],
    x=X,e=tt$edge),tt$tip.label)
}

Now let’s load phytools and run it:

library(phytools)
packageVersion("phytools")
## [1] '2.4.13'
tree<-pbtree(n=40,scale=1)
x<-sim_absorbing(tree,plot=TRUE,bounds=c(-1,1),
  nsteps=400)
bcols<-hcl.colors(2)
lines(c(0,max(nodeHeights(tree))),rep(-1,2),lwd=8,
  col=make.transparent(bcols[1],0.2))
text(0.1,-1,"lower bound",pos=3)
lines(c(0,max(nodeHeights(tree))),rep(1,2),lwd=8,
  col=make.transparent(bcols[2],0.2))
text(0.1,1,"upper bound",pos=1)

plot of chunk unnamed-chunk-17

The update to phytools::bounded_bm that allows us to fit this model is already in phytools on GitHub. I won’t explain all the details, because if you understand the discrete diffusion approximation – you’ll understand how this works. (If you don’t understand it, I recommend checking out my blog over the past 16 months, starting with this post.)

fit_absorbing<-bounded_bm(tree,x,absorbing=TRUE,
  lims=c(-1,1))
fit_absorbing
## Object of class "bounded_bm" based on
##  	a discretization with k = 200 levels.
## 
## Absorbing bounded model
## 
## Set or estimated bounds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 1.087513 
##      x0: 0.195 
## 
## Log-likelihood: 61.975761 
## 
## R thinks it has found the ML solution.

(This takes longer than the bounded model due to some technical reasons regarding matrix exponentiation to compute the likelihood which I won’t get into.)

We can compare this to the bounded model and to a standard unbounded Brownian model as follows:

fit_bounded<-bounded_bm(tree,x,lims=c(-1,1))
fit_bounded
## Object of class "bounded_bm" based on
##  	a discretization with k = 200 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 0.889727 
##      x0: 0.675 
## 
## Log-likelihood: -10.098431 
## 
## R thinks it has found the ML solution.
fit_bm<-bounded_bm(tree,x,levs=400)
fit_bm
## Object of class "bounded_bm" based on
##  	a discretization with k = 400 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -2 , 2 ]
## 
## Fitted model parameters:
##   sigsq: 0.68214 
##      x0: 0.335 
## 
## Log-likelihood: -26.840089 
## 
## R thinks it has found the ML solution.

(I increased levs for the standard BM model so that I would have the same bin density in both discretizations. This isn’t technically necessary, but might make model comparison more precise.)

AIC(fit_bm,fit_bounded,fit_absorbing)
##               df        AIC
## fit_bm         2   57.68018
## fit_bounded    4   28.19686
## fit_absorbing  4 -115.95152

This shows that we have way more support for the generating model than for the other two. If we take a look at our fitted models, we’ll also learn that our parameter estimates are much closer to the generating conditions when the absorbing model is fit.

Our standard BM model should be equivalent (in the limit as levs is increased) in both parameter estimates and log-likelihood to what you would obtain using geiger::fitContinuous with the default model. Let’s see.

geiger::fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 0.683954
## 	z0 = 0.339721
## 
##  model summary:
## 	log-likelihood = -26.937688
## 	AIC = 57.875377
## 	AICc = 58.199701
## 	free parameters = 2
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

Just as a gut-check, let’s repeat the same exercise, even using our forward-in-time simulator, but where our data were simulated under bounded Brownian motion with reflective boundary conditions – rather than with absorbing bounds. (I’m going to use a the same random number seed as I did before so that the first part of my simulation, until the boundary is reached, will be the same.)

tree<-pbtree(n=40,scale=1)
x<-sim_absorbing(tree,plot=TRUE,bounds=c(-1,1),
  nsteps=400,reflective=TRUE)
bcols<-hcl.colors(2)
lines(c(0,max(nodeHeights(tree))),rep(-1,2),lwd=8,
  col=make.transparent(bcols[1],0.2))
text(0.1,-1,"lower bound",pos=3)
lines(c(0,max(nodeHeights(tree))),rep(1,2),lwd=8,
  col=make.transparent(bcols[2],0.2))
text(0.1,1,"upper bound",pos=1)

plot of chunk unnamed-chunk-24

fit_bm.2<-bounded_bm(tree,x,levs=400)
fit_bm.2
## Object of class "bounded_bm" based on
##  	a discretization with k = 400 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -1.854628 , 1.942412 ]
## 
## Fitted model parameters:
##   sigsq: 0.59925 
##      x0: 0.266968 
## 
## Log-likelihood: -24.279488 
## 
## R thinks it has found the ML solution.
fit_bounded.2<-bounded_bm(tree,x,lims=c(-1,1))
fit_bounded.2
## Object of class "bounded_bm" based on
##  	a discretization with k = 200 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 1.131566 
##      x0: 0.995 
## 
## Log-likelihood: -16.018271 
## 
## R thinks it has found the ML solution.
fit_absorbing.2<-bounded_bm(tree,x,lims=c(-1,1),
  absorbing=TRUE)
fit_absorbing.2
## Object of class "bounded_bm" based on
##  	a discretization with k = 200 levels.
## 
## Absorbing bounded model
## 
## Set or estimated bounds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 0.440578 
##      x0: 0.165 
## 
## Log-likelihood: -30.985182 
## 
## R thinks it has found the ML solution.
AIC(fit_bm.2,fit_bounded.2,fit_absorbing.2)
##                 df      AIC
## fit_bm.2         2 52.55898
## fit_bounded.2    4 40.03654
## fit_absorbing.2  4 69.97036

Makes sense.


Viewing all articles
Browse latest Browse all 802

Trending Articles