Reversing the time axis in an ltt95 plot

October 25, 2014, 8:02 pm

≫ Next: Optimizing tree transformations for multiple discrete characters

≪ Previous: Colors terminal edges in plotTree.wBars by a discrete trait

A phytools user recently asked the following:

“I am currently using the ltt95 function on a "multiPhylo" object. I would like to known if there is a solution to reverse the time axis from the root of the tree to present?”

Indeed this is not too difficult. ltt95already returns an object of class "ltt95" invisibly. This object is basically a matrix containing the plotted data for x and y in columns, with a few different attributes to tell the S3 plotting method for objects of its class how to handle the matrix. So if we were naive as to the functionality of our plotting method, we could just manipulate the contents of that matrix to get the plot that we wanted. phytools makes things even easier, however, as there is an argument xaxis in the plotting method for objects of class "ltt95" that automates this all for us.

Here's how we can use it to control the direction of x in various ways:

## load phytools
library(phytools)

## Loading required package: ape
## Loading required package: maps

## simulate some trees
trees<-pbtree(n=100,scale=50,nsim=100)
## now create a standard ltt95 plot
obj<-ltt95(trees,log=TRUE)

plot of chunk unnamed-chunk-1

## OK - now if we want to simply flip the values on x
## so that they should "time before present"
plot(obj,xaxis="flipped")

plot of chunk unnamed-chunk-1

## we can also plot time before present as "negative time"
plot(obj,xaxis="negative")

plot of chunk unnamed-chunk-1

Well, that's really it.

↧

Optimizing tree transformations for multiple discrete characters

October 27, 2014, 10:20 am

≫ Next: Alternative implementations of fitContinuous

≪ Previous: Reversing the time axis in an ltt95 plot

A recent R-sig-phylo subscriber asked:

“I have a tree and discrete data (number of gene copies, for many genes) and would like to use the fitDiscrete function in geiger, or something similar. However, I would like to estimate the parameters given all of the datasets, not just with the data for each gene. For instance, if I was using the "delta” model to vary rates across the tree, I would like this delta value to reflect some sort of summary value across all datasets. Does anyone have an idea as to how this could be accomplished or perhaps point me in the right direction?“

Brian O'Meara pointed out that it would be straighforward to write a likelihood function that just summed the log-likelihoods across characters for different tree transformations and then wrapped the function in an optimizer. The following illustrates this explicitly.

First load packages - for this we need ape, phytools (just for simulation here), and geiger (for our tree rescaling):

## load packages
library(ape)
library(phytools)

## Loading required package: maps

library(geiger)

Now let's simulate some data that would be analogous to what we'd load from file in the empirical case:

## simulate tree & data
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-pbtree(n=100,scale=1)
X<-replicate(10,sim.history(rescale(tree,model="lambda",lambda=0.5),Q)$states)

## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).

## this is what our data look like:
X

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## t64  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t97  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  
## t98  "b"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  
## t43  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t44  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t23  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  
## t85  "b"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "b"  
## t86  "b"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  
## t68  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t83  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  
## t84  "a"  "a"  "b"  "a"  "b"  "a"  "b"  "a"  "b"  "b"  
## t2   "a"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  
## t3   "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "a"  "a"  
## t91  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  
## t92  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  
## t45  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  
## t46  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  
## t58  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  
## t59  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  
## t21  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "b"  
## t60  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  "b"  
## t89  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  
## t90  "b"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "b"  "b"  
## t36  "b"  "a"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "a"  
## t27  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  
## t28  "a"  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "b"  "b"  
## t29  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  "b"  "b"  
## t37  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "b"  
## t38  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  
## t16  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  
## t17  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "b"  
## t15  "b"  "a"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  
## t47  "b"  "a"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  
## t61  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t62  "b"  "a"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "b"  
## t10  "b"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "b"  
## t11  "b"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  "b"  
## t4   "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t13  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  
## t31  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "a"  
## t93  "b"  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  
## t94  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  
## t8   "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t5   "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  "b"  
## t42  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  
## t76  "a"  "b"  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t77  "a"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "b"  
## t63  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  
## t95  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  
## t96  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  
## t32  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  
## t33  "b"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  
## t81  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t82  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t24  "b"  "b"  "a"  "a"  "b"  "b"  "a"  "b"  "a"  "b"  
## t14  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  
## t25  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  
## t78  "a"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "b"  
## t79  "a"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  
## t19  "b"  "b"  "a"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  
## t50  "a"  "b"  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  
## t51  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  
## t26  "a"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  
## t66  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "a"  
## t67  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  
## t54  "a"  "b"  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t55  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  
## t52  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  
## t53  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t7   "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  
## t18  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  
## t20  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  
## t69  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  
## t87  "b"  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  
## t88  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "b"  
## t65  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  
## t74  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "a"  
## t75  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "b"  
## t39  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  
## t22  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "a"  "a"  "b"  
## t34  "b"  "b"  "b"  "a"  "b"  "b"  "a"  "b"  "a"  "a"  
## t35  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  
## t70  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "a"  
## t71  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "b"  
## t99  "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "b"  
## t100 "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t80  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t6   "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t12  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "b"  
## t30  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  "b"  
## t56  "b"  "b"  "a"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  
## t57  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "b"  "b"  "a"  
## t40  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  
## t41  "b"  "a"  "b"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  
## t1   "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "b"  
## t9   "a"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t48  "b"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "b"  
## t49  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  
## t72  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t73  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  "a"  "b"

Note that it is arbitrary, and unnecessary, that I have simulated using the same transition matrix Q for each data column here.

Finally, our tree-transformation fitting function. Here, I've used acefrom ape internally to compute the likelihood of each character for each tree transformation. One could instead use fitDiscrete (which is slower, but perhaps more robust) or function in the phangorn package.

## here is our model fitting function
fitTransform<-function(tree,X,model,interval){
    lk<-function(par,tree,X,model){
        sum(apply(X,2,function(x,tree,model,par) 
            logLik(ace(x,rescale(tree,model,par),type="discrete")),
            tree=tree,model=model,par=par))
    }
    fit<-optimize(lk,interval,tree=tree,X=X,model=model,maximum=TRUE)
    return(list(par=fit$maximum,model=model,logLik=fit$objective))
}
obj<-fitTransform(tree,X,model="lambda",interval=c(0,1))
## here is our fitted model
obj

## $par
## [1] 0.472
## 
## $model
## [1] "lambda"
## 
## $logLik
## [1] -666.7

We could use the same function with different values of model and (necessarily, if so) interval. I use model="lambda" merely to illustrate the case.

That's it.

↧

Alternative implementations of fitContinuous

October 28, 2014, 2:20 pm

≫ Next: Pruning trees to one member per genus, and to one descendant for each clade younger than a particular age

≪ Previous: Optimizing tree transformations for multiple discrete characters

A colleague contacted me today about “alternative implementations” of geiger's fitContinuous function that he could use to check parameter estimation, and maybe to speed calculations as fitContinuouscan be somewhat slow (although is evidently quite robust now) for some models.

Well, I don't know about speeding things up - but it is pretty straightforward to write a wrapper function that combines phytools brownie.lite single-rate model with geiger rescale.phylo to fit many of the models of fitContinuous. The following is a simple demo:

## load packages
library(phytools)
library(geiger)

## helper function to get the root node number
getRoot<-function(tree) Ntip(tree)+1

## here is our fitContinuous lite function
fitCont<-function(tree,x,model="BM",interval=NULL){
    if(model!="BM"){
        lk<-function(par,tree,x,model) brownie.lite(paintSubTree(rescale(tree,model,par),getRoot(tree),"1"),x)$logL1
        oFit<-optimize(lk,interval,tree=tree,x=x,model=model,maximum=TRUE)
        cFit<-brownie.lite(paintSubTree(rescale(tree,model,oFit$maximum),getRoot(tree),"1"),x)
        obj<-list(par=oFit$maximum,model=model,sig2=cFit$sig2.single,a=cFit$a.single,logLik=oFit$objective)
    } else {
        fit<-brownie.lite(paintSubTree(tree,getRoot(tree),"1"),x)
        obj<-list(model=model,sig2=fit$sig2.single,a=fit$a.single,logLik=fit$logL1)
    }
    obj
}

OK, now let's try it for simulated data:

## simulate tree
tree<-pbtree(n=100,scale=1)
## simulate data under model="BM"
x<-fastBM(tree)
fitContinuous(tree,x)

## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.940924
##  z0 = -0.186971
## 
##  model summary:
##  log-likelihood = -58.714396
##  AIC = 121.428793
##  AICc = 121.552504
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

fitCont(tree,x)

## $model
## [1] "BM"
## 
## $sig2
## [1] 0.9409
## 
## $a
## [1] -0.187
## 
## $logLik
## [1] -58.71

## simulate data under model="lambda"
x<-fastBM(rescale(tree,model="lambda",0.5))
fitContinuous(tree,x,model="lambda")

## GEIGER-fitted comparative model of continuous data
##  fitted 'lambda' model parameters:
##  lambda = 0.729756
##  sigsq = 1.217453
##  z0 = 0.254493
## 
##  model summary:
##  log-likelihood = -122.552971
##  AIC = 251.105941
##  AICc = 251.355941
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.32
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

fitCont(tree,x,model="lambda",interval=c(0,1))

## $par
## [1] 0.7298
## 
## $model
## [1] "lambda"
## 
## $sig2
## [1] 1.217
## 
## $a
## [1] 0.2545
## 
## $logLik
## [1] -122.6

## simulate data under model="OU"
x<-fastBM(rescale(tree,model="OU",0.4))
fitContinuous(tree,x,model="OU")

## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
##  alpha = 0.536268
##  sigsq = 1.171551
##  z0 = 0.216378
## 
##  model summary:
##  log-likelihood = -61.980067
##  AIC = 129.960133
##  AICc = 130.210133
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.60
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

fitCont(tree,x,model="OU",interval=c(0,1))

## $par
## [1] 0.5363
## 
## $model
## [1] "OU"
## 
## $sig2
## [1] 1.172
## 
## $a
## [1] 0.2164
## 
## $logLik
## [1] -61.98

Running system.time with any of these examples (except for model="BM") does not show any improvement of this ostensibly lighter implementation - but this is probably due to lots of unnecessary internals in brownie.lite for fitting a multi-rate model (which is not actually done here). Improving on this is thus pretty simple (but I will not do so here, nor now).

That's it.

↧

Pruning trees to one member per genus, and to one descendant for each clade younger than a particular age

November 5, 2014, 2:06 pm

≫ Next: Bug fix in plot.contMap / plot.densityMap

≪ Previous: Alternative implementations of fitContinuous

The following requestwas recently posted to the R-sig-phylo email list:

“(1) I'd like to drop all but one of the tips in each genus (i.e. to generate a genus-level phylogeny). Given that all genera are monophyletic, it shouldn't matter which species I pick.

(2) I'd like to establish an era (e.g. 25 Mya) and leave only one representative of the clades there were present at that point in time.”

I submitted my replies (1, 2) but I'll repeat them here anyway.

First, let's simulate scenario (1) in which we have a species-level tree with one or more species per genus, and want to reduce the tree to a genus tree:

library(phytools)
## here are our species names
tips<-c("Abc_def","Abc_ghi","Def_ghi","Def_jkl",
"Ghi_jkl","Ghi_mno","Jkl_mno","Jkl_pqr")
tree<-stree(n=8,type="balanced",tip.label=tips)
plotTree(tree,ftype="i")

plot of chunk unnamed-chunk-1

Next, we'll get a list of all genus names, and prune the tree of all but one member of each genus:

## get a list of all genera
tips<-tree$tip.label
genera<-unique(sapply(strsplit(tips,"_"),function(x) x[1]))
## here are our genera
genera

## [1] "Abc" "Def" "Ghi" "Jkl"

## now drop all but one of each
ii<-sapply(genera,function(x,y) grep(x,y)[1],y=tips)
tree<-drop.tip(tree,setdiff(tree$tip.label,tips[ii]))
plotTree(tree,ftype="i")

plot of chunk unnamed-chunk-2

Finally, we can rename our tips to be the genus names only:

tree$tip.label<-sapply(strsplit(tree$tip.label,"_"),function(x) x[1])
plotTree(tree,ftype="i")

plot of chunk unnamed-chunk-3

That's it.

Now, let's move on to part (2). For this, I'll simulate a true tree of arbitrary total depth 100.

tree<-pbtree(n=100,scale=100)
plotTree(tree,fsize=0.6)

plot of chunk unnamed-chunk-4

Now, let's prune each clade younger than 25 units of time old to include only 1 descendant:

## get all node heights
H<-nodeHeights(tree)
## time from the root
t<-max(H)-25
## identify all edges that cross 25 mybp
h1<-which(H[,1]<t)
h2<-which(H[,2]>t)
ii<-intersect(h1,h2)
## all daughter nodes of those edges
nodes<-tree$edge[ii,2]
## internal phytools function
getDescendants<-phytools:::getDescendants
## find all descendants from each edge
tips<-lapply(nodes,getDescendants,tree=tree)
## find all tips to keep
tips<-tree$tip.label[sapply(tips,function(x,y) x[x<=Ntip(y)][1],y=tree)]
## drop all the others
tree<-drop.tip(tree,setdiff(tree$tip.label,tips))
plotTree(tree)
lines(c(t,t),par()$usr[3:4],lty="dashed",col="red")

plot of chunk unnamed-chunk-5

Cool. It worked!

That's all for now.

↧

Bug fix in plot.contMap / plot.densityMap

November 27, 2014, 5:25 am

≫ Next: Testing a hypothesis that phylogenetic signal is different from 1.0

≪ Previous: Pruning trees to one member per genus, and to one descendant for each clade younger than a particular age

A few months ago Eliot Miller reported a strange bug in contMap in which sometimes weird, white (and thus nearly invisible on a white background) text was appearing on certain parts of the plot.

For example:

library(phytools)

## Loading required package: ape
## Loading required package: maps

packageVersion("phytools")

## [1] '0.4.36'

set.seed(1)
tree<-pbtree(n=26,scale=1)
## realistic tip labels
tree$tip.label<-paste(LETTERS[26:1],"._",sep="",replicate(26,paste(sample(letters,sample(3:12,1)),collapse="")))
x<-fastBM(tree)
obj<-contMap(tree,x,outline=TRUE)

plot of chunk unnamed-chunk-1

The white text is what appears to cause the lines close to the base of the tree to be interrupted.

Well, this bug isn't quite as mysterious as it might seem at first blush. This is because to plot the outline of the phylogeny and then the contMap I use two calls to the phytools function plotSimmapinternally; however to avoid plotting the labels twice I first set par(col="white"), plot the outline tree, and then set par(col="black") and plot the "contMap".

Why the text is showing up in the wrong place, on the other hand, beats me - although I'll bet it has something to do with how I am computing the label offset. Oh well. A quick fix that I have applied in the latest non-CRAN phytools version is to use par(col="transparent") instead of par(col="white"). Hopefully all plotting devices support transparency.

Here's how it works:

detach("package:phytools",unload=TRUE)
install.packages("phytools_0.4-38.tar.gz",type="source")

## Installing package into 'C:/Users/Liam/Documents/R/win-library/3.1'
## (as 'lib' is unspecified)
## inferring 'repos = NULL' from 'pkgs'

library(phytools)
packageVersion("phytools")

## [1] '0.4.38'

plot(obj)

plot of chunk unnamed-chunk-2

OK, problem solved. I'm a little concerned that transparency will not be supported by all plotting devices, but hopefully this is a temporary fix & I can figure out the real problem.

That's it. Happy Thanksgiving.

↧

Testing a hypothesis that phylogenetic signal is different from 1.0

December 2, 2014, 2:35 pm

≫ Next: R function for Pagel's 1994 correlation method

≪ Previous: Bug fix in plot.contMap / plot.densityMap

A phytools user asked today:

“A reviewer asked us to test if phylogenetic signal is equal to 1, but I have not found any package that does that. Do you have any suggestion about how to do this?

Since phylogenetic signal (measured using either Pagel's λ or Blomberg's K) has an expected value of 1.0 under Brownian motion; testing whether signal is different from 1.0 is essentially testing whether phylogenetic signal is smaller or greater than if the data arose by a Brownian process.

We can do this with a likelihood-ratio test (with some caveats) if we are using Pagel's λ to measure phylogenetic signal. If we are using Blomberg's K then we can use simulation (with fewer caveats).

Here's a quick demo.

First, let's simulate some data with & without signal. For the no-signal data I will just randomly permute the Brownian data:

library(phytools)
## tree
tree<-pbtree(n=26,tip.label=LETTERS)
## data with signal
x1<-fastBM(tree)
## data without signal
x2<-setNames(sample(x1),names(x1))

Now let's estimate Pagel's lambda for both datasets. In each case, we will compare the likelihood of the fitted model to the likelihood of λ = 1.

## Pagel's lambda
## first the data with signal
lam1<-phylosig(tree,x1,method="lambda")
lam1

## $lambda
## [1] 0.9413
## 
## $logL
## [1] -38.39

fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x1)
fitBrownian

## ML single-rate model:
##  s^2 se  a   k   logL
## value    1.1035  0.3061  -0.2291 2   -38.6817    
## 
## ML multi-rate model:
##  s^2(1)  se(1)   a   k   logL    
## value    1.1035      -0.2291 2   -38.6817
## 
## P-value (based on X^2): 1 
## 
## R thinks it has found the ML solution.

## likelihood ratio test
LR<-2*(lam1$logL-fitBrownian$logL1)
LR

## [1] 0.5774

P.lr<-pchisq(LR,df=1,lower.tail=FALSE)
P.lr

## [1] 0.4473

## now the data without signal
lam2<-phylosig(tree,x2,method="lambda")
lam2

## $lambda
## [1] 6.917e-05
## 
## $logL
## [1] -46.33

fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x2)
fitBrownian

## ML single-rate model:
##  s^2 se  a   k   logL
## value    4.5548  1.2633  0.0452  2   -57.112 
## 
## ML multi-rate model:
##  s^2(1)  se(1)   a   k   logL    
## value    4.5548  1   0.0452  2   -57.112
## 
## P-value (based on X^2): 1 
## 
## Optimization may not have converged.

## likelihood ratio test
LR<-2*(lam2$logL-fitBrownian$logL1)
LR

## [1] 21.56

P.lr<-pchisq(LR,df=1,lower.tail=FALSE)
P.lr

## [1] 3.428e-06

Now let's do roughly the same thing for Blomberg's K. In this case, however, we will use simulation by Brownian motion to get a null distribution for K.

## Blomberg's K
## first the data with signal
K1<-phylosig(tree,x1)
K1

## [1] 0.9221

## fit a Brownian model to the data to get parameters
## for simulation
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x1)
X<-fastBM(tree,sig2=fitBrownian$sig2.single,a=fitBrownian$a.single,nsim=999)
nullK<-apply(X,2,phylosig,tree=tree)
mean(nullK)

## [1] 0.9857

hist(nullK,breaks=20,xlab="null distribution for K",
    main="Null distribution of K")
lines(c(K1,K1),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=K1,y=0.985*par()$usr[4],"observed value of K",pos=4,offset=0.2)

plot of chunk unnamed-chunk-3

The distribution is of K under the null hypothesis is highly assymetric, so it might make sense to conduct a test of K using abs(log(K)). (Just an idea.)

P.k<-mean(abs(log(c(K1,nullK)))>=abs(log(K1)))
P.k

## [1] 0.849

Now let's do the same with the no-signal data:

K2<-phylosig(tree,x2)
K2

## [1] 0.2261

## fit a Brownian model to the data to get parameters
## for simulation
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x2)
X<-fastBM(tree,sig2=fitBrownian$sig2.single,a=fitBrownian$a.single,nsim=999)
nullK<-apply(X,2,phylosig,tree=tree)
mean(nullK)

## [1] 0.9844

hist(nullK,breaks=20,xlab="null distribution for K",
    main="Null distribution of K")
lines(c(K2,K2),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=K2,y=0.985*par()$usr[4],"observed value of K",pos=4,offset=0.2)

plot of chunk unnamed-chunk-5

P.k<-mean(abs(log(c(K2,nullK)))>=abs(log(K2)))
P.k

## [1] 0.001

That's it.

↧

R function for Pagel's 1994 correlation method

December 3, 2014, 2:44 pm

≫ Next: Simulating correlated evolution of discrete characters under Pagel's model

≪ Previous: Testing a hypothesis that phylogenetic signal is different from 1.0

I just posted some code to fit the Pagel (1994) method for detecting “correlated” evolution of two binary traits. I say correlated because it might be more appropriate to call this a dependence - i.e., the rate of character 2 depends on 1 & vice versa.

This is actually quite simple to do because the test is merely a test of, for a two character combination written as [01] (meaning state 0 for the first character & state 1 for the second), the rates of transition from [00]->[01] equal [10]->[11]; [00]->[10] equal [01]->[11]; etc. For this we can just paste our two input characters together, fit both a model in which the dependence exists and one in which it does not, and then compare them. The dependence model has twice as many transition rates as the independence model, so this model will have (for two binary traits) 4 more parameters.

Note that I accomplished this by borrowing ace in the ape package or fitDiscrete in the geiger package to actually fit the models. All my code really does is pre- & post-process the data & results respectively!

Here's a demo:

## first load packages & source code
library(phytools)
library(geiger)
source("fitPagel.R")
.check.pkg<-phytools:::.check.pkg
## now let's simulate some uncorrelated data
tree<-pbtree(n=300,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tt1<-sim.history(tree,Q)

## Done simulation(s).

tt2<-sim.history(tree,Q)

## Done simulation(s).

## these are uncorrelated, see:
par(mfrow=c(1,2))
plotSimmap(tt1,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1)
plotSimmap(tt2,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1,direction="leftwards")

plot of chunk unnamed-chunk-1

x<-tt1$states
y<-tt2$states
fit.ape<-fitPagel(tree,x,y)
fit.ape

## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##         a|a     a|b     b|a     b|b
## a|a -1.9712  1.0896  0.8816  0.0000
## a|b  0.9774 -1.8589  0.0000  0.8816
## b|a  0.7531  0.0000 -1.8426  1.0896
## b|b  0.0000  0.7531  0.9774 -1.7304
## 
## Dependent model rate matrix:
##         a|a     a|b     b|a     b|b
## a|a -1.5606  0.8578  0.7028  0.0000
## a|b  0.9261 -1.8634  0.0000  0.9373
## b|a  0.3652  0.0000 -1.5764  1.2112
## b|b  0.0000  1.2754  1.0567 -2.3321
## 
## Model fit:
##             log-likelihood
## independent         -238.7
## dependent           -237.5
## 
## Hypothesis test result:
##   likelihood-ratio:  2.390211 
##   p-value:  0.6643971

fit.geiger<-fitPagel(tree,x,y,method="fitDiscrete")

## Warning: Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## Warning: Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41

fit.geiger

## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##         a|a     a|b     b|a     b|b
## a|a -1.8847  0.9942  0.8906  0.0000
## a|b  0.9699 -1.8605  0.0000  0.8906
## b|a  0.7112  0.0000 -1.7053  0.9942
## b|b  0.0000  0.7112  0.9699 -1.6811
## 
## Dependent model rate matrix:
##         a|a     a|b     b|a     b|b
## a|a -1.5000  0.8192  0.6809  0.0000
## a|b  0.8987 -1.8596  0.0000  0.9609
## b|a  0.3704  0.0000 -1.4820  1.1116
## b|b  0.0000  1.1176  1.1075 -2.2251
## 
## Model fit:
##             log-likelihood
## independent         -239.3
## dependent           -238.3
## 
## Hypothesis test result:
##   likelihood-ratio:  2.096727 
##   p-value:  0.7179738

OK, now let's try correlated data. To do that I will have to build a big transition matrix myself for simulation!

Q<-matrix(c(0,0.5,0.5,0,2,0,0,2,2,0,0,2,0,0.5,0.5,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,t(Q))

## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).

tt1<-mergeMappedStates(tt,c("aa","ab"),"a")
tt1<-mergeMappedStates(tt1,c("ba","bb"),"b")
tt2<-mergeMappedStates(tt,c("aa","ba"),"a")
tt2<-mergeMappedStates(tt2,c("ab","bb"),"b")
## these data are correlated, see:
par(mfrow=c(1,2))
plotSimmap(tt1,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1)
plotSimmap(tt2,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1,direction="leftwards")

plot of chunk unnamed-chunk-2

x<-getStates(tt1,"tips")
y<-getStates(tt2,"tips")
fit.ape<-fitPagel(tree,x,y)
fit.ape

## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##         a|a     a|b     b|a     b|b
## a|a -1.4325  0.9831  0.4493  0.0000
## a|b  0.9593 -1.4086  0.0000  0.4493
## b|a  1.0401  0.0000 -2.0233  0.9831
## b|b  0.0000  1.0401  0.9593 -1.9994
## 
## Dependent model rate matrix:
##        a|a     a|b     b|a    b|b
## a|a -1.043  0.8539  0.1893  0.000
## a|b  5.543 -7.9019  0.0000  2.359
## b|a  1.202  0.0000 -4.4114  3.209
## b|b  0.000  1.0849  0.4054 -1.490
## 
## Model fit:
##             log-likelihood
## independent         -241.1
## dependent           -207.8
## 
## Hypothesis test result:
##   likelihood-ratio:  66.63346 
##   p-value:  1.164753e-13

fit.geiger<-fitPagel(tree,x,y,method="fitDiscrete")

## Warning: Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## Warning: Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41

fit.geiger

## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##         a|a    a|b     b|a     b|b
## a|a -1.3892  1.006  0.3832  0.0000
## a|b  0.9236 -1.307  0.0000  0.3832
## b|a  1.0639  0.000 -2.0699  1.0061
## b|b  0.0000  1.064  0.9236 -1.9875
## 
## Dependent model rate matrix:
##         a|a     a|b        b|a    b|b
## a|a -0.8951  0.8951  1.799e-16  0.000
## a|b  5.4502 -7.7907  0.000e+00  2.340
## b|a  1.9288  0.0000 -3.958e+00  2.029
## b|b  0.0000  1.0086  4.105e-01 -1.419
## 
## Model fit:
##             log-likelihood
## independent         -241.9
## dependent           -208.8
## 
## Hypothesis test result:
##   likelihood-ratio:  66.16744 
##   p-value:  1.460391e-13

That's it, I guess. This is brand new so I have no idea how it works but feedback is welcome. I'm also well aware of the recent critiqueof this method by Maddision & Fitzjohn.

↧

Simulating correlated evolution of discrete characters under Pagel's model

December 3, 2014, 9:10 pm

≫ Next: Update to locate.yeti: REML and exact likelihood methods

≪ Previous: R function for Pagel's 1994 correlation method

In a recent post, in which I gave an R function to run Pagel's (1994) test for correlation between two binary traits, I also posted (without much comment) some simple code to simulate the character histories of binary traits undergoingcorrelated evolution by Pagel's model.

Here is that code again with some explanation.

(1) First, just as a preliminary, let's load our packages & simulate a stochastic phylogeny:

library(phytools)

## Loading required package: ape
## Loading required package: maps

tree<-pbtree(n=300,scale=1)

(2) Next, we need to set up a matrix to simultaneously simulate our two traits. Remember, under Pagel's model, character correlation are just transition rates in one character that depend on the state of a second. To simulate under this model we actually have to simultaneously simulate states for the two characters at once. Here, as I did last time, I'll keep it simple. Let's say that the two characters have states a and b. I'll let the transition rates into states a,a or b,b be high (say 2.0), but I will make the transition rates to any state a,b or b,alow - say 0.4. This is what that looks like:

Q<-matrix(c(0,0.4,0.4,0,2,0,0,2,2,0,0,2,0,0.4,0.4,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
Q

##      aa   ab   ba   bb
## aa -0.8  0.4  0.4  0.0
## ab  2.0 -4.0  0.0  2.0
## ba  2.0  0.0 -4.0  2.0
## bb  0.0  0.4  0.4 -0.8

(3) Now, we can simulate both characters simultaneously up the tree using the phytools function sim.history.

tt<-sim.history(tree,Q)

## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
##   Transposing Q for internal calculations.
## Done simulation(s).

(4) This is the history of our two character state data; however we still have to backtranslate this into the character histories for each of our two traits. To do this, we can use mergeMappedStates to merge a,a and a,b (for instance) into a for character 1 and so on. Here is what that looks like:

t1<-mergeMappedStates(tt,c("aa","ab"),"a")
t1<-mergeMappedStates(t1,c("ba","bb"),"b")
t2<-mergeMappedStates(tt,c("aa","ba"),"a")
t2<-mergeMappedStates(t2,c("ab","bb"),"b")
t1$states<-getStates(t1,"tips")
t2$states<-getStates(t2,"tips")

(5) For fun, let's plot the two histories (just as we did last time) so that we can see that they are indeed highly correlated:

par(mfrow=c(1,2))
plotSimmap(t1,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off")
plotSimmap(t2,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off",
    direction="leftwards")

plot of chunk unnamed-chunk-5

(6) Finally, we can fit the Pagel (1994) model:

x<-getStates(t1,"tips")
y<-getStates(t2,"tips")
source("fitPagel.R")
.check.pkg<-phytools:::.check.pkg
fit<-fitPagel(tree,x,y)
fit

## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.9753616  0.5606835  0.4146781  0.0000000
## a|b  2.7557800 -3.1704581  0.0000000  0.4146781
## b|a  2.7479439  0.0000000 -3.3086275  0.5606835
## b|b  0.0000000  2.7479439  2.7557800 -5.5037240
## 
## Dependent model rate matrix:
##            a|a        a|b        b|a       b|b
## a|a -0.7760794  0.4795664  0.2965130  0.000000
## a|b  5.1666938 -7.2422057  0.0000000  2.075512
## b|a  2.0876921  0.0000000 -4.0078768  1.920185
## b|b  0.0000000  3.8020580  0.7652618 -4.567320
## 
## Model fit:
##             log-likelihood
## independent      -186.3963
## dependent        -173.9505
## 
## Hypothesis test result:
##   likelihood-ratio:  24.89146 
##   p-value:  5.290212e-05

That's it. Hopefully this is helpful to someone!

↧

Update to locate.yeti: REML and exact likelihood methods

December 4, 2014, 10:53 am

≫ Next: User control of the vertical range of tip labels in phenogram

≪ Previous: Simulating correlated evolution of discrete characters under Pagel's model

I just posted code for a new version of the function locate.yeti. This function can be used to place recently extinct, missing, or cryptic taxa into an ultrametric phylogeny based on continuous character data.

The main updates are as follows:

(1) The function now includes a REML method which uses the contrasts algorithm of Felsenstein (1985). Note that this option is currently under development.

(2) The function also now includes exact likelihood calculation, rather then the approximate method that used orthogonalization of the original data based on the backbone tree. This obviously slows down computation, and I implemented it primarily to compare to the REML method (which does not need to use orthogonalization). It seems to not be prohibitively slow for relatively modest sized trees (say, 100 taxa or so).

Here's a quick demo.

First, simulate tree & data:

library(phytools)

## Loading required package: ape
## Loading required package: maps

library(phangorn)

## 
## Attaching package: 'phangorn'
## 
## The following object is masked from 'package:ape':
## 
##     as.prop.part

## simulate tree & data
N<-50 ## taxa in base tree
m<-10 ## number of continuous characters
## simulate tree
tt<-tree<-pbtree(n=N+1,tip.label=sample(c(paste("t",1:N,sep=""),"Yeti")))
## generate a covariance matrix for simulation
L<-matrix(rnorm(n=m*m),m,m)
L[upper.tri(L,diag=FALSE)]<-0
L<-L-diag(diag(L))+abs(diag(diag(L)))
V<-L%*%t(L)
X<-sim.corrs(tree,vcv=V)
tree<-drop.tip(tree,"Yeti")
## visualize trees
par(mfrow=c(1,2))
plotTree(tt,mar=c(0.1,0.1,4.1,0.1),fsize=0.8)
title("tree with Yeti")
plotTree(tree,mar=c(0.1,0.1,4.1,0.1),direction="leftwards",fsize=0.8)
title("tree without Yeti")

plot of chunk unnamed-chunk-1

OK, now let's estimate using the ML method with & without rotation:

## without rotation
mltree<-locate.yeti(tree,X,plot=TRUE,search="exhaustive",rotate=FALSE)

## Optimizing the phylogenetic position of Yeti using ML. Please wait....

plot of chunk unnamed-chunk-2

## Done.

mltree.rotate<-locate.yeti(tree,X,plot=TRUE,search="exhaustive")

## Optimizing the phylogenetic position of Yeti using ML. Please wait....

plot of chunk unnamed-chunk-2

## Done.

mltree$logL

## [1] -455.4

mltree.rotate$logL

## [1] -456.3

RF.dist(mltree,mltree.rotate)

## [1] 0

Hopefully, we get more or less the same tree in both cases!

Now, let's try the REML method. We can compare our results to the true tree and to our ML tree from above:

remltree<-locate.yeti(tree,X,method="REML",plot=TRUE,search="exhaustive")

## ---------------------------------------------------------------
## | **Warning: method="REML" has not been thoroughly tested.    |
## |   Use with caution.**                                       |
## ---------------------------------------------------------------
## 
## Optimizing the phylogenetic position of Yeti using REML. Please wait....

plot of chunk unnamed-chunk-3

## Done.

RF.dist(tt,remltree)

## [1] 4

RF.dist(mltree,remltree)

## [1] 0

In addition to the RF distance, let's compute the patristic distances on the ML and REML trees. Note that it doesn't mean much that most of these are precisely the same - these are distances that don't involve the unknown tip. We should instead compare the very few outliers that involve distances from our unknown tip to other taxa in the tree:

par(mfrow=c(1,2))
plot(cophenetic(tt)[tt$tip.label,tt$tip.label],
    cophenetic(mltree)[tt$tip.label,tt$tip.label],
    xlab="true distances",ylab="ML tree distances")
plot(cophenetic(tt)[tt$tip.label,tt$tip.label],
    cophenetic(remltree)[tt$tip.label,tt$tip.label],
    xlab="true distances",ylab="REML tree distances")

plot of chunk unnamed-chunk-4

OK, well, that's all I have. This update, along with the new phytools function fitPagel, is in a new non-CRAN phytools version (phytools 0.4-40) that I recently posted. Let me know if you run into any difficulties with either function.

↧

User control of the vertical range of tip labels in phenogram

December 4, 2014, 7:15 pm

≫ Next: Fossil version of locate.yeti to place taxa in a tree using continuous characters

≪ Previous: Update to locate.yeti: REML and exact likelihood methods

A phytools user asked the following:

“I am plotting a phenogram (aka traitgram) of my tree, but it has many tips and my labels are overlapping. I have tried spread.labels=TRUE and spread.cost=c(1,0) and changed fsize, but some labels still overlap. I saw that the internal function spreadlabels sets the range for the spread labels to the range of my trait. Is it possible to expand that range to accommodate a large tree with many overlapping labels.”

The answer is both yes & no. Yes, it is possible to allow user control of the vertical range of tip labels - and I have coded this up and posted it here, as well as in a new non-CRAN phytools version (phytools 0.4-41). Unfortunately, it is also no in that this will usually not solve the problem that the user has identified. This is because the vertical range for tip labels is ultimately limited not by the scale of the plot - but by the size of the plotting device. The plotting device is roughly analogous to the “piece of paper” we are drawing our plot on. We can rescale our y-axis all we like, but ultimately, to have more space for our labels, we're going to need a bigger piece of paper!

There are some conditions, for instance when ancestral states are known, that drawing the tip labels on a broader range of y than the extant species values would be advantageous, so I have nonetheless added the feature. It also may help a bit if we just need a small amount of additional vertical space to fit out labels. However it is not a panacea for the problem of overlapping tip labels.

Here's a quick demo that illustrates what I mean:

library(phytools)
set.seed(1)
tree<-pbtree(n=40)
## realistic tip labels
tree$tip.label<-replicate(Ntip(tree),paste(sample(LETTERS,1),". ",
    paste(sample(letters,round(runif(1,3,8))),collapse=""),sep=""))
x<-fastBM(tree)
## original range
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0),ftype="i")

plot of chunk unnamed-chunk-1

## 'stretched' range
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0),ftype="i",
    spread.range=c(-5,5),ylim=c(-5.1,5.1))

plot of chunk unnamed-chunk-1

So you can see that although I have “expanded” the vertical space over which tip labels are written, without a bigger plotting device, my tip labels are equally overlapping.

Now let's use a bigger piece of paper:

phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0),ftype="i")

plot of chunk unnamed-chunk-2

Much better! That's it.

↧

Fossil version of locate.yeti to place taxa in a tree using continuous characters

December 7, 2014, 4:27 pm

≫ Next: Inverting the color map on an object of class "contMap" or "densityMap"

≪ Previous: User control of the vertical range of tip labels in phenogram

I have now posted development codefor a fossil version of the phytools function locate.yeti, called locate.fossil.

This function uses ML to place a single fossil tip into a tree based on continuous character data. It allows the user to impose a time.constraint, in the form of a time range for the fossil. If the age of the fossil is known precisely, this can just be a single numeric value; and if the user wants to supply it (rather than as a height above the root) as a time before present, this should be given as a negative scalar. Note that time.constraint constrains only the height of the tip, and not at all the height of the attachment of the terminal edge to the tree - except inasmuch as this must precede in time the height of the tip of course! The user can also supply an edge.constraint which constrains the edges of the tree to which the tip can attach. Obviously, these must be compatible with time.constraint or the function will not work.

Here's a demo of estimation using simulated data:

library(phytools)
library(mnormt)
## here is our true tree (normally unknown)
plotTree(true.tree,ftype="i",fsize=0.8)

plot of chunk unnamed-chunk-1

## simulate data
X<-fastBM(true.tree,nsim=20)
## drop tip to recreate the empirical backbone tree
tree<-drop.tip(true.tree,"fossil")
plotTree(tree,ftype="i",fsize=0.8)

plot of chunk unnamed-chunk-1

## now let's make an age constraint based on the real (known) age of 
## the fossil lineage:
h<-nodeHeights(true.tree)[true.tree$edge[,2]==which(true.tree$tip.label=="fossil"),2]
max(nodeHeights(tree))-h

## [1] 30.14439

age.range<-c(h-5,h+5)
age.range

## [1] 64.85561 74.85561

## load source code for locate.fossil
source("locate.fossil.R")
mltree<-locate.fossil(tree,X,time.constraint=age.range,plot=FALSE)

## Optimizing the phylogenetic position of fossil using ML. Please wait....
## Done.

## plot the result
plotTree(mltree,mar=c(2.1,0.1,3.1,0.1),ftype="i",fsize=0.8)
axis(1)
obj<-lapply(age.range,function(x,tree) lines(rep(x,2),
    c(0,Ntip(tree)+1),col="red",lty="dashed"),tree=tree)
title(paste("Optimized position of taxon \"",setdiff(rownames(X),tree$tip.label),
"\"",sep=""))

plot of chunk unnamed-chunk-1

The function also contains an option to plot the progress of the likelihood search. This is somewhat difficult to illustrate on the web. (I attempted to use saveGIF from the 'animation' package, but was having trouble for some unknown reason.)

That's it for now.

↧

Inverting the color map on an object of class "contMap" or "densityMap"

December 7, 2014, 7:28 pm

≫ Next: Adding arrows to a plotted radial tree

≪ Previous: Fossil version of locate.yeti to place taxa in a tree using continuous characters

In response to a recent user commentI have just posted code to automatically flip or invert the color map in an object of class "contMap" or "densityMap". The function is very simple, as follows:

setMap<-function(x,...){
    if(hasArg(invert)) invert<-list(...)$invert
    else invert<-FALSE
    n<-length(x$cols)
    if(invert) x$cols<-setNames(rev(x$cols),names(x$cols))
    else x$cols[1:n]<-colorRampPalette(...)(n)
    x
}

Here is how it works:

library(phytools)

## Loading required package: ape
## Loading required package: maps
## 
## Attaching package: 'phytools'
## 
## The following object is masked _by_ '.GlobalEnv':
## 
##     setMap

tree<-pbtree(n=26,tip.label=LETTERS)
x<-fastBM(tree)
obj<-contMap(tree,x)

plot of chunk unnamed-chunk-2

plot(setMap(obj,invert=TRUE))

plot of chunk unnamed-chunk-2

Remember, setMap can also be used to change the map arbitrarily, such as in:

obj<-setMap(obj,colors=c("white","black"))
plot(obj)

plot of chunk unnamed-chunk-3

That's it.

↧

Adding arrows to a plotted radial tree

December 14, 2014, 4:42 pm

≫ Next: New (non-CRAN) version of phytools; more on new phytools function add.arrow

≪ Previous: Inverting the color map on an object of class "contMap" or "densityMap"

I just wrote a function to automatically add an arrow to a radial ("fan") tree plot. This could be adapted to trees of different types of course. The purpose of it is to add arrows that point to specific tips or nodes. For aesthetic reasons, we'd like the angle of the arrow to match the angle of the terminal edge of the tree leading to our target leaf.

Here is what my code looks like:

add.arrow<-function(tree=NULL,tip,...){
    lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
    if(is.numeric(tip)){
        ii<-tip
        if(!is.null(tree)&&ii<=length(tree$tip.label)) tip<-tree$tip.label[ii]
        else tip<-""
    } else if(is.character(tip)&&!is.null(tree)) ii<-which(tree$tip.label==tip)
    strw<-lastPP$cex*(strwidth(tip)+strwidth("W"))
    if(hasArg(arrl)) arrl<-list(...)$arrl
    else { 
        if(lastPP$type=="fan") arrl<-0.3*max(lastPP$xx)
        else if(lastPP$type=="phylogram") arrl<-0.15*max(lastPP$xx)
    }
    if(hasArg(hedl)) hedl<-list(...)$hedl
    else hedl<-arrl/3
    if(hasArg(angle)) angle<-list(...)$angle
    else angle<-45
    arra<-angle*pi/180
    if(hasArg(col)) col<-list(...)$col
    else col<-"black"
    if(hasArg(lwd)) lwd<-list(...)$lwd
    else lwd<-2
    if(lastPP$type=="fan"){
        theta<-atan2(lastPP$yy[ii],lastPP$xx[ii])
        segments(x0=lastPP$xx[ii]+cos(theta)*(strw+arrl),
            y0=lastPP$yy[ii]+sin(theta)*(strw+arrl),
            x1=lastPP$xx[ii]+cos(theta)*strw,
            y1=lastPP$yy[ii]+sin(theta)*strw,
            col=col,lwd=lwd,lend="round")
        segments(x0=lastPP$xx[ii]+cos(theta)*strw+cos(theta+arra/2)*hedl,
            y0=lastPP$yy[ii]+sin(theta)*strw+sin(theta+arra/2)*hedl,
            x1=lastPP$xx[ii]+cos(theta)*strw,
            y1=lastPP$yy[ii]+sin(theta)*strw,
            col=col,lwd=lwd,lend="round")
        segments(x0=lastPP$xx[ii]+cos(theta)*strw+cos(theta-arra/2)*hedl,
            y0=lastPP$yy[ii]+sin(theta)*strw+sin(theta-arra/2)*hedl,
            x1=lastPP$xx[ii]+cos(theta)*strw,
            y1=lastPP$yy[ii]+sin(theta)*strw,
            col=col,lwd=lwd,lend="round")
    }
}

Note that I use three calls to segments rather than more obvious graphics function arrows. This is because I was unhappy with the options for user control of the dimensions of the arrow head in arrows, so I ensentially built my own version.

And now I'll use it to add arrows to an anole tree plotted in the "contMap" style.

library(phytools)
obj

## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 101 tips and 100 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.072659, 4.751288).

plot(obj,type="fan",lwd=3,fsize=c(0.8,1),legend=0.7,
    outline=TRUE)
add.arrow(tree=obj$tree,tip="roosevelti",col="red",lwd=3)
add.arrow(tree=obj$tree,tip="cuvieri",col="blue",lwd=3)

plot of chunk unnamed-chunk-2

(The aliasing that you see in this figure - pixelation on diagonal lines - can be avoided by exporting to a PDF or other vector format from R. I highly advise this if you intend to use this or other phytools plotting functions in publication!)

This is for a paper in which we use locate.yeti to place the little known & possibly extinct taxon Anolis roosevelti into the Caribbean Anolis phylogeny. Note that the ML placement (sister to the clade containing A. equestris is not where we a priori would've expected it to belong, that is, as sister to A. cuvieri; however nor can we statistically reject that placement. For more on that, see our paper - which is in revision, but we hope will ultimately be accepted and come out next year.

↧

New (non-CRAN) version of phytools; more on new phytools function add.arrow

December 15, 2014, 10:07 am

≫ Next: Wrapper function to optimize the λ tree transformation for a discrete trait

≪ Previous: Adding arrows to a plotted radial tree

I just posted a new non-CRAN version of phytools (go here and pick the latest version, which can be installed from source). I'm working towards an update to phytools on CRAN; but I need to do some more debugging of phytools' new functions, such as locate.yeti and locate.fossil, as well as add some details to new documentation, before I do that.

This latest version now includes the aforementioned locate.fossil. It also includes fitPagel, a relatively simple wrapper function that implements Pagel's 1994 method to test for correlated evolution of two binary characters, as well as the brand new function add.arrow, which adds an arrow to a plotted radial or square phylogram.

Here's a quick demo of the last function in this list:

library(phytools)
packageVersion("phytools")

## [1] '0.4.42'

## simulate tree with realistic tip labels
set.seed(10)
tree<-rtree(n=26)
tree$tip.label<-paste(LETTERS[26:1],"._",sep="",
    replicate(26,paste(sample(letters,sample(3:12,1)),
    collapse="")))
h<-max(nodeHeights(tree))
plotTree(tree,ftype="i")
add.arrow(tree=tree,tip=tree$tip.label[12],angle=40,arrl=0.1*h)
add.arrow(tree=tree,tip=tree$tip.label[6],angle=40,arrl=0.1*h,col="red")
## we can do it without supplying a tree as a function argument
add.arrow(tip=23,offset=8,angle=40,arrl=0.1*h,col="green")
## we can have an arrow point to a node
add.arrow(tip=41,offset=0.5,angle=40,arrl=0.1*h)

plot of chunk unnamed-chunk-1

Of course, as described in the original postthe function also works for radial (type="fan") trees, e.g.:

tree<-pbtree(n=64)
tree$tip.label<-paste(sample(LETTERS,64,replace=TRUE),"._",sep="",
    replicate(26,paste(sample(letters,sample(3:12,1)),
    collapse="")))
h<-max(nodeHeights(tree))
plotTree(tree,type="fan",ftype="i")

## setEnv=TRUE for this type is experimental. please be patient with bugs

add.arrow(tree,tip="D._idemqv",col="red")
add.arrow(tree,tip="U._fjgchek",col="blue")

plot of chunk unnamed-chunk-2

And we can use it for plotted objects of class "contMap" and "densityMap", e.g.:

x<-fastBM(tree)
obj<-contMap(tree,x,plot=FALSE)
plot(obj,type="fan",legend=2)
add.arrow(tree,tip="D._idemqv",col="red")
add.arrow(tree,tip="U._fjgchek",col="blue")

plot of chunk unnamed-chunk-3

That's it for now.

↧

Wrapper function to optimize the λ tree transformation for a discrete trait

December 26, 2014, 12:10 pm

≫ Next: Arbitrary vertical spacing of the tips on a plotted tree

≪ Previous: New (non-CRAN) version of phytools; more on new phytools function add.arrow

Today I respondedto an R-sig-phylo query by posting some code to optimize Pagel's λ tree transformation for a discrete character evolving by a continuous-time Markov chain. I did this by writing a very simple wrapper around ape's ace function for ancestral character estimation, which also fits this model:

## here's the wrapper function
fitLambda<-function(tree,x,model="ER"){
    lik<-function(lambda,tree,x,model)
    logLik(ace(x,rescale(tree,model="lambda",lambda),
        type="discrete",model=model))
    obj<-optimize(lik,c(0,1),tree=tree,x=x,model=model,maximum=TRUE)
    fit<-ace(x,rescale(tree,model="lambda",lambda=obj$maximum),
        type="discrete",model=model)
    I<-fit$index.matrix
    fitted.Q=matrix(fit$rates[I],dim(I)[1],dim(I)[2],
        dimnames=list(dimnames(fit$lik.anc)[[2]],
        dimnames(fit$lik.anc)[[2]]))
    diag(fitted.Q)<--rowSums(fitted.Q,na.rm=TRUE)
    list(Q=fitted.Q,lambda=obj$maximum,logLik=logLik(fit))
}
library(geiger)
library(phytools)
## simulate some data to test it
tree<-pbtree(n=200,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(rescale(tree,model="lambda",lambda=0.7),Q)$states

## Done simulation(s).

fitLambda(tree,x)

## $Q
##         a       b
## a -0.7995  0.7995
## b  0.7995 -0.7995
## 
## $lambda
## [1] 0.5154
## 
## $logLik
## [1] -128.6

Note that the same model can also be fit using geiger's fitDiscretefunction, but the poster reported some issues with convergence which made me want to post the second way. Here is fitDiscrete

fitDiscrete(tree,x,transform="lambda")

## DLSODA-  Warning..Internal T (=R1) and H (=R2) are
##       such that in the machine, T + H = T on the next step
##      (H = step size). Solver will continue anyway.
## In above message, R1 = 0, R2 = 0
##  
## DINTDY-  T (=R1) illegal
## In above message, R1 = 2.8036e-222
##  
##       T not in interval TCUR - HU (= R1) to TCUR (=R2)
## In above message, R1 = 0, R2 = 0
##  
## DINTDY-  T (=R1) illegal
## In above message, R1 = 8.66551e-221
##  
##       T not in interval TCUR - HU (= R1) to TCUR (=R2)
## In above message, R1 = 0, R2 = 0
##  
## DLSODA-  Trouble in DINTDY.  ITASK = I1, TOUT = R1
## In above message, I1 = 1
##  
## In above message, R1 = 8.66551e-221
##

Here I've excluding a whole bunch of error messages....

##  
## In above message, R1 = 8.66551e-221
##

## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##             a       b
##     a -0.7928  0.7928
##     b  0.7928 -0.7928
## 
##  fitted 'lambda' model parameter:
##  lambda = 0.508644
## 
##  model summary:
##  log-likelihood = -128.569446
##  AIC = 261.138891
##  AICc = 261.199805
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 56
##  frequency of best fit = NA
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

Also, fitting the λ model to discrete trait data has always struck me as a somewhat peculiar enterprise, so this is posted without any prejudice towards whether this is a good idea, a bad idea, or an average idea in the first place!

↧

Arbitrary vertical spacing of the tips on a plotted tree

January 30, 2015, 5:04 pm

≫ Next: Bug fix for pbtree with user-supplied tip labels and extinction

≪ Previous: Wrapper function to optimize the λ tree transformation for a discrete trait

Today a R-sig-phylo reader asked the following:

I am looking for a way to manually specify varying vertical distance between the tips in a phylogeny (because i want to add several lines of text to some of the tips).

I.e. for the tree library(ape) TREE=read.tree(text=“((Tip_1:1,Tip_2:1):1,Tip_3:2);)”)

Is there a way for instance through plot phylo to print place the tips in the heights of c(1,2,6) as opposed to c(1,2,3) as they would be normally in plot.phylo.

This is not presently possible, but I realized immediately that this would be pretty straightforward to add to phytools plotSimmap because the way that functions works is by first assigning 1:Nto the N tips of the tree, and then works backwards to assign the vertical position of all internal nodes via a post-order traversal.

I respondedto the query, and now I have also postedcode online that permits exactly this type of manipulation. The new phytools build with this update can be obtained here. Here's a demo using the code of the original query:

library(ape)
TREE=read.tree(text="((Tip_1:1,Tip_2:1):1,Tip_3:2);")
library(phytools)
packageVersion("phytools")

## [1] '0.4.43'

## normal vertical spacing
plotTree(TREE)

plot of chunk unnamed-chunk-1

## modified spacing
tips<-setNames(c(1,2,6),TREE$tip.label)
tips

## Tip_1 Tip_2 Tip_3 
##     1     2     6

plotTree(TREE,tips=tips)

plot of chunk unnamed-chunk-1

Note that there is nothing here to prevent us from plotting trees with line crossing! For instance:

tree<-pbtree(n=26,tip.label=LETTERS)
tips<-setNames(1:26,sample(LETTERS))
plotTree(tree,tips=tips)

plot of chunk unnamed-chunk-2

Yikes!

We can also do other weird stuff, like this:

plotTree(tree,tips=setNames(log(1:26),tree$tip.label),ftype="off")

plot of chunk unnamed-chunk-3

plotTree(tree,tips=setNames((1:26)^2,tree$tip.label),ftype="off")

plot of chunk unnamed-chunk-3

Well, you get the idea.

That's it.

↧

Bug fix for pbtree with user-supplied tip labels and extinction

February 1, 2015, 3:28 pm

≫ Next: New phytools version (phytools 0.4-45) submitted to CRAN

≪ Previous: Arbitrary vertical spacing of the tips on a plotted tree

I just posteda small bug fix in the phytools function pbtree.

pbtree (originally short for Pure-Birth tree) does birth-death phylogeny simulations for a variety of conditions (time-stop, taxon-stop, both together via rejection, and discrete & continuous time simulation). If the taxon-stop criterion is being used (that is, if the function is being told to return a tree with a specific number of extantterminal taxa), then the user is also allowed to supply his or her own tip labels which will be used to populated tree$tip.labelin the returned tree. Since a stochastic birth-death tree with N extant tips can have an arbitrary and unknowable number of extinct tips, these are labeled X1, X2, etc. For instance:

library(phytools)
tree<-pbtree(n=26,b=1,d=0.3,tip.label=letters)

## Warning: only using labels in tip.label for extant tips.
##          extinct tips will be labeled X1, X2, etc.

plotTree(tree,ftype="i",fsize=0.9)

plot of chunk unnamed-chunk-1

Unfortunately, the way this works is by first simulating the tree with arbitrary internal labels, and then assigning all N extant tips the user supplied labels after running the phytools function getExtant internally. Due to numerical precision issues, though, if the total tree length is very long (for instance, because band d are small for a given taxon-stop, n), then (unless its tolerance argument is adjusted) getExtant will malfunction and pbtree will behave as if the entire tree consists of extinct taxa! So, for example:

set.seed(10)
tree<-pbtree(n=26,b=0.001,d=0.0003,tip.label=letters)

## Warning: only using labels in tip.label for extant tips.
##          extinct tips will be labeled X1, X2, etc.

plotTree(tree,ftype="i",fsize=0.9)

plot of chunk unnamed-chunk-2

Clearly, 26 tips are extant in this tree; however the function is behaving as if they are all extinct!

The 'fix' that I've applied is to make the tolerance value in getExtant a function of the total tree length, rather the default value of getExtant.

detach("package:phytools",unload=TRUE)
install.packages("phytools_0.4-44.tar.gz",type="source")

## Installing package into 'C:/Users/Liam/Documents/R/win-library/3.1'
## (as 'lib' is unspecified)
## inferring 'repos = NULL' from 'pkgs'

library(phytools)
packageVersion("phytools")

## [1] '0.4.44'

set.seed(10)
tree<-pbtree(n=26,b=0.001,d=0.0003,tip.label=letters)

## Warning: only using labels in tip.label for extant tips.
##          extinct tips will be labeled X1, X2, etc.

plotTree(tree,ftype="i",fsize=0.9)

plot of chunk unnamed-chunk-3

The updated version of phytools is here. I'm also planning to submit a new version to CRAN in the not-too-distant future. (Fingers crossed!)

That's it!

↧

New phytools version (phytools 0.4-45) submitted to CRAN

February 4, 2015, 9:43 am

≫ Next: Reproducible analysis code for our forthcoming Anolis roosevelti (aka. locate.yeti) Evolution paper

≪ Previous: Bug fix for pbtree with user-supplied tip labels and extinction

This afternoon I submitted a new version of phytools to CRAN. Assuming it is accepted, the new version (phytools 0.4-45), which is already available from the phytools page, should percolate through all of the CRAN mirror repositories over the next few days.

This version does not feature any significant updates over the most recent non-CRAN phytools release (although I did do some updating of the manual pages this morning before submitting); however the last CRAN update was August 26th of last year (phytools 0.4-31) so there are many changes and updates to phytools since that release.

Some updates and additions include the following (and I can't guarantee that this list is comprehensive):

(1) A faster version of the robust Newick tree reader read.newick (1, 2).

(2) A new function (ladderize.simmap) to ladderize a phylogeny with a mapped discrete character.

(3) An update to the discrete character history simulating function, sim.history, to handle errors more sensibly.

(4) A small update to the fancyTree method "phenogram95"to permit user control of shading.

(5) Replacement of the redundant phytools function repPhylo with a valid alias.

(6) A fixfor a very mysterious bug in plot.contMap and plot.densityMap(not by coincidence - these two functions use the same internals).

(7) A wrapper function implementating Pagel's (1994) method to test for correlated evolution of two binary traits. (Not in phytools, but I posted code for simulating binary character correlated evolution here.)

(8) An update to locate.yeti implementing exact ML and REML estimation methods.

(9) User control of the vertical range of labels in phenogram. (Turns out this isn't particularly useful.)

(10) A fossil version of locate.yeti, called, creatively, locate.fossil.

(11) A new version of the phytools helper function setMap that can automatically invert the color map of an object of class "contMap" or "densityMap".

(12) A new function add.arrow that will add an arbitarily colored arrow to a radialor horizontalphylogram.

(13) A new option in plotSimmap (and thus plotTree) that permits arbitrary vertical spacing of tips in a right or left-facing square phylogram.

And, finally, (14) a bug fix for pbtree with user-supplied tip labels (& non-zero extinction).

As noted earlier, phytools 0.4-45 is already available from the phytools webpage, but, if all goes smoothly, should also be available from CRAN in the not-too-distant future.

Thanks for paying attention!

↧

Reproducible analysis code for our forthcoming Anolis roosevelti (aka. locate.yeti) Evolution paper

February 7, 2015, 4:21 pm

≫ Next: Bug fix in plotTree.singletons to permit multifurcating nodes

≪ Previous: New phytools version (phytools 0.4-45) submitted to CRAN

The following is the reproducible analysis code used in our 'in press' Evolution article in which we present a method for placing recently extinct taxa on a phylogeny using continuous character data and apply it to the case of Anolis roosevelti, a giant anole from the Puerto Rican bank Virgin Islands that is most likely extinct. The paper should be out in the first half of the year, but this will be posted to Dryad soon with our data.

------------------------------

Analyis & code for “Placing cryptic, recently extinct, or hypothesized taxa into an ultrametric phylogeny using continuous character data: A case study with the lizard Anolis roosevelti”

First load packages. Note that use of Rphylip also requires that the PHYLIP package be installed. This is used to compute the 'branch-score' distances of Kuhner & Felsenstein (1994).

set.seed(1)
library(phytools)
library(Rphylip)
library(phangorn)
library(clusterGeneration)

Next, here is our wrapper function for simulation & analysis:

foo<-function(N,m,nsim){
    cat("simulation conditions:\n")
    cat(paste("\tN =",N,"\n\tm =",m,"\n\tnsim=",nsim,"\n"))
    ## vectors for results
    rf<-bs<-vector()
    for(i in 1:nsim){
        ## simulate tree
        tt<-tree<-pbtree(n=N+1,tip.label=sample(c(paste("t",1:N,sep=""),"Yeti")),scale=1)
        if(m>1){
            ## generate a covariance matrix for simulation
            V<-genPositiveDefMat(m,covMethod="unifcorrmat")$Sigma
            X<-sim.corrs(tree,vcv=V)
        } else X<-as.matrix(fastBM(tree))
        ## prune unknown taxon "Yeti" from tree
        tree<-drop.tip(tree,"Yeti")
        ## run locate.yeti
        mltree<-locate.yeti(tree,X,plot=FALSE,search="exhaustive",quiet=TRUE)
        rf[i]<-RF.dist(tt,mltree)
        bs[i]<-Rtreedist(tt,trees2=mltree,quiet=TRUE)
        cat(".")
    }
    cat("\nDone.\n")
    return(list(rf=rf,bs=bs))
}

Now let's run our simulation showing the result for various numbers of taxa given a fixed number of traits:

m<-10 ## number of traits (for fixed number of traits)
nsim<-100 ## number of simulation
N<-seq(20,100,by=10)
objN<-lapply(N,foo,m=m,nsim=nsim)

## simulation conditions:
##  N = 20 
##  m = 10 
##  nsim= 100 
## ..............................................................

## Warning: running command 'rm outfile' had status 1

## ......................................
## Done.
## simulation conditions:
##  N = 30 
##  m = 10 
##  nsim= 100 
## ....................................................................................................
## Done.
## simulation conditions:
##  N = 40 
##  m = 10 
##  nsim= 100 
## .................................................................................

## Warning: running command 'rm outfile' had status 1

## ...................
## Done.
## simulation conditions:
##  N = 50 
##  m = 10 
##  nsim= 100 
## .............................................

## Warning: running command 'rm outfile' had status 1

## .......................................................
## Done.
## simulation conditions:
##  N = 60 
##  m = 10 
##  nsim= 100 
## ..........................................................................................

## Warning: running command 'rm intree' had status 1

## ..........
## Done.
## simulation conditions:
##  N = 70 
##  m = 10 
##  nsim= 100 
## .....................

## Warning: running command 'rm intree' had status 1

## ...............................................................................
## Done.
## simulation conditions:
##  N = 80 
##  m = 10 
##  nsim= 100 
## ....................................................................................................
## Done.
## simulation conditions:
##  N = 90 
##  m = 10 
##  nsim= 100 
## ............................

## Warning: running command 'rm intree2' had status 1

## ........................................................

## Warning: running command 'rm outfile' had status 1

## ................
## Done.
## simulation conditions:
##  N = 100 
##  m = 10 
##  nsim= 100 
## ..................................................................

## Warning: running command 'rm outfile' had status 1

## ..................................
## Done.

Next, we have a function to generate a null distribution of tree distances as if the missing taxon were attached randomly in our tree. Here is what that code looks like:

null.treedist<-function(N,nsim){
    tt<-pbtree(n=(N+1),nsim=nsim,tip.label=c(paste("t",1:N,sep=""),"Yeti"),scale=1)
    trees<-lapply(tt,drop.tip,tip="Yeti")
    trees<-lapply(trees,add.random,tips="Yeti")
    class(trees)<-"multiPhylo"
    rf<-mapply(RF.dist,tt,trees)
    bs<-Rtreedist(tt,trees2=trees,distances="corresponding",quiet=TRUE)
    list(rf=rf,bs=bs)
}

We can use it to recreate panels A & B of Figure 3, showing a comparison between locate.yeti and random placement of the missing tip:

## generate null trees
nullN<-lapply(N,null.treedist,nsim=nsim)
layout(matrix(c(1,2),1,2,byrow=TRUE))
par(mar=c(4.1,4.1,3.1,1.1))
## plot RF distances
nullRF<-sapply(nullN,function(x) x$rf)
colnames(nullRF)<-N
par(fg=grey(0.2,0.5))
boxplot(nullRF,col=grey(0.5,0.2),ylab="R-F distance",xlab="number of taxa (N-1)",ylim=c(0,40))
par(fg="black")
RF<-sapply(objN,function(x) x$rf)
colnames(RF)<-N
boxplot(RF,add=TRUE)
title("A                                                             ")
## plot branch-score distances
nullBS<-sapply(nullN,function(x) x$bs)
colnames(nullBS)<-N
par(fg=grey(0.2,0.5))
boxplot(nullBS,col=grey(0.5,0.2),ylab="B-S distance",xlab="number of taxa (N-1)",ylim=c(0,2.5))
par(fg="black")
BS<-sapply(objN,function(x) x$bs)
colnames(BS)<-N
boxplot(BS,add=TRUE)
title("B                                                             ")

plot of chunk unnamed-chunk-5

Now, let's do the same analysis, but this time vary the number of traits:

set.seed(1)
N<-50 ## number of taxa (for fixed number of taxa)
m<-c(1,2,5,10,20) ## number of traits
objM<-lapply(m,foo,N=N,nsim=nsim)

## simulation conditions:
##  N = 50 
##  m = 1 
##  nsim= 100 
## ....................................................................................................
## Done.
## simulation conditions:
##  N = 50 
##  m = 2 
##  nsim= 100 
## ........................................................................

## Warning: running command 'rm intree2' had status 1

## ............................
## Done.
## simulation conditions:
##  N = 50 
##  m = 5 
##  nsim= 100 
## ....................................................................

## Warning: running command 'rm outfile' had status 1

## ................................
## Done.
## simulation conditions:
##  N = 50 
##  m = 10 
##  nsim= 100 
## ......................................................

## Warning: running command 'rm outfile' had status 1

## ..............................................
## Done.
## simulation conditions:
##  N = 50 
##  m = 20 
##  nsim= 100 
## ..............................................

## Warning: running command 'rm outfile' had status 1

## .................................

## Warning: running command 'rm outfile' had status 1

## .....................
## Done.

Now we can recreate panels C & D of Figure 3, showing a comparison between locate.yeti and random placement of the missing tip, for various numbers of traits:

layout(matrix(c(1,2),1,2,byrow=TRUE))
par(mar=c(4.1,4.1,3.1,1.1))
nullM<-nullN[[which(N==50)]]
nullRF<-matrix(rep(nullM$rf,length(m)),100,length(m),byrow=FALSE)
colnames(nullRF)<-m
par(fg=grey(0.2,0.5))
boxplot(nullRF,col=grey(0.5,0.2),ylab="R-F distance",xlab="number of characters (m)",ylim=c(0,40))
par(fg="black")
RF<-sapply(objM,function(x) x$rf)
colnames(RF)<-m
boxplot(RF,add=TRUE)
title("C                                                             ")
nullBS<-matrix(rep(nullM$bs,length(m)),100,length(m),byrow=FALSE)
colnames(nullBS)<-m
par(fg=grey(0.2,0.5))
boxplot(nullBS,col=grey(0.5,0.2),ylab="B-S distance",xlab="number of characters (m)",ylim=c(0,2.5))
par(fg="black")
BS<-sapply(objM,function(x) x$bs)
colnames(BS)<-m
boxplot(BS,add=TRUE)
title("D                                                             ")

plot of chunk unnamed-chunk-7

tree<-read.tree("Revell-etal.tree.tre")
X<-read.csv("Revell-etal.data.csv",header=T,row.names=1)
X<-as.matrix(X)
X<-X[,1:20]
tip<-setdiff(rownames(X),tree$tip.label)
tip

## [1] "roosevelti"

mltree<-locate.yeti(tree,X,search="exhaustive",plot=TRUE)

## Optimizing the phylogenetic position of roosevelti using ML. Please wait....

plot of chunk unnamed-chunk-8

## Done.

mltree$logL

## [1] 3392

Now we can create the contMap style plot of Figure 4.

mltree<-reorder(reorder(mltree,"pruningwise")) ## reorder hack
pca<-phyl.pca(mltree,X)
## flip PC 1 so that it loads positively (not negatively) on size
obj<-contMap(mltree,-pca$S[,1],plot=FALSE)
class(obj)<-"densityMap"
## fix tip label
obj$tree$tip.label[which(obj$tree$tip.label=="pumilis")]<-"pumilus"
## plot
plot(obj,type="fan",lwd=4,legend=0.7,
    leg.txt=c(round(obj$lims[1],3),"PC 1",round(obj$lims[2],3)),
    outline=TRUE)
## add arrows
add.arrow(tree=obj$tree,tip="roosevelti",col="red",lwd=3,hedl=0.06,angle=50)
add.arrow(tree=obj$tree,tip="cuvieri",col="blue",lwd=3,hedl=0.06,angle=50)

plot of chunk unnamed-chunk-9

Finally, we can do the analyses to test the (null) hypotheses that A. rooseveltiis siste to A. cuvieri, or sister-to or nested-within the majority of the other Puerto Rican anoles:

## find ML cuvieri-constraint tree
cuvieri<-which(tree$tip.label=="cuvieri")
cuvieri.tree<-locate.yeti(tree,X,search="exhaustive",constraint=cuvieri)

## Optimizing the phylogenetic position of roosevelti using ML. Please wait....
## Done.

## compute a LR to test the alternative hypothesis that the A. roosevelti
## is *not* sister
LR.cuvieri<-2*(mltree$logL-cuvieri.tree$logL)
## perform simulation for the null distribution of the LR
set.seed(1)
LR.null<-vector()
nsim<-100
obj<-phyl.vcv(X[cuvieri.tree$tip.label,],vcv(cuvieri.tree),lambda=1)
for(i in 1:nsim){
    Xsim<-sim.corrs(tree=cuvieri.tree,vcv=obj$R,anc=obj$alpha[,1])
    mltree.null<-locate.yeti(tree,Xsim,search="exhaustive",plot=FALSE,
        quiet=TRUE)
    cuvieri.tree.null<-locate.yeti(tree,Xsim,search="exhaustive",
        constraint=which(tree$tip.label=="cuvieri"),plot=FALSE,
        quiet=TRUE)
    LR.null[i]<-2*(mltree.null$logL-cuvieri.tree.null$logL)
}
P.cuvieri<-1-mean(LR.cuvieri>=LR.null)
## P-value of the LR test
P.cuvieri

## [1] 0.26

This suggests that we cannot reject the hull hypothesis that A. rooseveltiis sister to A. cuvieri.

Now, let's do the same thing, but set our constraint tree to be one in which Anolis roosevelti must be found nested within or sister to the clade with most of Puerto Rican's remaining anole species:

sp<-c("cristatellus","cooki","poncensis","gundlachi","pulchellus","krugi",
"stratulus","evermanni")
pr<-getDescendants(tree,findMRCA(tree,sp))
pr.tree<-locate.yeti(tree,X,search="exhaustive",constraint=pr)

## Optimizing the phylogenetic position of roosevelti using ML. Please wait....
## Done.

LR.pr<-2*(mltree$logL-pr.tree$logL)
LR.null<-vector()
nsim<-100
obj<-phyl.vcv(X[pr.tree$tip.label,],vcv(pr.tree),lambda=1)
for(i in 1:nsim){
    Xsim<-sim.corrs(tree=pr.tree,vcv=obj$R,anc=obj$alpha[,1])
    mltree.null<-locate.yeti(tree,Xsim,search="exhaustive",plot=FALSE,
        quiet=TRUE)
    pr.tree.null<-locate.yeti(tree,Xsim,search="exhaustive",
        constraint=getDescendants(tree,findMRCA(tree,sp)),
        plot=FALSE,quiet=TRUE)
    LR.null[i]<-2*(mltree.null$logL-pr.tree.null$logL)
}
P.pr<-1-mean(LR.pr>=LR.null)
P.pr

## [1] 0.02

Here, by contrast, we can confidently reject the hypothesis that A. roosevelti is part of the main clade of Puerto Rican anoles.

That's pretty much it.

Last updated Jan. 13, 2015.

↧

Bug fix in plotTree.singletons to permit multifurcating nodes

February 9, 2015, 5:52 am

≫ Next: Bug fix for reroot for trees with node labels

≪ Previous: Reproducible analysis code for our forthcoming Anolis roosevelti (aka. locate.yeti) Evolution paper

A phytools user correctly reports that the phytools function plotTree.singletons (which, as the function name suggests, plots trees containing singleton nodes) breaks if more than two edges emerge from a single node.

For instance:

library(phytools)
text<-"(A:0.1,(S:0.3)B:0.2,(C:0.3,D:0.4)E:0.5)F;"
tree<-read.tree(text=text)
tree ## doesn't work at all

## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
## [1] "A" "S" "B" "" 
## Node labels:
## [1] ""  "D" "C"
## 
## Rooted; includes branch lengths.

tree<-read.newick(text=text)
tree ## read in correctly

## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
## [1] "A" "S" "C" "D"
## Node labels:
## [1] "F" "B" "E"
## 
## Unrooted; includes branch lengths.

plotTree.singletons(tree)

## Error in xy.coords(x, y): 'x' and 'y' lengths differ

plot of chunk unnamed-chunk-1

Ok, the user very helpfully identified a fix in the code to address this issue. I also seem to have found a simpler fix. The code is available hereand will also be in the next version of phytools.

Let's plot with the fixed code:

source("map.to.singleton.R")
plotTree.singletons(tree)

plot of chunk unnamed-chunk-2

OK, that's it.

↧