Phylogenetic Tools for Comparative Biology

A new phytools version (phytools 0.4-56) is now on CRAN. As of writing, Mac & Windows binaries had not yet been built. Hopefully those will be available, and will begin to percolate through the mirror repositories, within a few days.

Among the updates in this version are the following:

(1) A bug fix in plotTree.singletons to permit multifurcations.

(2) Multiple fixes to the function reroot for trees with node labels (1, 2, 3).

(3) A new interactive tree plotter (collapseTree) to collapse & expand a fan-style phylogeny (1, 2, 3).

(4) A new version of the biplot method for objects of class "phyl.pca" to permit choice in the PC axes to be plotted (here).

(5) A kindly user-supplied bug fixfor fastAnc.

(6) A fix in how the environmental variable "last_plot.phylo" is created by plotSimmap (here).

(7) Updates to a wide range of phytools plotting functions to permit smoother plotting and animation.

Finally, (8) major update to plotTree.wBars to permit tip labels to be plotted (here).

In addition to this non-comprehensive list of phytools updates, I also made many changes and additions to the phytools manual pages for the present CRAN release.

There were a few different functions described on the phytools blog that have not been added to this package update but may feature in a future version of phytools.

For example:

(1) A function to identify the youngest node on the tree with N or more descendant tips (here).

(2) A function to simulate random branch lengths under a pure-birth (Yule process) conditioned on a user-supplied tree topology (1, also see 2).

(3) A function to convert a phylogeny with node labels to a taxonomy (here).

Finally, (4) a function to split a plotted tree across multiple plotting devices or pages (1, 2).

Obviously, the latest version of phytools can be obtained from phytools.org as well as from the phytools CRAN page.

Here is a previously posted video showing use of the phytools interactive tree pruner, collapseTree:

Today I received an interesting question about the phytools function contMap. contMap, remember, is a function that maps the observed & reconstructed value of a continuous character on to the tree. The question is as follows:

“How can I recover the color codes that are used to paint the tips (or the last bit) of a phylogeny after a reconstruction with contMap? I have a character mapped on my phylogeny and I want to use the same tip colors in a scatter plot (first two components of a PCA in this case)… each species in the plot should have the same color as in the tree. I tried going through the code but I get lost at some point and can't figure out how the last bit of the terminal branches are indexed and how the color assigment is done…”

An object created by contMap is actually just the same object that is read & plotted by plotSimmap, which plot.contMap uses internally. The colors are encoded in the "contMap" element cols, and the corresponding indices for the colors mapped along each edge, in the object element tree$maps.

To demo how we can pull the colors of all mapped species from this object, see the following demo:

library(phytools)
## simulate tree & data
tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=2)
colnames(X)<-c("trait 1","trait 2")
## contMap for the first character only
obj<-contMap(tree,X[,1])

plot of chunk unnamed-chunk-1

## function to pull out the color of a given tip
foo<-function(obj,tip){
    jj<-which(obj$tree$tip.label==tip)
    kk<-which(obj$tree$edge[,2]==jj)
    setNames(obj$cols[names(obj$tree$maps[[kk]])[length(obj$tree$maps[[kk]])]],
        NULL)
}
colors<-sapply(obj$tree$tip.label,foo,obj=obj)
colors

##           A           B           C           D           E           F 
## "#004BFFFF" "#0023FFFF" "#00FF5FFF" "#0018FFFF" "#0EFF00FF" "#10FF00FF" 
##           G           H           I           J           K           L 
## "#3000FFFF" "#0062FFFF" "#0F00FFFF" "#00A1FFFF" "#00FF64FF" "#00FF4DFF" 
##           M           N           O           P           Q           R 
## "#00FF33FF" "#00FF49FF" "#00FF79FF" "#FF7F00FF" "#FF2400FF" "#FF0300FF" 
##           S           T           U           V           W           X 
## "#00FF03FF" "#3EFF00FF" "#D9FF00FF" "#FFAE00FF" "#FF4A00FF" "#FFF500FF" 
##           Y           Z 
## "#FFA400FF" "#FF7800FF"

OK, now how about using these to plot in a scatterplot? We can do that in our two dimensional simulated space:

plot(X,col=colors,pch=19,cex=1.5)
## just to position the labels a bit more nicely
ii<-X[,1]>(max(X[,1])+min(X[,1]))/2
text(X[ii,],rownames(X)[ii],pos=2)
text(X[!ii,],rownames(X)[!ii],pos=4)

plot of chunk unnamed-chunk-2

That's all there is to it!

In the following demo I show one way to do phylogenetic generalized least squares (PGLS) assuming a Brownian motion model for the variance-covariance structure of the residual error, for conditions in which there is measurement error in the estimation of y, the dependent variable in our model, but not our xs. This was motivated by a question from a UMass-Boston graduate student based on the phylogenetic (generalized) ANOVA - so this is a case in which there might be easily quantifiable error in y but not in our factor or factors.

The way I have done this is not by writing a new corStruct - although one would perhaps ideally like to this.

First, this is the likelihood functin that we are going to optimize. It also returns the fitted model & likelihood when opt=FALSE.

lk<-function(sig2,y,X,C,v=NULL,opt=TRUE){
    n<-nrow(C)
    if(is.null(v)) v<-rep(0,n)
    V<-sig2*C+diag(v)
    beta<-solve(t(X)%*%solve(V)%*%X)%*%(t(X)%*%solve(V)%*%y)
    logL<--(1/2)*t(y-X%*%beta)%*%solve(V)%*%(y-X%*%beta)-
        (1/2)*determinant(V,logarithm=TRUE)$modulus[1]-
        (n/2)*log(2*pi)
    if(opt) -logL[1,1] else list(beta=beta[,1],sig2e=sig2,logL=logL[1,1])
}

Now, next, I start with an illustrative demo using multivariable regressin to show that this function will return the same fitted model (when no sampling error within species is assumed) as gls(...,correlation=corBrownian(...)). Here I go:

## load libraries
library(phytools)
library(nlme)
## simulate tree & data
tree<-pbtree(n=100,scale=1)
X<-fastBM(tree,nsim=2)
colnames(X)<-c("x1","x2")
y<-cbind(rep(1,Ntip(tree)),X)%*%c(1,2,3)+fastBM(tree)
## first, fit the model using gls
fit.gls<-gls(y~x1+x2,data=data.frame(y,X),correlation=corBrownian(1,tree),
    method="ML")
fit.gls

## Generalized least squares fit by maximum likelihood
##   Model: y ~ x1 + x2 
##   Data: data.frame(y, X) 
##   Log-likelihood: -47.8675
## 
## Coefficients:
## (Intercept)          x1          x2 
##   0.9532797   1.9886343   3.0334548 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 0.9198236

## now fit it using our custom function
## the interval for optimize is specified arbitrarily
fit.lk<-optimize(lk,c(0,1000),y=y,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree))
fitted<-lk(fit.lk$minimum,y=y,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree),
    opt=FALSE)
fitted

## $beta
##                  x1        x2 
## 0.9532797 1.9886343 3.0334548 
## 
## $sig2e
## [1] 0.8460792
## 
## $logL
## [1] -47.8675

So, here we can see that it works when we do not have sampling error in y. Next, let's simulate known sampling error and try again:

## these will be our within-species sampling variances
v<-setNames(rexp(n=Ntip(tree)),tree$tip.label)
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),rownames(y))
fit.lk<-optimize(lk,c(0,1000),y=ye,X=cbind(rep(1,Ntip(tree)),X),
    C=vcv(tree),v=v)
fitted<-lk(fit.lk$minimum,y=ye,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree),
    v=v,opt=FALSE)
fitted

## $beta
##                x1       x2 
## 0.863873 2.039439 2.969356 
## 
## $sig2e
## [1] 0.6355355
## 
## $logL
## [1] -138.4668

## compare to:
gls(ye~x1+x2,data=data.frame(ye,X),correlation=corBrownian(1,tree),method="ML")

## Generalized least squares fit by maximum likelihood
##   Model: ye ~ x1 + x2 
##   Data: data.frame(ye, X) 
##   Log-likelihood: -231.8525
## 
## Coefficients:
## (Intercept)          x1          x2 
##    1.508517    1.947806    4.554398 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 5.790837

Now, it also turns out that after we have optimized sig2e we can actually coerce gls into giving us the correct fitted model & likelihood. Here, I do this by distorting the edge lengths of our tree to take into account the fitted sig2e and within-species errors in y.

tt<-tree
tt$edge.length<-tt$edge.length*fitted$sig2e
for(i in 1:length(v)){
    tip<-which(tt$tip.label==names(v)[i])
    ii<-which(tt$edge[,2]==tip)
    tt$edge.length[ii]<-tt$edge.length[ii]+v[i]
}
vv<-diag(vcv(tt))
w<-varFixed(~vv)
fit.gls<-gls(ye~x1+x2,data=data.frame(ye,X),correlation=corBrownian(1,tt),method="ML",weights=w)
fit.gls

## Generalized least squares fit by maximum likelihood
##   Model: ye ~ x1 + x2 
##   Data: data.frame(ye, X) 
##   Log-likelihood: -138.4384
## 
## Coefficients:
## (Intercept)          x1          x2 
##    0.863873    2.039439    2.969356 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vv 
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 1.0169

## compare to:
fitted

## $beta
##                x1       x2 
## 0.863873 2.039439 2.969356 
## 
## $sig2e
## [1] 0.6355355
## 
## $logL
## [1] -138.4668

Now, here's a similar demo using the phylogenetic (generalized) ANOVA:

## evolve a factor on the tree
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
colnames(Q)<-rownames(Q)<-letters[1:3]
x<-as.factor(sim.history(tree,Q)$states)

## Done simulation(s).

y<-fastBM(tree)
y[x=="a"]<-y[x=="a"]+1
y[x=="b"]<-y[x=="b"]+2
y[x=="c"]<-y[x=="c"]+3
fit.gls<-gls(y~x,data=data.frame(y,x),correlation=corBrownian(1,tree),method="ML")
fit.gls

## Generalized least squares fit by maximum likelihood
##   Model: y ~ x 
##   Data: data.frame(y, x) 
##   Log-likelihood: -58.60169
## 
## Coefficients:
## (Intercept)          xb          xc 
##   0.8929341   0.8188519   1.8667981 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 1.024053

summary(fit.gls)

## Generalized least squares fit by maximum likelihood
##   Model: y ~ x 
##   Data: data.frame(y, x) 
##        AIC      BIC    logLik
##   125.2034 135.6241 -58.60169
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept) 0.8929341 0.3801839  2.348690  0.0209
## xb          0.8188519 0.1411306  5.802084  0.0000
## xc          1.8667981 0.1658942 11.252944  0.0000
## 
##  Correlation: 
##    (Intr) xb    
## xb -0.181       
## xc -0.169  0.431
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.51162932 -1.05473759 -0.05870776  0.70967087  2.99993479 
## 
## Residual standard error: 1.024053 
## Degrees of freedom: 100 total; 97 residual

X<-model.matrix(~x)
fit.lk<-optimize(lk,c(0,1000),y=y,X=X,C=vcv(tree))
fitted<-lk(fit.lk$minimum,y=y,X=X,C=vcv(tree),opt=FALSE)
fitted

## $beta
## (Intercept)          xb          xc 
##   0.8929341   0.8188519   1.8667981 
## 
## $sig2e
## [1] 1.048674
## 
## $logL
## [1] -58.60169

## now with sampling error in y
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),rownames(y))
## fit model
fit.lk<-optimize(lk,c(0,1000),y=ye,X=X,C=vcv(tree),v=v)
fitted<-lk(fit.lk$minimum,y=ye,X=X,C=vcv(tree),v=v,opt=FALSE)
fitted

## $beta
## (Intercept)          xb          xc 
##   0.7462831   1.2109657   1.9742749 
## 
## $sig2e
## [1] 1.338498
## 
## $logL
## [1] -136.757

## coerce tree again:
tt<-tree
tt$edge.length<-tt$edge.length*fitted$sig2e
for(i in 1:length(v)){
    tip<-which(tt$tip.label==names(v)[i])
    ii<-which(tt$edge[,2]==tip)
    tt$edge.length[ii]<-tt$edge.length[ii]+v[i]
}
vv<-diag(vcv(tt))
w<-varFixed(~vv)
fit.gls<-gls(ye~x,data=data.frame(ye,x),correlation=corBrownian(1,tt),method="ML",weights=w)
fit.gls

## Generalized least squares fit by maximum likelihood
##   Model: ye ~ x 
##   Data: data.frame(ye, x) 
##   Log-likelihood: -134.6691
## 
## Coefficients:
## (Intercept)          xb          xc 
##   0.7462831   1.2109657   1.9742749 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vv 
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 0.8591587

summary(fit.gls)

## Generalized least squares fit by maximum likelihood
##   Model: ye ~ x 
##   Data: data.frame(ye, x) 
##        AIC     BIC    logLik
##   277.3383 287.759 -134.6691
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~vv 
## 
## Coefficients:
##                 Value Std.Error  t-value p-value
## (Intercept) 0.7462831 0.3900422 1.913339  0.0587
## xb          1.2109657 0.2408905 5.027037  0.0000
## xc          1.9742749 0.3019824 6.537715  0.0000
## 
##  Correlation: 
##    (Intr) xb    
## xb -0.295       
## xc -0.260  0.348
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -3.2076462 -0.8804235 -0.1460402  0.8224632  3.4883858 
## 
## Residual standard error: 0.8591587 
## Degrees of freedom: 100 total; 97 residual

OK, that's it.

A phytools user contacted me recently about comparing character reconstructions obtained using stochastic character mapping by the use of offset trees as in here. Although this, of course, can be done - I also have another suggestion. That is, to compare the correlation of two different densityMap style trees. A densityMaptree has the posterior density from stochastic mapping of a binary trait mapped finely on the edges and nodes of a phylogeny. For more information about this plotting method, just check out my blog.

The way we go about computing the correlation is by traversing the edges of the tree and computing the correlation as 1 if the probabilities of our character state are both 0 or both 1 on the two trees; -1 if the probability is 0 in one tree and 1 in the other; and somewhere in between for intermediate probabilities, depending on how high or low, and how similar, they are.

Here's a demo using simulated data:

## load packages
library(phytools)
## simulate a tree & two character data vectors
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
colnames(Q)<-rownames(Q)<-letters[1:2]
x<-sim.history(tree,Q)$states

## Done simulation(s).

##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "b" "b" "b" "b" "b" "b" "b" "b" "a" "b" "b" "b" "a" "b" "a" "b" "b" "b" 
##   S   T   U   V   W   X   Y   Z 
## "b" "b" "b" "b" "a" "a" "b" "a"

y<-sim.history(tree,Q)$states

## Done simulation(s).

##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "b" "b" "b" "b" "b" "a" "b" "b" "b" "b" "a" "a" "a" "a" "b" "b" "b" "a" 
##   S   T   U   V   W   X   Y   Z 
## "a" "b" "b" "b" "b" "b" "a" "a"

## perform stochastic mapping
mx<-make.simmap(tree,x,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##           a         b
## a -1.258205  1.258205
## b  1.258205 -1.258205

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

my<-make.simmap(tree,y,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##           a         b
## a -1.157145  1.157145
## b  1.157145 -1.157145

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

## compute density maps
dmapx<-densityMap(mx,plot=FALSE)

## sorry - this might take a while; please be patient

dmapy<-densityMap(my,plot=FALSE)

## sorry - this might take a while; please be patient

## compare them visually
par(mfcol=c(1,2))
plot(dmapx)
plot(dmapy,direction="leftwards")

plot of chunk unnamed-chunk-1

## now compute the correlation between maps
obj<-dmapx
nl<-length(obj$cols)-1
## here I recenter the probability on zero & 
## compute the vector correlation
for(i in 1:length(obj$tree$maps)){
    nx<-2*(as.numeric(names(dmapx$tree$maps[[i]]))-nl/2)/nl
    ny<-2*(as.numeric(names(dmapy$tree$maps[[i]]))-nl/2)/nl
    names(obj$tree$maps[[i]])<-round((nx*ny+1)/2*1000)
}
## change to an object of class "contMap"
obj$lims<-c(-1,1)
class(obj)<-"contMap"
obj<-setMap(obj,c("white","black"))
par(mfcol=c(1,1))
plot(obj)

plot of chunk unnamed-chunk-1

That's all there is to it. We see that regions of high certainty show up with high or low correlation - depending on the similarity between maps. Regions of low certainty will always show up with near zero correlation, which is precisely what we want.

That's it.

I just added a small update to the phytools function contMap, for mapping a continuous character's evolution on the tree. Now the user has control over the legend title. This was possible before (e.g., see Figure 4.4 here), just trickier and required a separate call to add.color.bar. Code for this update is here and it will be in future versions of phytools.

Here's a quick demo:

library(phytools)
data(anoletree)
X<-read.csv("http://tinyurl.com/pg7efmb",row.names=1)
rownames(X)<-paste("Anolis_",rownames(X),sep="")
svl<-setNames(X[,"SVL"],rownames(X))[anoletree$tip.label]
obj<-contMap(anoletree,svl,plot=FALSE)
source("contMap.R")
plot(obj,fsize=c(0.7,1),leg.txt="log(SVL)",lwd=3)

plot of chunk unnamed-chunk-1

That's it.

Just a quick note on this issue. The function ace for type="discrete" previously suffered from the issue that it did not (as many people nonetheless at the time believed) compute marginal ancestral states - but instead conditional probabilities based only on the information contained in the subtree descended from each node.

Thankfully, this has now been fixed - but the documentation of the function ace (as well as the unfortunate choice of argument names) still makes things much more confusing, in my opinion, then they need to be.

To summarize the main point of this post: ace(...,marginal=TRUE)does not compute marginal ancestral states - rather ace(...,marginal=FALSE)(the default) does.

In the documentation of ace, the argument marginalis defined in the following way:

“marginal a logical (relevant if type = "d"). By default, the joint reconstruction of the ancestral states are done. Set this option to TRUE if you want the marginal reconstruction (see details.)”

In details we find some more information:

“If marginal = TRUE, a marginal estimation procedure is used (this was the only choice until ape 3.1-1). With this second method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node. With the joint estimation, all information is used for each node.”

This gives us a hint that marginal=TRUE may not in fact be returning the marginal reconstruction - because the marginal ancestral states do not use information solely from the subtree descended from the node, rather marginal reconstruction asks (node by node) what is the most likely state for this node, integrating over all the possible states, over all the other nodes in the tree, in proportion to their probability. This differs from joint reconstruction not in using only a subset of the data (as implied) but in that joint reconstruction asks instead what is the set of states at all nodes with the highest likelihood. This set of states may or may not be the set of states obtained using marginal reconstruction (or, similarly, the set of states that individually have the highest scaled likelihoods / empirical Bayes posterior proabilities).

That ace(...,marginal=TRUE) does not give the marginal reconstruction (and, conversely, that ace(...,marginal=FALSE), the default, does give the marginal reconstructions) can be shown by comparison to other functions that do compute these reconstructions as follows:

## first simulate data & tree
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
tree<-sim.history(tree,Q)

## Done simulation(s).

x<-tree$states
x

##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "b" "a" "c" "b" "c" "c" "b" "b" "a" "a" "b" "a" "b" "b" "b" "c" "c" "b" 
##   S   T   U   V   W   X   Y   Z 
## "a" "b" "a" "a" "a" "a" "a" "a"

Start with ace for marginal=TRUE& marginal=FALSE:

fit.ace.true<-ace(x,tree,type="discrete",model="SYM",marginal=TRUE)
fit.ace.false<-ace(x,tree,type="discrete",model="SYM",marginal=FALSE)
plotTree(tree,mar=c(0.1,0.1,5.1,0.1),offset=0.5)
nodelabels(pie=fit.ace.true$lik.anc,cex=0.7)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
title(main="ace(...,marginal=TRUE)")

plot of chunk unnamed-chunk-2

plotTree(tree,mar=c(0.1,0.1,5.1,0.1),offset=0.5)
nodelabels(pie=fit.ace.false$lik.anc,cex=0.7)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
title(main="ace(...,marginal=FALSE)")

plot of chunk unnamed-chunk-2

plot(fit.ace.true$lik.anc,fit.ace.false$lik.anc,
    xlab="ace(...,marginal=TRUE)",ylab="ace(...,marginal=FALSE)")

plot of chunk unnamed-chunk-3

Now we can compare to marginal ancestral state reconstruction implemented in other functions. In phytools, I have the function rerootingMethod(details here) that takes advantage of the re-rooting algorithm of Yang et al.:

fit.rrm<-rerootingMethod(tree,x,model="SYM")
plotTree(tree,mar=c(0.1,0.1,5.1,0.1),offset=0.5)
nodelabels(pie=fit.rrm$marginal.anc,cex=0.7)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
title(main="rerootingMethod")

plot of chunk unnamed-chunk-4

We can compare this to ace(...,marginal=FALSE):

plot(fit.ace.false$lik.anc,fit.rrm$marginal.anc,
    xlab="ace(...,marginal=FALSE)",ylab="rerootingMethod")

plot of chunk unnamed-chunk-5

Finally, we can use Klaus Schliep's phangorn package as follows (code snippet provided here by Klaus):

library(phangorn)
X<-phyDat(as.matrix(x),type="USER",levels=c("a","b","c"))
fit<-pml(tree,X)
fit<-optim.pml(fit,optEdge=FALSE,optRate=TRUE,optQ=TRUE)

## optimize rate matrix:  -30.36139 --> -27.84751 
## optimize rate:  -27.84751 --> -26.61064 
## optimize rate matrix:  -26.61064 --> -26.40027 
## optimize rate:  -26.40027 --> -26.32162 
## optimize rate matrix:  -26.32162 --> -26.27694 
## optimize rate:  -26.27694 --> -26.2545 
## optimize rate matrix:  -26.2545 --> -26.24114 
## optimize rate:  -26.24114 --> -26.23345 
## optimize rate matrix:  -26.23345 --> -26.2286 
## optimize rate:  -26.2286 --> -26.22557 
## optimize rate matrix:  -26.22557 --> -26.22358 
## optimize rate:  -26.22358 --> -26.22228 
## optimize rate matrix:  -26.22228 --> -26.22139 
## optimize rate:  -26.22139 --> -26.22079 
## optimize rate matrix:  -26.22079 --> -26.22038 
## optimize rate:  -26.22038 --> -26.22009 
## optimize rate matrix:  -26.22009 --> -26.21989 
## optimize rate:  -26.21989 --> -26.21975 
## optimize rate matrix:  -26.21975 --> -26.21965 
## optimize rate:  -26.21965 --> -26.21957

fit.phangorn<-ancestral.pml(fit)
anc.ml<-t(sapply(1:tree$Nnode+Ntip(tree),function(i,x) x[[as.character(i)]][1,],x=fit.phangorn))
plotTree(tree,mar=c(0.1,0.1,5.1,0.1),offset=0.5)
nodelabels(pie=anc.ml,cex=0.7)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
title(main="phangorn::ancestral.pml")

plot of chunk unnamed-chunk-6

plot(fit.ace.false$lik.anc,anc.ml,xlab="ace(...,marginal=FALSE)",
    ylab="phangorn::ancestral.pml")

plot of chunk unnamed-chunk-7

OK, well I guess that the point of this lengthy exercise is not to criticize ace, or ape (which is an incredible, impressive, multifunctional package at the core of most phylogenetics in R), but merely to point out that ace may not be doing exactly what you think it's doing - when you'd think it would be reasonable for it to be doing that thing. The good news is that with the default settings in recent versions of ape the reconstructed ancestral states returned are marginal ancestral states (aka. empirical Bayesian posterior probabilities), which is probably what we should be using for ancestral state reconstruction of discrete traits on the tree.

In developing an exercise recently, I discovered a small bug that was introduced into the function phenogramwhen I changed (in the latest phytools) version the optional argument spread.labels to default from FALSE to TRUE. This bug causes the function to crash if labels are turned off using ftype="off" under the default conditions.

Here's how it manifests using simulated tree & data:

library(phytools)
packageVersion("phytools")

## [1] '0.4.56'

tree<-pbtree(n=26,tip.label=LETTERS)
x<-fastBM(tree)
phenogram(tree,x) ## works

plot of chunk unnamed-chunk-1

phenogram(tree,x,ftype="off") ## doesn't work

## Error in optim(zz, ff, yy = yy, mo = mo, ms = ms, cost = cost, method = "L-BFGS-B", : L-BFGS-B needs finite values of 'fn'

plot of chunk unnamed-chunk-1

phenogram(tree,x,ftype="off",spread.labels=FALSE) ## works

plot of chunk unnamed-chunk-1

The fix is pretty easy - I just check if ftype="off" and set spread.labels=FALSE if it is. I will put this in the next version of phytools. Here's how it works:

source("http://www.phytools.org/phenogram/v1.4/phenogram.R")
phenogram(tree,x)

plot of chunk unnamed-chunk-2

phenogram(tree,x,ftype="off") ## now works

plot of chunk unnamed-chunk-2

OK, that's it for now.

A R-sig-phylo participant recently askedhow one goes about using the phytools function map.overlapto compute the set of similarities between stochastic mappings for a set of trees. map.overlap calculates the summed fraction edge lengths between two trees that are the same. She also wanted to do this if the topologies of different trees from stochastic mapping differed, so long as the 1st tree in set one, and the 1st tree in set two matched, along with the 2nd with the 2nd, and so on.

Well, this is not too challenging. If we have, say, N stochastic mappings on a tree for one character, and N stochastic mappings for a second, and so long as the two characters are coded in the same way (for instance, 0 / 1, A / B, 1 / 2 / 3, etc.), then it is pretty easy to compute the N× N matrix of map overlaps between the two sets:

## first let's simulate a tree & two a/b characters to use in 
## this demo
library(phytools)
tree<-pbtree(n=100,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q)$states

## Done simulation(s).

x ## character 1

##   t4  t65  t87  t88  t55  t56  t33  t45  t83  t84  t37  t43  t44  t41  t42 
##  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b" 
##  t48  t93  t94  t81  t82   t1  t64  t91  t92  t66  t15  t16  t29  t30  t26 
##  "b"  "b"  "b"  "a"  "a"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a" 
##  t79  t80  t21   t5   t6  t34  t35  t10  t11  t95  t96  t19  t28  t31  t32 
##  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b" 
##  t23  t24  t20  t75  t76  t36   t9  t69  t70  t73  t74  t46  t27  t77  t78 
##  "b"  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "b" 
##  t71  t72  t39  t40  t99 t100   t7  t25  t85  t86  t57  t58  t22  t17  t97 
##  "b"  "b"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  "a" 
##  t98  t50  t49   t3  t18  t89  t90  t38  t14   t2  t47  t62  t63  t59  t51 
##  "a"  "a"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "a" 
##  t52  t60  t61  t54   t8  t12  t13  t53  t67  t68 
##  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"

y<-sim.history(tree,Q)$states

## Done simulation(s).

y ## character 2

##   t4  t65  t87  t88  t55  t56  t33  t45  t83  t84  t37  t43  t44  t41  t42 
##  "a"  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "b" 
##  t48  t93  t94  t81  t82   t1  t64  t91  t92  t66  t15  t16  t29  t30  t26 
##  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b" 
##  t79  t80  t21   t5   t6  t34  t35  t10  t11  t95  t96  t19  t28  t31  t32 
##  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b" 
##  t23  t24  t20  t75  t76  t36   t9  t69  t70  t73  t74  t46  t27  t77  t78 
##  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b" 
##  t71  t72  t39  t40  t99 t100   t7  t25  t85  t86  t57  t58  t22  t17  t97 
##  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b" 
##  t98  t50  t49   t3  t18  t89  t90  t38  t14   t2  t47  t62  t63  t59  t51 
##  "b"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a" 
##  t52  t60  t61  t54   t8  t12  t13  t53  t67  t68 
##  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "b"

## next we can generate a set of 100 stochastic maps for each
## character
mx<-make.simmap(tree,x,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##           a         b
## a -1.070102  1.070102
## b  1.070102 -1.070102

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

my<-make.simmap(tree,y,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##            a          b
## a -0.7767663  0.7767663
## b  0.7767663 -0.7767663

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

## compute the matrix of overlaps
overlap<-sapply(mx,function(x,y) sapply(y,map.overlap,x),y=my)
dim(overlap)

## [1] 100 100

mean(overlap)

## [1] 0.495982

hist(overlap,xlim=c(0,1),col="grey")
lines(c(mean(overlap),mean(overlap)),c(0,par()$usr[4]),lty="dashed",
    col="red",lwd=2)

plot of chunk unnamed-chunk-1

Note that this matrix is assymetric as overlap[i,j] gives the map similarity between mx[[i]] and my[[j]], in this case.

Alternatively, if we want to compute the vector of pairwise comparison, 1 vs. 1, 2 vs. 2, 3 vs. 3, and so on, we can do that using mapply:

overlap<-mapply(map.overlap,mx,my)
length(overlap)

## [1] 100

mean(overlap)

## [1] 0.4902131

Now, a separate question might be what the stochastic mapping similarities of two different characters might tell us about a correlation, or lack thereof, in their evolution process. To examine this, I thought I'd use some code that I posted earlierto simulate correlated evolution of two binary traits:

## values of Q to on average produce the same rates
## in x & y as before
Q<-matrix(c(0,0.8,0.8,0,3.2,0,0,3.2,3.2,0,0,3.2,0,0.8,0.8,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,t(Q))

## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).

tt1<-mergeMappedStates(tt,c("aa","ab"),"a")
tt1<-mergeMappedStates(tt1,c("ba","bb"),"b")
tt2<-mergeMappedStates(tt,c("aa","ba"),"a")
tt2<-mergeMappedStates(tt2,c("ab","bb"),"b")
## these data are correlated, see:
par(mfrow=c(1,2))
plotSimmap(tt1,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1)
plotSimmap(tt2,setNames(c("blue","red"),letters[1:2]),ftype="off",lwd=1,
    direction="leftwards")

plot of chunk unnamed-chunk-3

## pull the tip states out of our simulated histories
x<-getStates(tt1,"tips")
y<-getStates(tt2,"tips")

Now let's do stochastic mapping for both characters and then compute the mean & distribution of overlap in the stochastic. Obviously, as the characters are correlated, we might expect that they show a higher degree of overlap in the stochastic maps.

mx<-make.simmap(tree,x,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##            a          b
## a -0.8040389  0.8040389
## b  0.8040389 -0.8040389

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

my<-make.simmap(tree,y,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##            a          b
## a -0.9141153  0.9141153
## b  0.9141153 -0.9141153

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

overlap<-sapply(mx,function(x,y) sapply(y,map.overlap,x),y=my)
mean(overlap)

## [1] 0.7793841

hist(overlap,xlim=c(0,1),col="grey")
lines(c(mean(overlap),mean(overlap)),c(0,par()$usr[4]),lty="dashed",
    col="red",lwd=2)

plot of chunk unnamed-chunk-4

Interesting.

Now, does this tell us anything that Pagel's (1994) method (implemented now in the phytools function fitPagel wouldn't have - I don't know, but it is kind of neat to see in any case.

That's all for now.

phytools has a function, described in a paper by myself & Alexis Harrison, that does phylogenetic canonical correlation analysis in a phylogenetic context.

It is possible to do the same analysis, that is returning the same correlations & proportional coefficients, if we just first compute Felsenstein's contrasts, and then perform an uncentered canonical correlationa analysis on the contrasts. Here is a quick demo of how we go about doing this for data contained in matrices X and Y, and phylogeny tree.

library(phytools)

cca1<-cancor(apply(X,2,pic,phy=tree),apply(Y,2,pic,phy=tree),
    xcenter=FALSE,ycenter=FALSE)
cca1$cor

##  [1] 0.9496915 0.9099798 0.8602210 0.8278812 0.8035111 0.7906251 0.7601731
##  [8] 0.7348214 0.6872831 0.5698925 0.5605435 0.4835210 0.4116497 0.2441289

cca2<-phyl.cca(tree,X,Y)
cca2$cor

##  [1] 0.9496915 0.9099798 0.8602210 0.8278812 0.8035111 0.7906251 0.7601731
##  [8] 0.7348214 0.6872831 0.5698925 0.5605435 0.4835210 0.4116497 0.2441289

Just to verify that the correlations are the same and that the coefficients are proportional, let's quickly plot them:

## this is 1:1
plot(cca1$cor,cca2$cor)

plot of chunk unnamed-chunk-2

## this is 1:1 or 1:-1, because the sign of the coefficients for
## any canonical axis is arbitrary
## (in addition, be aware that the scale between phylogenetic &
## non-phylogenetic analyses will probably differ) 
plot(cca1$xcoef,cca2$xcoef)

plot of chunk unnamed-chunk-2

The main advantage of phyl.cca is that it also returns scores in the original space, and it automatically runs hypothesis tests on the canonical correlations. There are other canonical correlation functions to do this in R, but they do not permit the data to be treated as centered, which means that they cannot be used with contrasts.

This post is based on a user query about how to do this, by the way.

That's all there is to it.

I'm just in the middle of fixing a bug in the phytools function for lineage-through-time plots, ltt, pointed out to me by Luke Harmon. It has to do with node labels & is not that interesting (but I will nonetheless post the fixed version soon). However, in the process of playing around with that function, a discovered a find of neat visualization that involves overlaying a phylogeny on an LTT plot.

Here is a demo:

library(phytools)
## simulate tree
tree<-pbtree(n=26,scale=10) ## arbitrary

The next part uses a function to automatically generate from a starting color a color with a certain alpha level of transparency. This function was posted to R-sig-phylo by Josef Uyeda:

makeTransparent<-function(someColor,alpha=10){
    newColor<-col2rgb(someColor)
    apply(newColor,2,function(curcoldata){
    rgb(red=curcoldata[1],green=curcoldata[2],blue=curcoldata[3],
        alpha=alpha,maxColorValue=255)
    })
}

Now we're ready to create our visualization:

obj<-ltt(tree)
plotTree(tree,color=makeTransparent("blue",alpha=50),ftype="off",add=TRUE,
    mar=par()$mar)

plot of chunk unnamed-chunk-3

We might even add other features - like vertical lines so we can see where lineages that have accumulated appear. This works well particularly for small trees:

tree<-pbtree(n=15,scale=10)
obj<-ltt(tree)
for(i in 3:(length(obj$ltt)-1)) 
    lines(rep(obj$times[i],2),par()$usr[3:4],lty="dashed",
    col=makeTransparent("black",alpha=50))
plotTree(tree,color=makeTransparent("blue",alpha=50),ftype="off",add=TRUE,
    mar=par()$mar)

plot of chunk unnamed-chunk-4

That's all.

Today, Luke Harmon & Josef Uyeda reported a bug in the phytools function ltt, which does lineage-through-time plots. The difference between ltt and equivalent functions in other packages is that ltt permits trees to include lineages that end prior to the end of the tree (for instance, a tree with some lineages that have gone extinct). For example:

library(phytools)
tree<-pbtree(n=100,b=1.6,d=0.6)
plotTree(tree,ftype="off")

plot of chunk unnamed-chunk-1

obj<-ltt(tree)

plot of chunk unnamed-chunk-2

The reported bug is for conditions when the tree includes node labels. The source of this issue is that for trees in which all tips are contemporaneous (i.e., ultrametric trees) ltt uses the ape function branching.times internally and assumes that the names of the vector returned by branching.times are the node indices from the "phylo" object. They are if the tree lacks node labels; however if node labels are present, then they will instead contain the node labels of the tree. This creates an error when ltt looks for the node indices & doesn't find them! Here's a demo:

tree<-pbtree(n=26,tip.label=LETTERS,scale=10)
plotTree(tree)

plot of chunk unnamed-chunk-3

## no node labels
ltt(tree)

plot of chunk unnamed-chunk-4

## $ltt
##    27 31 39 32 28 40 33 48 35 45 49 37 51 41 46 43 50 47 29 38 44 30 34 36 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
## 42    
## 26 26 
## 
## $times
##                        27           31           39           32 
## 0.000000e+00 9.999983e-10 1.594837e+00 1.613968e+00 1.809402e+00 
##           28           40           33           48           35 
## 2.384807e+00 4.478021e+00 5.274344e+00 5.812117e+00 6.371105e+00 
##           45           49           37           51           41 
## 6.434691e+00 7.038966e+00 7.336978e+00 7.372788e+00 7.762271e+00 
##           46           43           50           47           29 
## 8.524592e+00 8.967709e+00 9.034180e+00 9.082955e+00 9.457377e+00 
##           38           44           30           34           36 
## 9.480731e+00 9.539301e+00 9.748171e+00 9.763473e+00 9.872351e+00 
##           42              
## 9.946383e+00 1.000000e+01 
## 
## $gamma
## [1] 0.8561059
## 
## $p
## [1] 0.3919392

## add node labels
tree$node.label<-letters[1:25]
plotTree(tree)
nodelabels(tree$node.label)

plot of chunk unnamed-chunk-5

ltt(tree)

## Warning in ltt(tree): NAs introduced by coercion

## Error in 2:n: NA/NaN argument

plot of chunk unnamed-chunk-6

My bug fix, which is a little hacky to be fair, I first copy the node labels from the tree into a vector with names corresponding to the node indices, then I strip them, then I replace the names in the object returned by ltt with the original labels. The code is here. The following is a demo in which I load the function from source:

source("http://www.phytools.org/ltt/v1.0/ltt.R")
ltt(tree)

plot of chunk unnamed-chunk-7

## $ltt
##     a  e  m  f  b  n  g  v  i  s  w  k  y  o  t  q  x  u  c  l  r  d  h  j 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  p    
## 26 26 
## 
## $times
##                         a            e            m            f 
## 0.000000e+00 9.999983e-10 1.594837e+00 1.613968e+00 1.809402e+00 
##            b            n            g            v            i 
## 2.384807e+00 4.478021e+00 5.274344e+00 5.812117e+00 6.371105e+00 
##            s            w            k            y            o 
## 6.434691e+00 7.038966e+00 7.336978e+00 7.372788e+00 7.762271e+00 
##            t            q            x            u            c 
## 8.524592e+00 8.967709e+00 9.034180e+00 9.082955e+00 9.457377e+00 
##            l            r            d            h            j 
## 9.480731e+00 9.539301e+00 9.748171e+00 9.763473e+00 9.872351e+00 
##            p              
## 9.946383e+00 1.000000e+01 
## 
## $gamma
## [1] 0.8561059
## 
## $p
## [1] 0.3919392

This update should be in the next version of phytools.

That's it.

Here's a simple trick to rescale a tree to have a particular meanheight (if not ultrametric):

library(phytools)
## in this case we will use a random tree
tree<-rtree(n=26,tip.label=LETTERS)
plotTree(tree,mar=c(5.1,0.1,0.1,0.1))
axis(1)
## compute all tip heights
h<-sapply(1:Ntip(tree),nodeheight,tree=tree)
h

##  [1] 2.998386 3.391986 3.502868 3.567715 3.110754 3.365106 2.714611
##  [8] 1.358671 2.152025 2.313791 2.715758 3.055251 3.199052 1.739875
## [15] 3.869145 4.121823 3.534816 3.962804 1.561411 2.536063 3.378451
## [22] 2.719116 2.628788 2.661650 2.900916 2.311506

mean(h)

## [1] 2.898936

lines(c(mean(h),mean(h)),par()$usr[3:4],lty="dashed",col="red")

plot of chunk unnamed-chunk-1

## decide on new desired mean height
new.h<-100
## rescale tree
tree$edge.length<-tree$edge.length/mean(h)*new.h
h<-sapply(1:Ntip(tree),nodeheight,tree=tree)
h

##  [1] 103.43058 117.00797 120.83289 123.06981 107.30675 116.08072  93.64164
##  [8]  46.86791  74.23498  79.81517  93.68118 105.39215 110.35264  60.01771
## [15] 133.46776 142.18398 121.93495 136.69855  53.86151  87.48253 116.54105
## [22]  93.79704  90.68114  91.81473 100.06830  79.73635

mean(h)

## [1] 100

plotTree(tree,mar=c(5.1,0.1,0.1,0.1))
axis(1)
lines(c(mean(h),mean(h)),par()$usr[3:4],lty="dashed",col="red")

plot of chunk unnamed-chunk-1

That's it.

In the past I've described a “parametric bootstrapping” approach for testing the null hypothesis that Blomberg's K is significantly different from 1.0 (for instance here). 1.0 is the expected value of Blomberg's K under a Brownian motion model of evolutionary change for our character. Here, I'll describe the relatively simple modification of that procedure that can be used to conduct the same hypothesis test under conditions in which we also have sampling error in the estimation of species means for our character.

Let's start by simulating some trait data with sampling error. We can do this using the phytools functions pbtree (for phylogeny simulation), fastBM (to simulate trait data), and sampleFrom, which is a simple function, normally used internally, which allows us to sample trait data under some model (for instance, with error).

library(phytools)
## simulate tree
tree<-pbtree(n=100,scale=1)
## simulate trait data
x<-fastBM(tree,sig2=1)
## add error
ve<-setNames(rchisq(n=Ntip(tree),df=1)/runif(n=Ntip(tree),min=1,max=5),
    tree$tip.label)
xe<-sampleFrom(x,ve,n=rep(1,Ntip(tree)))
plot(x,xe,xlab="true species means",
    ylab="simulated species means with sampling error")

plot of chunk unnamed-chunk-1

Now let's fit our model & generate data under the null hypothesis of K = 1.0. This is where we would normally begin with a genuine empirical dataset.

fit<-phylosig(tree,xe,se=sqrt(ve))
fit

## $K
## [1] 0.9085662
## 
## $sig2
## [1] 1.125822
## 
## $logL
## [1] -112.5553

nullX<-fastBM(tree,nsim=200,sig2=fit$sig2)
nullXe<-apply(nullX,2,sampleFrom,xvar=ve,n=rep(1,Ntip(tree)))
obj<-apply(nullXe,2,phylosig,tree=tree,se=sqrt(ve))
nullK<-sapply(obj,function(x) x[[1]])
hist(nullK,breaks=20,xlab="null distribution for K",
    main="Null distribution of K")
lines(c(fit$K,fit$K),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=fit$K,y=0.985*par()$usr[4],"observed value of K",
    pos=4,offset=0.2)

plot of chunk unnamed-chunk-2

Obviously, here we are well within the null distribution for K, but we can also attach a P-value to this observation. I usually use the logarithm of K, because this equally penalizes K = 0.5 and K = 2.0.

P<-mean(abs(log(nullK))>=abs(log(fit$K))) ## two-tailed test
P

## [1] 0.675

That's it.

I'm in the midst of working on a design for the t-shirts for this year's macroevolution workshop in Ilhabela, Brazil. I've created a tentative 'front' design entirely in R using the function of phytools, as follows:

library(phytools)
## don't run
# tree<-pbtree(n=100)
# x<-fastBM(tree)

obj<-contMap(tree,x,plot=FALSE)
obj<-setMap(obj,colors=c("blue","purple","red"))
layout(mat=matrix(c(1,2),2,1),heights=c(0.8,0.2))
par(bg="black")
par(fg="white")
plotSimmap(paintSubTree(tree,Ntip(tree)+1,"1"),type="fan",
    ftype="off",colors=setNames("white","1"),lwd=6,part=0.5)

## setEnv=TRUE for this type is experimental. please be patient with bugs

plotSimmap(obj$tree,type="fan",ftype="off",colors=obj$cols,lwd=4,
    add=TRUE,part=0.5)

## setEnv=TRUE for this type is experimental. please be patient with bugs

plot.new()
text(0.5,0.5,"Latin American Macroevolution Workshop\nIlhabela Brazil 2015",
    col="white",cex=2.1,font=2)

plot of chunk unnamed-chunk-2

The knitr output is kind of aliased. We can create a higher quality PDF easily:

pdf(file="t-shirt.pdf",width=8,height=4.75)
layout(mat=matrix(c(1,2),2,1),heights=c(0.8,0.2))
par(bg="black")
par(fg="white")
plotSimmap(paintSubTree(tree,Ntip(tree)+1,"1"),type="fan",
    ftype="off",colors=setNames("white","1"),lwd=6,part=0.5)

## setEnv=TRUE for this type is experimental. please be patient with bugs

plotSimmap(obj$tree,type="fan",ftype="off",colors=obj$cols,lwd=4,
    add=TRUE,part=0.5)

## setEnv=TRUE for this type is experimental. please be patient with bugs

plot.new()
text(0.5,0.5,"Latin American Macroevolution Workshop\nIlhabela Brazil 2015",
    col="white",cex=2.1,font=2)
dev.off()

## windows 
##       2

Here is a linkto the file. (It looks much better!)

That's it for now.

Yesterday a phytools user emailed me the following inquiry:

“Is there an easy way in phytools to infer the times while there are exactly k ancestral lineages in the tree? I would need to calculate these based on a newick tree input. However, I havent found a function in phytools for it. Somthing like k=2 from x to y k=3 from z to v”

This can be done pretty easily using the phytools function ltt. Of course, one might allow for the possilibity that - if lineages both increase & decrease through time, there could be more than one period of time during which there is 2 lineages, 3 lineages etc.

The following gives a quick demo - first using a tree in which the number of lineages both increases & descreases in time since the root:

library(phytools)
tree<-rtree(26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-1

## create an object with the number of lineages through time:
obj<-ltt(tree,log=FALSE)

plot of chunk unnamed-chunk-2

Now let's go through all the number of possible lineages and tabulate all the time intervals in which the tree has each number of lineages:

foo<-function(n,obj){
    ii<-which(obj$ltt==n)
    ii<-ii[ii!=length(obj$ltt)]
    sapply(ii,function(i,x) c(x[i],x[i+1]),x=obj$times)
}
Nlineages<-sapply(1:max(obj$ltt),foo,obj=obj)
if(is.matrix(Nlineages)){
    colnames(Nlineages)<-1:max(obj$ltt)
    rownames(Nlineages)<-c("start","end")
} else if(is.list(Nlineages)){
    names(Nlineages)<-1:max(obj$ltt)
    for(i in 1:length(Nlineages)) 
        rownames(Nlineages[[i]])<-c("start","end")
}

Our object is a list, with a matrix for each number of lineages showing the time intervals spent with that number of lineages.

Nlineages

## $`1`
##               16
## start 0 4.128545
## end   0 4.198241
## 
## $`2`
##               27       15
## start 0.00000000 4.049935
## end   0.07445996 4.128545
## 
## $`3`
##               28       13       11
## start 0.07445996 1.501722 3.962453
## end   0.15032225 1.711749 4.049935
## 
## $`4`
##              30         1         4        2       41       22
## start 0.1503222 0.3991084 0.5425447 1.499231 1.711749 3.655242
## end   0.2915629 0.4658409 0.8433361 1.501722 2.151097 3.962453
## 
## $`5`
##              31        50        29       24       35       21
## start 0.2915629 0.4658409 0.8433361 1.464400 2.151097 3.651133
## end   0.3991084 0.5425447 0.8714423 1.499231 2.194213 3.655242
## 
## $`6`
##              51        3       37        5        9       18
## start 0.8714423 1.458014 2.194213 2.331990 3.612215 3.642776
## end   0.9738692 1.464400 2.274517 2.353893 3.618789 3.651133
## 
## $`7`
##              32       26       25       45       42       14       12
## start 0.9738692 1.096267 1.325357 2.274517 2.353893 2.607869 3.509315
## end   1.0088520 1.263415 1.458014 2.331990 2.573532 2.785814 3.612215
##             44
## start 3.618789
## end   3.642776
## 
## $`8`
##             33       34       46       38        7
## start 1.008852 1.263415 2.573532 2.785814 3.484318
## end   1.096267 1.325357 2.607869 2.810022 3.509315
## 
## $`9`
##             47       20        6
## start 2.810022 3.318845 3.382049
## end   2.953015 3.380557 3.484318
## 
## $`10`
##             39       23       19        8       40
## start 2.953015 3.103372 3.199463 3.277445 3.380557
## end   2.988247 3.114297 3.252606 3.318845 3.382049
## 
## $`11`
##             43       36       17       49
## start 2.988247 3.114297 3.179681 3.252606
## end   3.103372 3.149976 3.199463 3.277445
## 
## $`12`
##             48
## start 3.149976
## end   3.179681

Next, we can try and ultrametric trees with all lineages terminating in the present:

tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-5

obj<-ltt(tree,log=FALSE)

plot of chunk unnamed-chunk-6

foo<-function(n,obj){
    ii<-which(obj$ltt==n)
    ii<-ii[ii!=length(obj$ltt)]
    sapply(ii,function(i,x) c(x[i],x[i+1]),x=obj$times)
}
Nlineages<-sapply(1:max(obj$ltt),foo,obj=obj)
if(is.matrix(Nlineages)){
    colnames(Nlineages)<-1:max(obj$ltt)
    rownames(Nlineages)<-c("start","end")
} else if(is.list(Nlineages)){
    names(Nlineages)<-1:max(obj$ltt)
    for(i in 1:length(Nlineages)) 
        rownames(Nlineages[[i]])<-c("start","end")
}

This time we have a matrix in which each column is a number of lineages and the rows show the start and end times. When no lineages are lost through time, there will be only one time interval for each number of lineages:

Nlineages

##           1            2         3         4        5        6        7
## start 0e+00 2.000000e-11 0.4418836 0.5968292 1.356948 1.752875 1.756543
## end   2e-11 4.418836e-01 0.5968292 1.3569481 1.752875 1.756543 1.876857
##              8        9       10       11       12       13       14
## start 1.876857 1.931771 2.000830 2.022528 2.046486 2.218467 2.283606
## end   1.931771 2.000830 2.022528 2.046486 2.218467 2.283606 2.445454
##             15       16      17       18       19       20       21
## start 2.445454 2.506024 2.57934 2.586070 2.596797 2.668859 2.671522
## end   2.506024 2.579340 2.58607 2.596797 2.668859 2.671522 2.735799
##             22       23       24       25       26
## start 2.735799 2.773744 2.779681 2.823757 2.865986
## end   2.773744 2.779681 2.823757 2.865986 2.866205

Note that the tiny zeroeth timeslice with one lineage at the start of each tree is an idiosyncratic feature of phytools::lttand could easily be ignored.

That's it.

A phytools user recently reported that the ancestral character estimation method for discrete traits in phytools, rerootingMethod, does not work for trees with polytomies. This is in fact correct - although I had long forgotten that this was the case.

The function rerootingMethod uses the approach of Yang et al. (1995) to compute the marginal ancestral states by re-rooting the tree at all internal nodes. For instances in which some states at the tips are uncertain, it can also re-root at the tip to compute empirical Bayesian posterior probabilities for the leaves as well. The function is less relevant now that ace in the ape package also can compute the marginal ancestral states. (Although, as far as I know, ace can also not handle polytomies.) rerootingMethod stays somewhat relevant only by virtue of being able to compute posterior probabilities at tips, and for handling uncertainty at tip states (two sides of the same coin, I guess).

It was fairly straightforward to update rerootingMethod to handle polytomies, and the code is posted here. The way that this current version works is pretty simple. It just first takes the input tree, checks for polytomies using is.binary.tree in the ape package, resolves polytomies randomly with branches of zero length using multi2di, estimates ancestral states for all internal nodes (& tips, if tips=TRUE), and the uses the phytools function matchNodesto back-translate the reconstructed nodes in the resolved tree to the original, input tree. Note that although it is doing something here that ace does not do - the code internally uses (modified code from) ace.

Here - we can check it out by simulating a tree with some polytomies, generating trait data for tips & nodes, and then reconstructing ancestral states on the tree:

library(phytools)
dt<-rtree(n=26)
dt$tip.label<-LETTERS
## set some of the internal branches
ii<-which(dt$edge[,2]>Ntip(dt))
dt$edge.length[sample(ii,4)]<-0
mt<-di2multi(dt)
plotTree(mt)

plot of chunk unnamed-chunk-1

Next let's simulate a character up the tree:

Q<-matrix(c(-1,0.5,0.5,
    0.5,-1,0.5,0.5,0.5,-1),3,3,
    dimnames=list(letters[1:3],letters[1:3]))
st<-sim.history(mt,Q)

## Done simulation(s).

x<-st$states
y<-getStates(st,"nodes")

Now, we can reconstruct & plot ancestral states using ace on the bifurcating tree; and rerootingMethod on the bi- and on the multifurcating trees. If the methods are working properly then the reconstructions will be the same - although in the bifurcating node there will be additional nodes with states identical to other nodes from which they are separated by branches of zero length.

## first ace
fit1<-ace(x,mt,type="discrete",model="ER") ## doesn't work

## Error in ace(x, mt, type = "discrete", model = "ER"): "phy" is not rooted AND fully dichotomous.

fit2<-ace(x,dt,type="discrete",model="ER") ## doesn't work

## Error in ace(x, dt, type = "discrete", model = "ER"): some branches have length zero or negative

## set zero-length branches to be 1/1000000 total tree length
dst<-dt
dst$edge.length[dst$edge.length==0]<-max(nodeHeights(dt))*1e-6
fit3<-ace(x,dst,type="discrete",model="ER")
fit3

## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = dst, type = "discrete", model = "ER")
## 
##     Log-likelihood: -25.3342 
## 
## Rate index matrix:
##   a b c
## a . 1 1
## b 1 . 1
## c 1 1 .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.3278  0.1042
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##          a          b          c 
## 0.82837257 0.09781827 0.07380916

Now with the new version of rerootingMethod:

packageVersion("phytools")

## [1] '0.4.57'

fit4<-rerootingMethod(mt,x,model="ER")
fit4

## $loglik
## [1] -25.33421
## 
## $Q
##            a          b          c
## a -0.6556595  0.3278298  0.3278298
## b  0.3278298 -0.6556595  0.3278298
## c  0.3278298  0.3278298 -0.6556595
## 
## $marginal.anc
##               a          b            c
## 27 0.8283833959 0.09781145 0.0738051507
## 28 0.2121485528 0.69489845 0.0929529991
## 29 0.0532939602 0.91028552 0.0364205221
## 30 0.0627113681 0.88952034 0.0477682947
## 31 0.0132238730 0.97569370 0.0110824277
## 32 0.3870898035 0.30541903 0.3074911711
## 33 0.4035347684 0.15548939 0.4409758369
## 34 0.5865138808 0.07266794 0.3408181834
## 35 0.3530969369 0.06523786 0.5816652004
## 36 0.7045263050 0.13922658 0.1562471186
## 37 0.1288634178 0.06087131 0.8102652730
## 38 0.3401006710 0.49952777 0.1603715570
## 39 0.4129576403 0.32707964 0.2599627226
## 40 0.1266883464 0.35550347 0.5178081798
## 41 0.0553700550 0.21371716 0.7309127832
## 42 0.8088376946 0.10680265 0.0843596595
## 43 0.8570510600 0.08945954 0.0534894017
## 44 0.6274100032 0.30199393 0.0705960682
## 45 0.0836944308 0.86315981 0.0531457619
## 46 0.0119373448 0.97893201 0.0091306462
## 47 0.0006664523 0.99874487 0.0005886784

If we plot both of these on our trees in turn, we should see that the marginal reconstructions (excepting the small deviation that was required to give our tree non-zero branch lengths throughout) should be equal:

## first ace
plotTree(dt,offset=0.5)
nodelabels(pie=fit3$lik.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)

plot of chunk unnamed-chunk-5

## now phytools
plotTree(mt,offset=0.5)
nodelabels(pie=fit4$marginal.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)

plot of chunk unnamed-chunk-5

Finally, for fun let's overlay the true, known states to see how close (or off) we were:

plotSimmap(st,colors=setNames(c("red","green","blue"),letters[1:3]),
    offset=0.5)
nodelabels(pie=fit4$marginal.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)

plot of chunk unnamed-chunk-6

That's it.

I just updated the phytools function ltt, which computes and plots a lineage-through-time plot, even for trees that contain lineages terminating before the present day (i.e., extinct lineages).

The main changes in this version is that I created new object classes, "ltt" (for a single lineage-through-time plot) or "multiLtt" (for two or more trees supplied as an object of class "multiPhylo", and then created S3 print and plot methods for these object classes.

From a practical standpoint, the main functionality added is that the arguments of plot.default can now be used to customize the LTT visualization. Here is a quick demo of the new methods and the new functionality of ltt:

library(phytools)

## Loading required package: ape
## Loading required package: maps

packageVersion("phytools")

## [1] '0.4.58'

## basic ltt plotting
tree<-pbtree(n=26,tip.label=LETTERS,scale=10)
obj<-ltt(tree,plot=FALSE)
## S3 print method
obj

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of -1.1127, p-value = 0.2658.

## S3 plot method
plot(obj,lwd=2,log.lineages=FALSE,log="y",
    main="lineage through time plot")

plot of chunk unnamed-chunk-1

## tree with extinction
tree<-pbtree(n=100,b=1,d=0.4,t=4)

## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   2806 trees rejected before finding a tree

obj<-ltt(tree,plot=FALSE,gamma=FALSE)
obj

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 143 tips and 142 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.

plot(obj,log.lineages=FALSE,log="y",
    ylab="number of lineages (log-scale)")

plot of chunk unnamed-chunk-1

## multiple trees
trees<-pbtree(n=40,scale=1,nsim=200)
obj<-ltt(trees)

plot of chunk unnamed-chunk-1

obj

## 200 objects of class "ltt" in a list

## distribution of gamma
gamma<-sapply(obj,function(x) x$gamma)
hist(gamma,main="distribution of gamma from 200 simulations")

plot of chunk unnamed-chunk-1

plot(obj,log.lineages=FALSE,
    main="lineage through time plots for 200 trees",
    col=rgb(0,0,1,0.1)) ## plot lines with transparency

plot of chunk unnamed-chunk-1

The main technical challenge that I foresee is that I use the function do.call internally to execute plot.defaultand lines, depending on whether or not the lineage-through- time plot is to be added to the current graphical device. However, plot and lines take different arguments, so I can see that some users may be able to generate errors by supplying arguments that can be handled by plot but not by lines when the object being plotted is of class "multiLtt"! Please report this if you encounter it.

The code for this version of the function & associated methods is posted here, and the latest phytools package version build can be downloaded from the phytools page.

OK, that's it for now!

A phytools user recently contacted me with the following questions about the phytools function contMap for mapping the evolution of continuous trait on the tree using a color gradient:

a) Does the “length” value in the legend specify how many millions of years the legend bar represents?
b) If so, is it possible to suppress the length call and and have an additional time scale across the whole tree instead?
c) Is there any way to name the trait value?
d) Is there a simple way to change the colour, on which one maps the trait?

Since these questions probably come up for other users as well, I thought I'd try to address them one by one here.

First, let's review the function contMap. Here is a quick demo illustrating it's use, which I will explain further below:

library(phytools)
## simulate a tree
tree<-pbtree(n=26,tip.label=LETTERS,scale=100)
plotTree(tree)

plot of chunk unnamed-chunk-1

## simulate some character data
x<-fastBM(tree,sig2=0.1)
x

##          A          B          C          D          E          F 
## -5.2978022 -6.1858831 -5.7305370 -2.0332898 -4.2643267 -0.8574406 
##          G          H          I          J          K          L 
## -2.2867929 -1.1968481 -0.8374673 -0.6996979 -3.6258800  2.2016812 
##          M          N          O          P          Q          R 
##  2.3865797  3.0411627  2.9741112  0.5108229  0.8685003  1.6050838 
##          S          T          U          V          W          X 
##  3.1910221 -0.5765872  1.0726064 -1.2610571 -0.5643860  0.4667727 
##          Y          Z 
##  2.5718508  2.5810971

## generate an object of class "contMap"
obj<-contMap(tree,x,plot=FALSE)
obj

## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
## 
## (2) A mapped continuous trait on the range (-6.185883, 3.191022).

## plot it usin default settings:
plot(obj)

plot of chunk unnamed-chunk-2

The first question was: “Does the "length” value in the legend specify how many millions of years the legend bar represents?“

The answer is "yes” - if our branch lengths are in millions of years. Otherwise it will be in whatever units we have for the edge lengths of the tree. We can verify this by replotting our object of class "contMap", while leaving enough space for a horizontal axis which we can then add to the plot for comparison. We can even change the length of the plotted legend as follows:

plot(obj,mar=c(5.1,0.2,0.2,0.2),legend=40)
axis(1)
title(xlab="time from the root")

plot of chunk unnamed-chunk-3

The second question was then: “If so, is it possible to suppress the length call and and have an additional time scale across the whole tree instead?”

Well, I've just (above) demonstrated how to include a labeled x-axis in the plot. What I think the user really wants to do is suppress the text below the contMap legend. This is possible, but a little trickier. Here's how we can do it:

## first plot our tree sans legend
plot(obj,legend=FALSE,ylim=c(1-0.09*(Ntip(obj$tree)-1),Ntip(obj$tree)),
    mar=c(5.1,0.4,0.4,0.4))
## now add our legend, but without the text underneath
add.color.bar(40,obj$cols,title="trait value",
    lims=obj$lims,digits=3,prompt=FALSE,x=0,
    y=1-0.08*(Ntip(obj$tree)-1),lwd=4,fsize=1,subtitle="")
## now add our x-axis
axis(1)
title(xlab="time from the root")

plot of chunk unnamed-chunk-4

The next question was: “Is there any way to name the trait value?”

Well - from the example above, it should be obvious that this is possible. Let's combine them together - but, let's call our trait (for example) log(body size):

plot(obj,legend=FALSE,ylim=c(1-0.09*(Ntip(obj$tree)-1),Ntip(obj$tree)),
    mar=c(5.1,0.4,0.4,0.4))
add.color.bar(40,obj$cols,title="log(body size)",
    lims=obj$lims,digits=3,prompt=FALSE,x=0,
    y=1-0.08*(Ntip(obj$tree)-1),lwd=4,fsize=1,subtitle="")
## now add our x-axis
axis(1)
title(xlab="time from the root")

plot of chunk unnamed-chunk-5

Finally, the user asked: “Is there a simple way to change the colour, on which one maps the trait?” This one is actually pretty straightforward as there is a custom function in phytools, setMap for this. Let's change our rainbow color map to a blue-purple-red map:

obj<-setMap(obj,colors=c("blue","purple","red"))
## plot under default conditions
plot(obj)

plot of chunk unnamed-chunk-6

OK, that's it! I hope these answers are helpful to all phytools readers & users.

I just posted a version of the function ltt (Lineage-Through-Time) that permits the user to automatically overlay the phylogeny on a lineage through time plot. I wrote about how to do this manually in a previous post. This update is part of a new version of phytools, phytools 0.4-60, which has been submitted to CRAN - but for now can also be downloaded and installed from source from the phytools page.

Here is a demo of how it works:

library(phytools)
packageVersion("phytools")

## [1] '0.4.60'

tree<-pbtree(n=26,tip.label=LETTERS)
## simulated tree:
plotTree(tree)

plot of chunk unnamed-chunk-1

obj<-ltt(tree,plot=FALSE)
## object of class "ltt"
obj

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of -1.0815, p-value = 0.2795.

## regular ltt plot
plot(obj)

plot of chunk unnamed-chunk-2

## overlay tree on plot
plot(obj,show.tree=TRUE)

plot of chunk unnamed-chunk-3

That's all there is to it. At present I don't have it set up to change the color of the tree - however, there is an option to increase or decrease the transparency:

plot(obj,show.tree=TRUE,transparency=0.2)

plot of chunk unnamed-chunk-4

That's it for now. Stay tuned for another post detailing the updates in the latest phytools version.

Yesterday I submitted a new version of phytools to CRAN (phytools_0.4-60), and this afternoon it was accepted & posted.

There are no major new methods to report in this version over the last CRAN release of phytools; however I have made a number of different updates to a variety of frequently (& infrequently) used functions. Among these:

An update to plot.contMap to permit user control of the legend text.
A bug fix in phenogram for when ftype="off" (i.e., the traitgram is being plotted without labels.
A bug fix for ltt for an input tree with node labels.
A long overdue updateto the ancestral state reconstruction function rerootingMethod to permit trees with multifurcations.
New S3 print & plot methods, and a new custom object class, for the phytools lineage-through-time function ltt (here).
A minor bug fix in the phytools function writeAncestors.

Finally, 7. an update to ltt to permit the lineage-through-time plot to include an overlain phylogeny (1, 2).

This new package update was also necessited by changing CRAN policies which now require that all internally used functions from core R packages, such as graphics, utils, and methods also be imported into the namespace and specified in the DESCRIPTION file - so you can also see that on the phytools CRAN page.

That's all for now! Hopefully there will be more interesting updates to report for the next new CRAN version of phytools.

New version of phytools (phytools 0.4-56) now on CRAN

How to get the colors of the tips in a plotted "contMap" object

PGLS with measurement or sampling error in the dependent variable, y

Comparing two objects of class "densityMap"

Small update to control legend text in contMap

About how ace(...,marginal=TRUE) does not compute marginal ancestral states [but ace(...,marginal=FALSE) does]

Bug fix for phenogram when ftype="off"

Computing map.overlap for a set of trees

Phylogenetic canonical correlation analysis using contrasts

Overlaying a tree on an LTT plot

Bug fix for ltt with node labels

Simple method to rescale tree to a particular mean tip height

Testing a null hypothesis of K=1.0 with measurement error in the estimation of trait means for species

T-shirt design for Ilhabela workshop - in R

Calculating the times spent with each of number of lineages on the tree

Update to rerootingMethod for ancestral state reconstruction to permit polytomies

New S3 print/plot methods & new functionality for phytools ltt

Customizing your contMap style visualization using phytools: User questions & answers

New feature in LTT plotter to overlay plotted tree

New version of phytools (phytools 0.4-60) on CRAN