Some useful updates to contMap & densityMap

May 21, 2014, 1:32 pm

The most common error reported by users of contMap & densityMap is the following:

Error in while (x > trans[i]) { : missing value where TRUE/FALSE needed

This is usually (but not always) due to the tree having internal edges of zero length, and is just the result of how I set up the internal machinery of these functions. So, for instance:

> tree<-rtree(10)
> tree$edge.length[which(tree$edge[,2]==14)]<-0
> plotTree(tree)

> x<-fastBM(tree)
> obj<-contMap(tree,x)
Error in while (x > trans[i]) { : missing value where TRUE/FALSE needed

Obviously, this could be circumvented by first collapsing zero-length branches into polytomies, and this is typically what I have advised users to do - but there is no really good reason why contMap shouldn't be able to work with the zero-length branches still intact. I have just posted a new version with the bug fixed, and it is in a new release of phytools, which can be downloaded & installed from source here.
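For anyone still on an older version, the workaround of collapsing zero-length branches can be sketched as follows. This is only an illustration: it assumes the ape function di2multi (which collapses internal branches shorter than a tolerance into polytomies), and the edge set to zero here is picked arbitrarily.

```r
## sketch: collapse zero-length internal branches into polytomies
library(ape)

set.seed(1)
tree <- rtree(10)

## set one internal edge (chosen arbitrarily) to zero length
internal <- which(tree$edge[, 2] > Ntip(tree))
tree$edge.length[internal[1]] <- 0

## di2multi collapses internal branches shorter than 'tol'
collapsed <- di2multi(tree, tol = 1e-08)

## number of internal nodes before & after collapsing
c(before = Nnode(tree), after = Nnode(collapsed))
```

The collapsed tree can then be passed to contMap in place of the original.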

Here's a demo using the same tree as before:

> packageVersion("phytools")
[1] ‘0.4.10’
> obj<-contMap(tree,x)

I have also added a new function to automate the process of changing the color ramp, described here. This function is called setMap. Here's a demo:

> obj<-setMap(obj,colors=c("blue","green","yellow"), space="Lab")
> plot(obj)

Obviously, there is no way that R can make you use a reasonable color scheme here!
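To preview a candidate ramp before applying it to a tree, something like the following base R sketch may help. It uses colorRampPalette with the same colors & space arguments as in the demo. Whether setMap builds its ramp in exactly this way internally is an assumption, and the number of color levels (1001) is arbitrary.

```r
## sketch: preview a color ramp using base R only
ramp <- colorRampPalette(c("blue", "green", "yellow"), space = "Lab")
cols <- ramp(1001)   ## 1001 is an arbitrary number of color levels

## draw the ramp as a horizontal strip
image(matrix(seq_along(cols), ncol = 1), col = cols, axes = FALSE)
```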

I have also fixed the same zero-length branch problem in densityMap. In addition, densityMap can now plot the posterior density of any two-state character, not just a binary character with states "0" & "1". This required a few small modifications, including accommodating the possibility that someone might use one & only one of the states "0" & "1" (along with some other state that is neither "0" nor "1"). Here's a demo:

> tree<-pbtree(n=26,tip.label=letters[26:1],scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-LETTERS[1:2]
> x<-sim.history(tree,Q)$states
> trees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
           A          B
A -0.5933734  0.5933734
B  0.5933734 -0.5933734
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  A   B
0.5 0.5
Done.
> obj<-densityMap(trees,plot=FALSE,res=500)
sorry - this might take a while; please be patient
> plot(obj,outline=TRUE,lwd=6)

That's it for now.

↧

New version of plotTree.wBars that permits positive & negative phenotypic trait values

May 28, 2014, 5:33 pm

A couple of weeks ago I posted a new function (1, 2) to plot bars at the tips of a circular or square phylogram. One limitation of this function is that because the bars are plotted 'growing' out of each leaf of the tree, the values of the phenotypic trait data underlying the bars cannot be negative. Negative values would result in bars growing (in the case of a fan tree) towards the root of the tree - which, of course, does not look right at all.

Here's what I mean:

> tree<-pbtree(n=100)
> x<-fastBM(tree)
> plotTree.wBars(tree,x,scale=0.5)

> ## or setting type="fan"
> plotTree.wBars(tree,x,scale=0.5,type="fan")

Now let's try the new version:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.4.11’
> plotTree.wBars(tree,x,scale=0.5)

> plotTree.wBars(tree,x,scale=0.5,type="fan")

I'm not sure it makes a great visual in this case - but, nonetheless, it's better.

When some values of x are negative and the tree is non-ultrametric, then the bars are also centered a constant distance from the maximum tip height in the tree. For instance:

> tree<-rtree(n=50)
> x<-fastBM(tree)
> plotTree.wBars(tree,x,scale=0.3)

The code for this new function version is here; but you can also install a new build of phytools from source with this update.

↧

phytools blog no longer permits anonymous comments

June 2, 2014, 8:54 am

Basically, the post title says it all. The phytools blog was getting so many spam comments (and more & more of it was getting through the automatic blogger spam filters) that it was starting to displace genuine questions & comments. Please email me if you try to comment on this blog & fail, or if you find that the new limitations on commenting are too restrictive. Thanks!

↧

New version of fancyTree for type="scattergram"

June 4, 2014, 10:03 pm

I just fixed a small bug in fancyTree(...,type="scattergram") that can cause axis labels to suddenly appear inside the multi-panel plot when edited using an external application. Just as a reminder, fancyTree(...,type="scattergram") creates a type of 'phylogenetic scatterplot matrix' for two or more continuous characters & a tree:

> require(phytools)
Loading required package: phytools
> tree<-pbtree(n=26,tip.label=LETTERS[26:1])
> X<-fastBM(tree,nsim=3)
> fancyTree(tree,type="scattergram",X=X)

Code is here, and a 'bleeding-edge' phytools version with this update can be found here.

↧

Locating the phylogenetic position of the Yeti: Placing cryptic, recently extinct, or hypothesized taxa in an ultrametric phylogeny using continuous character data

June 4, 2014, 10:07 pm

I recently returned from teaching in the AnthroTree 2014 Workshop. This is a course organized by Charlie Nunn that is designed to (in brief) introduce anthropologists to basic & advanced methods in phylogenetic comparative biology.

Well, I won't get into too many details, but a grad student in the workshop was interested in placing a hypothesized taxon into an ultrametric phylogeny when she had only continuous character data for this focal species, along with the same characters measured for all the other species in the phylogeny. This turns out to be not a very hard problem. It has the following parts:

(1) Computing the likelihood of continuous character data on a tree. This is not hard. We can just use the method of Felsenstein (1973), which is actually exactly equivalent to computing the likelihood of a Brownian motion model on our tree using standard comparative method machinery.

(2) Accounting for correlations among characters. This is important when the tree is unknown - but in this case we have a base tree containing all but one of our taxa. This makes accounting for correlations among characters in our likelihood calculation quite straightforward. We can just rotate our data using phylogenetic PCs obtained via PCA on the N - 1 tips in our base tree. We then compute the scores for all the taxa in the tree, and the taxon of unknown phylogenetic affinity. Finally, our log-likelihood just becomes the summed log-likelihoods of each of these now evolutionarily orthogonal characters.

(3) Finding the ML tree. Well, unlike the problem of ML optimization from continuous characters when the tree is unknown, this just involves - at worst - optimizing the position of the new branch along each of the 2(N - 2) edges in our base tree, and then picking the edge and position that maximized the likelihood. In reality, we can probably just compute the likelihood from the midpoint of each edge, and then narrow our search to a much smaller set of edges which we search more thoroughly. We only have to find the divergence point because our tree is ultrametric and our new taxon is (we have assumed) extant.
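Parts (1) & (2) can be sketched in a few lines of base R. This is not the code in locate.yeti, just one way to write the Brownian motion log-likelihood for a single character given C, the matrix of shared branch lengths among tips. The function name bm_loglik, the toy three-taxon matrix, and the made-up 'rotated' data are all assumptions for illustration.

```r
## sketch: log-likelihood of Brownian motion for one character,
## given C, the matrix of shared branch lengths among tips
bm_loglik <- function(x, C) {
  n <- length(x)
  invC <- solve(C)
  a <- sum(invC %*% x) / sum(invC)        ## GLS estimate of the root state
  r <- x - a
  sig2 <- drop(t(r) %*% invC %*% r) / n   ## ML estimate of the BM rate
  logdet <- as.numeric(determinant(sig2 * C, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi) + logdet + n)
}

## toy example for the ultrametric tree ((A:1,B:1):1,C:2);
C <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2), 3, 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
## two made-up characters, assumed to be already rotated (orthogonal)
X <- cbind(pc1 = c(0.3, 0.1, -0.8), pc2 = c(-0.2, 0.5, 0.4))

## summed log-likelihood across the orthogonal characters
total <- sum(apply(X, 2, bm_loglik, C = C))
total
```

In a search like the one in part (3), C would be recomputed for each candidate attachment point of the new tip & the position with the highest summed log-likelihood would be kept.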

Graham Slater suggested this method be called (facetiously, of course) locate.yeti, and I have just now posted code for this on the phytools page. Here's a quick demo of just how well it works:

## load required packages
require(phytools)
require(phangorn)
require(mnormt)

## load source
source("locate.yeti.R")

## simulate tree & data
N<-50 ## taxa in base tree
m<-10 ## number of continuous characters
## simulate tree
tt<-tree<-pbtree(n=N+1,tip.label=sample(c(paste("t",1:N,
sep=""),"Yeti")))
## generate a covariance matrix for simulation
L<-matrix(rnorm(n=m*m),m,m)
L[upper.tri(L,diag=FALSE)]<-0
L<-L-diag(diag(L))+abs(diag(diag(L)))
V<-L%*%t(L)
X<-sim.corrs(tree,vcv=V)
## visualize trees
par(mfrow=c(1,2))
plotTree(tt,mar=c(0.1,0.1,4.1,0.1))
title("tree with Yeti")
tree<-drop.tip(tree,"Yeti")
plotTree(tree,mar=c(0.1,0.1,4.1,0.1),direction="leftwards")
title("tree without Yeti")

## run locate.yeti
mltree<-locate.yeti(tree,X,plot=TRUE)

(This plot shows the likelihood of attaching the tip to any edge.)

## plot results
par(mfrow=c(1,2))
plotTree(tt,mar=c(0.1,0.1,4.1,0.1))
title("true position of Yeti")
plotTree(mltree,mar=c(0.1,0.1,4.1,0.1),
direction="leftwards")
title("estimated position of Yeti")

## compute RF distance
RF.dist(tt,mltree)
## [1] 0

(Of course, it doesn't always do this well!)

In my optimization routine, I take the single edge with the highest likelihood from step one, and then get its parent (if one exists) and daughter (likewise) edges. Then I perform numerical optimization of the position of the cryptic lineage on each of these one, two, three, or more edges. Finally, I pick the edge & location with the highest likelihood. It is, of course, possible that an edge with the cryptic taxon at its midpoint might have an even higher likelihood if that taxon was placed somewhere else along the edge, so this would be the next natural step in expanding this heuristic to have better confidence that we have found the ML position.

Finally, there is no theoretical difficulty in using the same general approach to place a fossil taxon on the tree. In that case we just have one additional parameter to optimize - the terminal edge length. Similarly, we could also use a Bayesian MCMC approach to (say) put prior probabilities on the edges that are more or less likely to have produced our cryptic or fossil lineage.

That's it for now.

↧

Plotting a slanted cladogram using phytools

June 5, 2014, 9:55 pm

I inadvertently discovered the algorithm for plotting a right-facing slanted cladogram this evening. Basically, to get the vertical position of each internal node we just assign heights 1 through N to the tips; and then each internal node is merely the simple arithmetic mean of its descendants. We get the horizontal positions (the heights above the root) using the method of Grafen with ρ=1, implemented in the ape function compute.brlen.

We can apply this algorithm via the phytools function phenogram as follows:

plotCladogram<-function(tree){
foo<-function(tree,x){
   n<-1:tree$Nnode+length(tree$tip.label)
   setNames(sapply(n,function(n,x,t) mean(x[Descendants(t,n,
  "tips")[[1]]]),x=x,t=tree),n)
}
tree<-reorder(tree,"cladewise")
x<-setNames(1:length(tree$tip.label),tree$tip.label)
phenogram(compute.brlen(tree),c(x,foo(tree,x)),ylab="")
}

For example:

> require(phytools)
> require(phangorn)
> tree<-pbtree(n=26,tip.label=LETTERS[26:1])
> plotCladogram(tree)

↧

Generic rep (replicate) function for objects of class "phylo"&"multiPhylo"

June 6, 2014, 2:51 pm

Again, while working on something else, I realized that there is no generic rep (replicate elements of vectors or lists) function for objects of class "phylo" or "multiPhylo". These were straightforward to write (although they undoubtedly could've been programmed more elegantly) & they use c.phylo & c.multiPhylo. The only hitch was that, to append an additional tree to a list (an object of class "multiPhylo"), we can't use c.multiPhylo without doing:

obj<-c(obj,list(x))

Here is the code:

rep.phylo<-function(x,...){
if(hasArg(times)) times<-list(...)$times
else times<-(...)[[1]]
for(i in 1:times)
obj<-if(i==1) x else if(i==2) c(obj,x) else
c(obj,list(x))
class(obj)<-"multiPhylo"
obj
}

rep.multiPhylo<-function(x,...){
if(hasArg(times)) times<-list(...)$times
else times<-(...)[[1]]
for(i in 1:times) obj<-if(i==1) x else if(i>=2) c(obj,x)
class(obj)<-"multiPhylo"
obj
}

I will add these to a future version of phytools.
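For comparison, here is a shorter sketch that sidesteps c.phylo & c.multiPhylo altogether by replicating a plain list & then setting the class. This is a different technique from the code above, not a replacement for it, and the function name rep_phylo_sketch is made up for the illustration.

```r
## sketch: replicate a tree by repeating a plain list
library(ape)

rep_phylo_sketch <- function(x, times) {
  obj <- rep(list(x), times)   ## base rep on a plain list
  class(obj) <- "multiPhylo"
  obj
}

tree <- read.tree(text = "((A,B),C);")
trees <- rep_phylo_sketch(tree, 3)
length(trees)
```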

↧

Performing stepwise phylogenetic regression in R

June 7, 2014, 1:02 pm

A colleague asked the following:

"Is there a way to do a multiple stepwise regression on continuous data that incorporates phylogenetic information and selects the best model based on an AICc score?"

Although I myself had not done this before, I figured that stepwise regression with objects of class "gls" would likely be something that someone, somewhere, in the wide world of R would have thought of before - and this is indeed correct.

Here's a complete demo with simulation code.

First, simulate data. In this case, x₁ and x₃ have an effect on y, so they ought to be found in the final model - but x₂ and x₄ intentionally do not.

## load packages
require(phytools)
require(nlme)
require(MASS)

## simulate data (using phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
x1<-fastBM(tree)
x2<-fastBM(tree)
x3<-fastBM(tree)
x4<-fastBM(tree)
y<-0.75*x1-0.5*x3+fastBM(tree,sig2=0.2)
X<-data.frame(y,x1,x2,x3,x4)

Now let's fit our model & run the stepwise regression method using AIC:

## build model (using nlme)
obj<-gls(y~x1+x2+x3+x4,data=X,correlation=
corBrownian(1,tree),method="ML")

## run stepwise AIC (using MASS & nlme)
fit<-stepAIC(obj,direction="both",trace=0)

The argument trace just controls how much information is printed about the backward & forward stepwise procedure as it runs. If we want more, we should increase it.
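Note that stepAIC, as called here, ranks models by AIC rather than by the AICc mentioned in the question. If AICc is wanted for comparing the fitted models afterwards, a small helper along the following lines could be used. The helper name aicc is made up, and the demo uses lm on simulated data only to keep the sketch self-contained. It should also work on "gls" objects to the extent that logLik & nobs methods are available for them, but I have not shown that here.

```r
## sketch: small-sample corrected AIC for a fitted model
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")   ## number of estimated parameters
  n <- nobs(fit)                 ## number of observations
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

## self-contained demo using lm (not gls) on simulated data
set.seed(1)
d <- data.frame(x1 = rnorm(26), x2 = rnorm(26))
d$y <- 0.75 * d$x1 + rnorm(26, sd = 0.5)
fit <- lm(y ~ x1 + x2, data = d)
c(AIC = AIC(fit), AICc = aicc(fit))
```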

Finally, here's a worked simulation with results:

> ## simulate data (using phytools)
> tree<-pbtree(n=26,tip.label=LETTERS)
> x1<-fastBM(tree)
> x2<-fastBM(tree)
> x3<-fastBM(tree)
> x4<-fastBM(tree)
> y<-0.75*x1-0.5*x3+fastBM(tree,sig2=0.2)
> X<-data.frame(y,x1,x2,x3,x4)
> ## build model (using nlme)
> obj<-gls(y~x1+x2+x3+x4,data=X,correlation=
corBrownian(1,tree),method="ML")
> ## run stepwise AIC (using MASS & nlme)
> fit<-stepAIC(obj,direction="both",trace=0)
> fit
Generalized least squares fit by maximum likelihood
Model: y ~ x1 + x3
Data: X
Log-likelihood: -10.43514

Coefficients:
(Intercept)          x1          x3
 -0.4952197   0.6882657  -0.5108970

Correlation Structure: corBrownian
Formula: ~1
Parameter estimate(s):
numeric(0)
Degrees of freedom: 26 total; 23 residual
Residual standard error: 0.594681

This is just about what we'd expect. Cool.

↧

Coloring edges in phylomorphospace3d

June 10, 2014, 6:11 am

A recent R-sig-phylo query asked the following about phytools function phylomorphospace3d (for projecting a tree into a three-dimensional morphospace):

"[Is t]here is a possbility to color some specific edges in the phylomorphospace3d()? plot.phylo() has the argument "edge.color" to do that in classical tree plotting. I need it, however, in the above mentioned function."

This functionality (as well as the ability to plot a stochastic character map on a phylomorphospace) exists for two-dimensional phylomorphospace plotting, but it did not (until now) exist for phylomorphospace3d; however, this was not too difficult to add. I have added user control of plotted line colors via the control parameter col.edge (as in phylomorphospace), rather than via a new function argument.

Code for this new function version is here; but users may prefer to install the latest bleeding edge source version of phytools (phytools 0.4-14) to ensure that everything works as advertised.

Here is a demo that plots the internal edges of a 3D phylomorphospace in blue, while the terminal edges are plotted in red:

## load phytools
require(phytools)
packageVersion("phytools") ## should be >=0.4-14
## simulate tree
tree<-pbtree(n=12,tip.label=LETTERS[1:12])
X<-fastBM(tree,nsim=3)
## create color vector
colors<-sapply(tree$edge[,2], function(x,n) if(x>n) "blue"
else "red",n=length(tree$tip.label))
## generate 3D phylomorphospace plot
phylomorphospace3d(tree,X,control=list(lwd=2,
col.edge=colors))

BTW, to create this cool .gif, you just need to do something like:

movie3d(phylomorphospace3d(tree,X,control=list(lwd=2,
col.edge=colors)),movie="phylomorph-3d",duration=10)

The same option is also available for method="static", e.g.:

phylomorphospace3d(tree,X,control=list(lwd=2,
col.edge=colors),method="static",angle=20)
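The color vector used above relies only on the edge matrix: an edge is internal if its daughter node number is greater than the number of tips. Here is the same logic in vectorized form on a toy edge matrix (made up for the illustration), using base R only:

```r
## toy edge matrix for the tree ((t1,t2),t3):
## tips are numbered 1-3, the root is 4, & the other internal node is 5
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
n <- 3   ## number of tips

## internal edges in blue, terminal edges in red
colors <- ifelse(edge[, 2] > n, "blue", "red")
colors
## [1] "blue" "red"  "red"  "red"
```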

That's it.

↧

Constraining internal node values using make.simmap

June 14, 2014, 6:24 pm

A phytools user asked the following question:

"I know I can constrain the state of the root node, but is it possible to also constrain the state at specific internal nodes?"

This is possible, although it requires a hack. We can basically attach a terminal edge (a tip) of zero length to any of the nodes that we want to constrain. We can then assign a value to that tip that is the value to which we want to constrain that node. After we conduct our stochastic mapping, we can then prune all the zero-length terminal edges we've added from our stochastically mapped trees.

Here is a quick demo:

> ## first let's simulate a true character history
> tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
> Q<-matrix(c(-2,2,2,-2),2,2)
> rownames(Q)<-colnames(Q)<-letters[1:2]
> true.history<-sim.history(tree,Q)
> cols<-setNames(c("blue","red"),letters[1:2])
> plotSimmap(true.history,cols)
> nodelabels()

> ## now let's run our analysis as normal
> ## (i.e., without constraint)
> x<-getStates(true.history,"tips")
> trees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
         a        b
a -2.29596  2.29596
b  2.29596 -2.29596
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  a   b
0.5 0.5
Done.
> ## now let's summarize the posterior probabilities
> ## at each node
> plotSimmap(trees[[1]],cols) ## one of our maps
> obj<-describe.simmap(trees)
> nodelabels(pie=obj$ace,piecol=cols,cex=0.6)

> ## next let's constrain, say, node 46 to be
> ## its true value
> x<-c(getStates(true.history,"tips"),
getStates(true.history,"nodes")["46"])
> tt<-bind.tip(tree,"46",edge.length=0,where=46)
> plotTree(tt)

> trees<-make.simmap(tt,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          a         b
a -2.219295  2.219295
b  2.219295 -2.219295
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  a   b
0.5 0.5
Done.
> ## now drop all tips "46" from the trees
> trees<-lapply(trees,drop.tip.simmap,tip="46")
> class(trees)<-"multiPhylo"
> ## now let's once again summarize our results:
> plotSimmap(trees[[1]],cols) ## one of our maps
> obj<-describe.simmap(trees)
> nodelabels(pie=obj$ace,piecol=cols,cex=0.6)

If we want to fix multiple nodes, we have to keep in mind that the node numbering will change every time that we add a tip to the tree.

In the most extreme case we might know the states at each internal node of the tree. In this case, we are just sampling reconstructions of the character history conditioned on the true (known) states at all internal nodes. This is what that might look like:

> ## pull our data vector
> x<-c(getStates(true.history,"tips"),
getStates(true.history,"nodes"))
> nodes<-1:tree$Nnode+length(tree$tip.label)
> for(i in 1:length(nodes)){
if(i==1) tt<-bind.tip(tree,nodes[i],where=nodes[i],edge.length=0)
else {
M<-matchNodes(tree,tt)
tt<-bind.tip(tt,nodes[i],where=M[which(M[,1]==nodes[i]),2],
edge.length=0)
}
}
> plotTree(tt)

> ## now run stochastic mapping
> trees<-make.simmap(tt,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
         a        b
a -2.91958  2.91958
b  2.91958 -2.91958
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  a   b
0.5 0.5
Done.
> ## drop all added tips
> trees<-lapply(trees,drop.tip.simmap,
tip=as.character(nodes))
> class(trees)<-"multiPhylo"
> ## visualize the results
> plotSimmap(trees[[1]],cols) ## one of our maps
> obj<-describe.simmap(trees)
> nodelabels(pie=obj$ace,piecol=cols,cex=0.6)

We can even do the following:

> obj<-densityMap(trees,plot=FALSE)
sorry - this might take a while; please be patient
> plot(obj,lwd=4,outline=TRUE)

Which, in this case, shows the uncertainty about where transitions between states occurred, even when conditioned on knowing the true states at all nodes.

Finally, since make.simmap also allows us to supply a prior on tip states, we can use the same hack to put a prior on internal nodes.
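As a rough sketch of what that input might look like: my understanding is that the prior is supplied as a matrix with one row per tip (named by tip label), one column per state, & rows summing to one, but that format should be checked against the make.simmap documentation. The tip states, the node label "46", & the prior of 0.8 below are all made up for the illustration.

```r
## hypothetical tip states for three tips, with two states "a" & "b"
x <- c(A = "a", B = "b", C = "a")
states <- c("a", "b")

## one row per tip, one column per state; observed tips get probability 1
X <- sapply(states, function(s) as.numeric(x == s))
rownames(X) <- names(x)

## add a row for the zero-length tip bound to the node of interest
## (here labeled "46"), with a made-up prior of 0.8 on state "a"
X <- rbind(X, "46" = c(a = 0.8, b = 0.2))
X
```

The matrix X would then be used in place of the vector x, with a tree to which the zero-length tip "46" has been added as in the demo above.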

OK. That's it for now.

↧

Computing the average trait value for the set of taxa descended from each node

June 16, 2014, 6:51 pm

A recent comment asked if there was an easy way to substitute the raw average value of the tips descended from a node for ancestral states obtained via ancestral state reconstruction when using the phytools function plotBranchbyTrait. This is pretty easy. Here's a demo (although it could be done a dozen different ways, including using a simple for loop).

(This assumes only that our tree is an object of class "phylo" called tree; and, importantly, that our trait data is a named vector x, in which the names correspond to the tip labels of tree.)

getDescendants<-phytools:::getDescendants
nn<-1:tree$Nnode+Ntip(tree)
a<-setNames(sapply(nn,function(n,x,t){
d<-getDescendants(t,n)
mean(x[t$tip.label[d[d<=Ntip(t)]]])
},x=x,t=tree),nn)

Now we can use it with plotBranchbyTrait as follows. First, if we want each edge to be colored as the average of the parent & daughter nodes (or tip):

plotBranchbyTrait(tree,c(x,a),mode="nodes")

Alternatively, we can have the branch color determined only from the daughter nodes or tips (in which case we throw out the root value):

y<-c(setNames(x[tree$tip.label],1:Ntip(tree)),a)
plotBranchbyTrait(tree,y[tree$edge[,2]],mode="edges")

That's pretty much it.

↧

Alternative node placements in plotted trees using plotTree & plotSimmap

June 18, 2014, 11:56 am

Felsenstein (2004; pp. 574-576) gives four different node placements for square phylograms. The most commonly used node placement is what Felsenstein calls intermediate, i.e., the node is placed at the midpoint of the uppermost & lowermost edges immediately descended from that node. This is what that looks like for a stochastic 13 taxon tree:

> packageVersion("phytools")
[1] ‘0.4.16’
> plotTree(tree)

Centered node placement puts the node - instead of at the vertical point intermediate between the upper & lowermost immediate daughters - at the midpoint of all tips descended from a node. This is just the average of the upper and lowermost of this set. Here's what that looks like for the same tree:

> plotTree(tree,nodes="centered")

Like intermediate node placement, weighted node placement uses only the immediate descendants - but it weights their vertical position (inversely) by the edge length leading to each daughter node. E.g.:

> plotTree(tree,nodes="weighted")

(Note that Figure 34.1c in Felsenstein appears to use a slightly different algorithm than that described in the text - which I have followed here - in which internal and tip edges are treated differently.)

Finally, inner node placement puts each node in the 'innermost' position of all its descendants. E.g.:

> plotTree(tree,nodes="inner")

(In this case the algorithm described by Felsenstein (2004; p. 575) seems to be incorrect unless I've misread. I achieved the desired effect not from the y values of the descendant edges as described, but by using the vertical position closest to the median vertical position of all the tips.)
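To make the first three placements concrete, here is the arithmetic for a single made-up node, following the verbal descriptions above (this is not the code inside plotSimmap, & the y positions & edge lengths are invented):

```r
## toy node with two immediate daughters:
## daughter 1 is a tip at y = 1; daughter 2 is a clade with tips at
## y = 3, 4, & 5 that is itself drawn at y = 4
daughters.y <- c(1, 4)
tips.y <- c(1, 3, 4, 5)
edge.len <- c(3, 1)   ## made-up edge lengths leading to each daughter

## intermediate: midpoint of the upper & lowermost immediate daughters
intermediate <- mean(range(daughters.y))
## centered: midpoint of the upper & lowermost descendant tips
centered <- mean(range(tips.y))
## weighted: immediate daughters weighted inversely by edge length
weighted <- weighted.mean(daughters.y, w = 1 / edge.len)

c(intermediate = intermediate, centered = centered, weighted = weighted)
## intermediate     centered     weighted
##         2.50         3.00         3.25
```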

All of these methods can also be used with stochastic mapped trees (since plotSimmap is the engine that runs plotTree). For instance:

> data(anoletree)
> plotSimmap(anoletree,nodes="inner",fsize=0.7,ftype="i")
no colors provided. using the following legend:
       CG        GB        TC        TG        Tr        Tw
  "black"     "red"  "green3"    "blue"    "cyan" "magenta"
> add.simmap.legend(colors=setNames(palette()[1:6],
sort(unique(getStates(anoletree,"tips")))))
Click where you want to draw the legend

That's it. See some of you at 'Evolution' in a few days.

↧

Updates to phylomorphospace3d & locate.yeti; new version of phytools

June 27, 2014, 1:31 pm

I just posted a new version of phytools. This can be downloaded from the phytools page & installed from source; however I have also submitted this version (phytools 0.4-20) to CRAN, so hopefully it will be accepted by the CRAN gatekeepers and spread out onto all the CRAN mirrors within a few days.

Some important updates in this version relative to the latest minor version include:

1. An update to locate.yeti to permit the use of a set of edge constraints to search for the ML position of the leaf to be added to the tree.

2. An update to phylomorphospace3d so that it now checks internally to see if the package rgl has been installed, and then loads it if available. This permits me to change phytools' relationship with rgl from Depends to Suggests, which is designed to allow users to install & load phytools on systems where rgl cannot be installed.

This version of phytools also has a wide range of other new functions & updates relative to the last CRAN version, for instance:

1. A new version of contMap that permits some trait values for tip taxa to be missing from the dataset.

2. A new function, plotTree.wBars, that permits users to plot square or circular phylograms with bars instead of tip labels (e.g., 1, 2, 3).

3. A new function, setMap, to set the color-ramp for an object of class "contMap" or "densityMap".

4. Some other useful updates to contMap and densityMap.

5. A new version of fancyTree that fixes a small bug for type = "scattergram".

6. The aforementioned new function (locate.yeti) for locating the position of a cryptic or recently extinct taxon on an ultrametric phylogeny using continuous character data.

7. An S3 generic method rep for objects of class "phylo" and "multiPhylo" (here).

8. A new option in phylomorphospace3d to color internal edges.

Finally, 9. New options for node placement in square phylograms plotted with plotSimmap and plotTree.

That's it!

↧

Bug fix in make.simmap for multi-state data and non-reversible model of evolution

July 1, 2014, 1:18 pm

In response to an R-sig-phylo query about stochastic mapping in phytools I wrote:

I can answer the three questions that pertain to phytools:

(1) The function make.simmap has a bug for non-reversible models when the input character has more than 2 states. This has to do with the algorithm for simulating changes along edges where the correct waiting time is simulated, but then the states at the end of the waiting time are chosen with the incorrect probabilities. This will sometimes cause make.simmap to hang for a long time. This does not affect the states simulated at nodes (which occurs first) and should not affect stochastic character mapping at all for binary characters or any reversible model of character evolution. I have posted a fixed version here - but you can also install a version of phytools containing this update here. Use phytools >= 0.4-22.

(2) You should look into the function describe.simmap to summarize the time spent in each state, etc., from stochastic character mapping using make.simmap.

(3) To simulate stochastic character histories you can use the phytools function sim.history. Note, though, that the matrix Q used by sim.history is the transpose of the fitted value of Q from make.simmap. Sorry about this. That means to simulate stochastic character histories on your tree you can do:

## stochastic mapping:
mtrees<-make.simmap(tree,x,model="ARD",nsim=100)
## summarize results
obj<-describe.simmap(mtrees)
obj
plot(obj) ## PP at nodes from stochastic mapping
fitted.Q<-mtrees[[1]]$Q
## simulate under fitted model
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))

The last part (anc=rstate(obj$ace[1,])) is necessary if you have non-reversibility, because this way you are picking from the posterior distribution at the root from your real data.

In addition, sim.history does not permit any columns (rows in the matrix from make.simmap) to be equal to 0.0 (which we will have if we have a truly non-reversible character). To resolve this, you can do something like:

ii<-which(diag(fitted.Q)==0)
fitted.Q[ii,-ii]<-max(nodeHeights(tree))*1e-12
fitted.Q[ii,ii]<--sum(fitted.Q[ii,])
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))
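To see what the patch does, here it is applied to a made-up three-state Q in which state "c" is absorbing (a row of zeros). The value h stands in for max(nodeHeights(tree)) & is set to 1 only for the illustration.

```r
## toy non-reversible Q (rows sum to zero); state "c" is absorbing
Q <- matrix(c(-1,  1, 0,
               0, -2, 2,
               0,  0, 0), 3, 3, byrow = TRUE,
            dimnames = list(letters[1:3], letters[1:3]))
h <- 1   ## stands in for max(nodeHeights(tree))

## same steps as above: replace the zero row with tiny positive rates
## & reset the diagonal so that the row still sums to zero
ii <- which(diag(Q) == 0)
Q[ii, -ii] <- h * 1e-12
Q[ii, ii] <- -sum(Q[ii, ])

rowSums(Q)   ## all (numerically) zero
diag(Q)      ## all negative
```

The rates out of the absorbing state are so small relative to the height of the tree that a change from it is vanishingly unlikely in simulation.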

↧

New version of read.newick that can read root node labels

July 1, 2014, 6:18 pm

The phytools function read.newick is a slower, but slightly more robust Newick tree reader than read.tree in the ape package - specifically in that it permits certain types of 'badly conformed' Newick strings, such as those containing singleton nodes.

Its Achilles' heel, however, has been that it does not permit a node label for the root node of the tree. The fix for this was simple and was supplied to me by Joseph Brown. The updated function code is here & also in the latest non-CRAN phytools version (phytools >= 0.4-23) here.

↧

Principal components rotation of ancestral states; or ancestral state reconstruction of PC scores?

July 2, 2014, 11:35 am

I recently fielded the following query:

From what I understand, the phylomorphospace function allows the mapping of the phylogeny on a morphospace defined by 2 characters/traits. However, I'm using multivariate data from a morphometric analysis, and the shape variation is explained by a lot of axes. The first 2 PCs can approximate the shape variation, but my question is: how do you calculate ancestral nodes using all the PC scores, or all of your fourier coefficients if using this method? Is it possible? Do you calculate the nodes on the PCs you are interested in? Or do you calculate the nodes on all your variables and then project them in the desired shape space?

So, in essence, should we perform ancestral state reconstruction on our original characters and then rotate them using the PC loadings; or should we first run phylogenetic PCA and then perform ancestral state reconstruction on the PC scores for species?

Well, the answer is - it doesn't matter. We will obtain the same ancestral state reconstructions for our PCs regardless of our order of operations. This could no doubt be proved analytically, but it is also straightforward to demonstrate via simulation as follows.

First, simulate tree & data:

> ## stochastic pure-birth tree
> tree<-pbtree(n=26,tip.label=LETTERS)
> ## random covariance matrix
> L<-matrix(rnorm(16),4)
> L[upper.tri(L)]<-0
> V<-L%*%t(L)
> ## simulated correlated character evolution on the tree
> X<-sim.corrs(tree,vcv=V)

Now, we can do our two analyses - first compute ancestral states on the original data & rotate using the phylogenetic PCA loadings; and then rotate our tip data, and reconstruct ancestral states for the PC scores:

> ## perform phylogenetic PCA
> pca<-phyl.pca(tree,X)
> ## reconstruct ancestral states for original data
> A<-apply(X,2,fastAnc,tree=tree)
> ## rotate in PC space
> ## compute phylogenetic means for calculation
> M<-matrix(rep(A[1,],nrow(A)),nrow(A),ncol(A),byrow=TRUE)
> rotatedA<-(A-M)%*%pca$Evec
> ## now compute ancestral states for PCA scores
> S<-apply(pca$S,2,fastAnc,tree=tree)

Now we can compare them:

> plot(rotatedA,S)
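
The same comparison can also be made numerically rather than by eye. What follows is a self-contained sketch that repeats the workflow on a fresh simulated dataset (the seed and the name max_diff are arbitrary choices for illustration) and reports the largest absolute difference between the two sets of estimates:

```r
library(phytools)

set.seed(1)
## simulate a tree & correlated characters, as in the steps above
tree <- pbtree(n = 26, tip.label = LETTERS)
L <- matrix(rnorm(16), 4)
L[upper.tri(L)] <- 0
X <- sim.corrs(tree, vcv = L %*% t(L))

pca <- phyl.pca(tree, X)
## order 1: reconstruct on the original traits, then center & rotate
A <- apply(X, 2, fastAnc, tree = tree)
rotatedA <- sweep(A, 2, A[1, ]) %*% pca$Evec
## order 2: reconstruct directly on the PC scores
S <- apply(pca$S, 2, fastAnc, tree = tree)

## largest absolute discrepancy between the two sets of estimates
max_diff <- max(abs(rotatedA - S))
max_diff
```

If the two orders of operation are equivalent, max_diff should differ from zero only by floating point error.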

That's it.

New user controls in phylo.to.map

July 14, 2014, 7:54 pm

I just posted a new version of the phytools function phylo.to.map, which can be used to project a tree onto a geographic map. Based on some user feedback, including at the Latin American Macroevolutionary Workshop we just held at the Universidad de los Andes in Bogotá, Colombia, I have added a couple of new features & attributes to the function.

First, the function now automatically removes the underscore character ("_") from within species names. Note that this is only for type="phylogram", not type="direct"; although in the latter case it can easily be done manually.

Second, again for type="phylogram", the lines connecting the tree to locations on a geographic map now (by default) arise from the end of the tip label - rather than from the tip itself.

Finally, third, the function now allows user control of the color of both the dashed lines & points connecting species labels to their geographic locations (for type="phylogram"); and of both the lines and tip dot colors (for type="direct").
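
For the type="direct" case mentioned under the first point, removing underscores manually might look something like the following base R sketch. The species names are hypothetical and serve only to illustrate the substitution:

```r
## hypothetical tip labels in Genus_species style
labels <- c("Anolis_cristatellus", "Anolis_evermanni", "Anolis_gundlachi")

## replace each underscore with a space
clean <- gsub("_", " ", labels, fixed = TRUE)
clean
```

With a real tree, the same substitution would presumably be applied both to tree$tip.label and to the row names of the coordinate matrix, so that the two continue to match.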

These updates are in the latest non-CRAN phytools build. Here's a quick demo:

> packageVersion("phytools")
[1] ‘0.4.24’
> ## simulate some realistic looking data
> foo<-function() paste(c(sample(LETTERS,1),
sample(letters,round(runif(n=1,min=3,max=5))),"_",
sample(letters,round(runif(n=1,min=4,max=6)))),
collapse="")
> tree<-pbtree(n=40,tip.label=replicate(40,foo()))
> tree$edge.length<-tree$edge.length*
100/max(nodeHeights(tree))
> lat<-fastBM(tree,sig2=10,bounds=c(-90,90))
> long<-fastBM(tree,sig2=20,bounds=c(-180,180))
> X<-cbind(lat,long)
> ## ok, first the (new) standard plot:
> obj<-phylo.to.map(tree,X,ftype="i",fsize=0.7,
asp=1.2,split=c(0.5,0.5))
objective: 302
...
objective: 220
objective: 218
objective: 218

> ## now let's change the colors:
> plot(obj,colors=c("blue","red"),ftype="i",fsize=0.7,
asp=1.2,split=c(0.5,0.5))

> ## direct projection with different colors
> plot(obj,type="direct",colors=c("red","black"),
fsize=0.7)

That's it.

Update to locate.yeti permitting the missing tip to attach below the root node

July 25, 2014, 8:39 am

I just updated the phytools function locate.yeti (a method to attach a missing tip to an ultrametric base tree using continuous character data) so that the missing leaf is now permitted to attach below the root node.

When I first started working on this, I mistakenly thought that I would have to add one additional parameter when attaching a new leaf to the root node - that being the length of the edge leading to the tip. Actually, this is not the case. Whereas for a leaf attached to an edge we have to optimize the position along the edge from which the lineage splits, in the case of a leaf attached to the root we only have to optimize its total length. Since our tree is invariably ultrametric, the lengths of the root stem and the terminal edge can then be found by simply midpoint rooting our phylogeny.
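
To make the arithmetic concrete, here is a sketch of one reading of this argument (it is not code taken from locate.yeti). Suppose the base tree has total height H and the optimized length of the path from its root to the new tip is L, with L >= H. Midpoint rooting then splits L so that the new tip sits at the same height as all the other tips. The helper split_root_length and its argument names are hypothetical:

```r
## hypothetical helper: split the total length L of the path between the
## old root and the new tip into a root stem & a terminal edge, given the
## height H of the ultrametric base tree
split_root_length <- function(L, H) {
  stopifnot(L >= H)          # otherwise the midpoint is not on the new edge
  stem <- (L - H) / 2        # edge from the new root to the old root
  terminal <- (L + H) / 2    # edge from the new root to the new tip
  c(stem = stem, terminal = terminal)
}

## values in the spirit of the demo below: H = 1 and L = 1.2
res <- split_root_length(L = 1.2, H = 1)
res
```

Under these assumptions stem + H equals terminal, so the re-rooted tree is again ultrametric.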

The new version of locate.yeti is here; however, since locate.yeti uses phytools internal functions, it is probably best & easiest to simply install the latest non-CRAN phytools version.

Here's a quick demo using a phylogeny in which the true position of the missing lineage is sister to our ultrametric base tree:

> packageVersion("phytools")
[1] ‘0.4.25’
> tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
> tt<-bind.tip(tree,where=length(tree$tip.label)+1,
edge.length=1.2*max(nodeHeights(tree)),
tip="Missing-lineage")
> tt<-midpoint.root(tt)
> ## this is the full true tree
> plotTree(tt)

> ## simulate on the full tree
> X<-fastBM(tt,nsim=10)
> ## place our missing taxon
> mltree<-locate.yeti(tree,X,method="exhaustive")
Optimizing the phylogenetic position of Missing-lineage. Please wait....
Done.
> mltree$logL
[1] -221.0057
> plotTree(mltree)

Although it's pretty obvious that we have the correct tree in this case, we can also quantify it:

> require(phangorn)
Loading required package: phangorn
> RF.dist(tt,mltree)
[1] 0
> require(Rphylip)
Loading required package: Rphylip
> Rtreedist(tt,trees2=mltree,quiet=TRUE)
          2,1
1,1 0.0862447

That's pretty much it for now.

Bug fix for S3 plotting method for objects of class "describe.simmap"

July 26, 2014, 7:01 pm

In recent phytools versions the phytools function describe.simmap returns an object of class "describe.simmap" which, if plotted, shows a phylogeny with posterior probabilities from stochastic mapping shown as pie charts at each node and tip of the tree. Unfortunately, when I switched the internal tree plotting function from ape's plot.phylo to plotTree in phytools, I made a mistake in setting the automatic offset of tip labels such that labels are frequently plotted outside the plotting window. This bug was reported to me today by a colleague and I just posted a new version of phytools with the bug in this plotting method fixed. This phytools version (>=0.4-26) can be downloaded from the phytools page and installed from source. Obviously, this fix will also be in subsequent CRAN releases of the phytools package.

Improvements/bug-fixes in plotBranchbyTrait

July 28, 2014, 11:59 am

Unlike most of the phylogeny plotting functions of phytools which use the multifunctional plotting function plotSimmap internally, the phytools function plotBranchbyTrait is nothing more than a fancy wrapper for the ape S3 plotting method plot.phylo.

The other day a phytools user reported a couple of bugs: (1) when type="fan" the labels plot horizontally instead of radially, as you might expect; and (2) also when type="fan", the scale of the plot is all messed up if legend=TRUE and prompt=FALSE (conveniently, the default).

All this havoc resulted just from providing default options for some things that were well designed to work with certain plot types, but poorly designed for others. I have just posted a fixed version of this function; however, users interested in this update should probably just install the latest phytools version from source.

Here is a quick demo:

> require(phytools)
> packageVersion("phytools")
[1] ‘0.4.27’
> tree<-pbtree(n=50,scale=1)
> x<-fastBM(tree)
> plotBranchbyTrait(tree,x,mode="tips",type="fan", legend=0.4)

Note that for this specific type of visualization, I would strongly recommend trying my function contMap. Here's a demo of how that would work:

> obj<-contMap(tree,x,plot=FALSE)
> obj<-setMap(obj,colors=c("red","purple","blue"), space="Lab")
> plot(obj,type="fan",outline=FALSE)

plotBranchbyTrait makes more sense, of course, when we have a trait value associated with each edge that we want to plot on the tree without any interpolation.

Note that there are some interesting differences between these two visualizations that reflect differences between the methods implemented. For instance, the range of trait values in plotBranchbyTrait(...,mode="tips") is narrower than for contMap. This is because plotBranchbyTrait(...,mode="tips"), to get the plotted values for each edge, takes the known or estimated trait values at each end of the edge and averages them. This will invariably contract the range of plotted values when compared to the true range.
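
The contraction is easy to see in a toy example. The following base R sketch uses made-up trait values at the two ends of five edges (not output from plotBranchbyTrait) and compares the range of the end values to the range of the per-edge averages:

```r
## hypothetical trait values at the two ends of five edges
parent <- c(0.0, 0.0, 1.2, 1.2, -0.8)
child  <- c(1.2, -0.8, 2.5, 0.9, -2.1)

## value assigned to each edge: the mean of its two end values
edge_value <- (parent + child) / 2

## range of all node & tip values vs. range of the edge averages
full_range <- range(c(parent, child))
edge_range <- range(edge_value)
full_range
edge_range
```

Because each edge value is an average of two end values, it can never fall outside the range of those end values, so edge_range is nested within full_range.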
