New version of fastAnc; new build of phytools

February 14, 2013, 6:11 pm

I just posted a new version of the function fastAnc for (relatively) fast ancestral character estimation. The function was previously described here. The main addition in this version is that the function now (optionally) computes the variance of the ancestral state estimates based on equation (6) of Rohlf (2001), as well as (optionally) 95% confidence intervals on the states. The updated code is here; I have also posted a new build of phytools, which can be downloaded here and installed from source.

Note that equation (6) of Rohlf (2001) gives only the relative variance of the ancestral state estimate at the root node. Because fastAnc obtains its estimates by rerooting the tree at each internal node, the same calculation can be applied at every node. To scale the relative variance to our data, we need to multiply it by the phylogenetic variance (i.e., the Brownian motion rate, σ²) of our continuous trait, which can be computed as the mean square of the contrasts. Once we have the variances, the 95% CI for each estimate is the estimate ± 1.96 × the square root of its variance.
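To make this recipe concrete, here is a minimal base-R sketch on a hypothetical four-tip tree with made-up tip values. It implements Felsenstein's (1985) contrasts algorithm directly rather than calling ape's pic, so it runs without any packages; the helper name pic_root is hypothetical. The relative variance of the root estimate is taken as v1·v2/(v1 + v2), where v1 and v2 are the two root branches after rescaling; we assume, without having checked Rohlf's notation, that this is the quantity equation (6) refers to. The tree must be bifurcating, with tips numbered 1..n and the root numbered n + 1, as in ape's phylo format.

```r
# Minimal sketch of Felsenstein's (1985) contrasts for a bifurcating tree in
# ape's edge-matrix layout (tips 1..n, root n + 1). Hypothetical helper, not ape::pic.
pic_root <- function(edge, edge.length, x) {
  n <- length(x)
  contr <- numeric(0)
  visit <- function(node) {
    # returns c(estimate at node, its relative variance, added to the parent branch)
    if (node <= n) return(c(x[node], 0))
    kids <- which(edge[, 1] == node)            # rows of the two child edges
    if (length(kids) != 2) stop("tree must be bifurcating")
    r1 <- visit(edge[kids[1], 2]); v1 <- edge.length[kids[1]] + r1[2]
    r2 <- visit(edge[kids[2], 2]); v2 <- edge.length[kids[2]] + r2[2]
    contr <<- c(contr, (r1[1] - r2[1]) / sqrt(v1 + v2))  # standardized contrast
    c((r1[1] / v1 + r2[1] / v2) / (1 / v1 + 1 / v2),     # weighted-average state
      v1 * v2 / (v1 + v2))                               # relative variance of it
  }
  r <- visit(n + 1)
  list(contr = contr, root = r[[1]], root.var = r[[2]])
}

# Hypothetical tree ((t1:1,t2:1):0.5,(t3:0.5,t4:0.5):1) and made-up tip values
edge <- matrix(c(5, 6,  6, 1,  6, 2,
                 5, 7,  7, 3,  7, 4), ncol = 2, byrow = TRUE)
edge.length <- c(0.5, 1, 1, 1, 0.5, 0.5)
x <- c(1.2, 0.8, -0.3, 0.1)

p <- pic_root(edge, edge.length, x)
sig2 <- mean(p$contr^2)            # rate: mean square of the contrasts
v <- p$root.var * sig2             # relative variance scaled to the data
ci95 <- p$root + c(-1.96, 1.96) * sqrt(v)
```

With ape ≥ 3.0-7 installed, pic(x, tree, rescaled.tree = TRUE) should return the contrasts together with the rescaled tree, from which the same two root branch lengths can be read off. To get a variance for every internal node, the tree would be rerooted at each node in turn.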

I didn't realize this when I was writing the function, but it turns out that this update to fastAnc depends on ape >= 3.0-7 (i.e., the newest version of ape as of this writing). This is because the options of pic, the ape function for independent contrasts, were expanded in the latest release to include returning the tree with branches scaled to their expected variance, which we can conveniently exploit to do the calculation of equation (6) in Rohlf.
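If you want the variance option to fail early with a clear message on an older ape, a check along these lines could be used. This is a minimal sketch; the helper name ape_ok is hypothetical and not part of phytools or ape. It relies only on base R's requireNamespace and packageVersion, and compares versions as package_version objects rather than as strings, so that, for example, 3.0-10 sorts after 3.0-7.

```r
# Hypothetical helper: TRUE if ape is installed and at least min_version,
# the release whose pic() can return the rescaled tree.
ape_ok <- function(min_version = "3.0-7") {
  requireNamespace("ape", quietly = TRUE) &&
    utils::packageVersion("ape") >= min_version
}

if (!ape_ok()) {
  message("fastAnc(..., vars = TRUE) needs ape >= 3.0-7; try install.packages(\"ape\")")
}
```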

I should also point out that the 95% CIs obtained by this function differ substantially from the 95% CIs computed by ace. Specifically, the 95% CIs computed by ace appear to be too narrow. We can show this relatively easily by simulation, as follows:

> onCI.ace<-onCI.fastAnc<-vector()
> N<-100
> for(i in 1:1000){
+ tree<-pbtree(n=N)
+ x<-fastBM(tree,internal=TRUE)
+ a<-fastAnc(tree,x[1:N],vars=TRUE,CI=TRUE)
+ onCI.fastAnc[i]<-sum((x[1:tree$Nnode+N]>a$CI95[,1])*(x[1:tree$Nnode+N]< a$CI95[,2]))/tree$Nnode
+ b<-ace(x[1:N],tree,CI=TRUE)
+ onCI.ace[i]<-sum((x[1:tree$Nnode+N]>b$CI95[,1])*(x[1:tree$Nnode+N]< b$CI95[,2]))/tree$Nnode
+ }
There were 24 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In sqrt(diag(solve(h))) : NaNs produced
2: In sqrt(diag(solve(h))) : NaNs produced
3: ...
> # this should be 0.95
> mean(onCI.fastAnc)
[1] 0.9483737
> mean(onCI.ace,na.rm=TRUE)
[1] 0.6738759

This simulation shows that, whereas the 95% CIs computed by fastAnc include the generating values almost exactly 95% of the time (94.8% across 1000 simulations with 99 estimated ancestral states per simulation), the CIs from ace include the generating value only about 67% of the time (averaged over the simulations in which ace returned no NaN standard errors, hence na.rm=TRUE).

I'm not exactly sure why this is the case. My best guess, based on the warnings, is that ace computes the standard errors of the parameter estimates (and thus the CIs) from the Hessian of the likelihood surface. This is an asymptotic approximation, and with finite samples it can be quite bad (as we see above).
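The warning itself is easy to reproduce. Standard errors of maximum likelihood estimates are commonly taken as the square roots of the diagonal of the inverse Hessian, and the expression in the warning, sqrt(diag(solve(h))), returns NaN whenever a diagonal element of that inverse is negative, which can happen when the Hessian is not positive definite. The matrices below are made-up examples, not values taken from ace.

```r
# Hypothetical 2 x 2 "Hessian" that is not positive definite
h <- matrix(c(1, 2,
              2, 1), nrow = 2)
eigen(h)$values                  # one eigenvalue is negative (3 and -1)
inv <- solve(h)                  # diagonal of the inverse is -1/3, -1/3
se <- suppressWarnings(sqrt(diag(inv)))
se                               # NaN NaN, so no usable CI

# For contrast, a positive definite matrix gives finite standard errors
h2 <- matrix(c(2, 0.5,
               0.5, 1), nrow = 2)
se2 <- sqrt(diag(solve(h2)))
```

A Wald-type 95% CI is then the estimate ± 1.96 × se, which is only as good as the quadratic approximation to the likelihood surface near its maximum.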

Function to get the sister(s) of a node or tip

February 15, 2013, 2:18 pm

I just posted a new utility function, getSisters, that takes as input a tree and a node or tip number or label, and returns the sister node(s) or tip(s) as numbers or labels. It has two modes: mode="number", which returns node or tip numbers as an integer or vector; and mode="label", which returns a list with up to two components: one for node labels (if available; otherwise node numbers) and the other for tip labels.
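The lookup described above can be sketched in a few lines of base R on an ape-style phylo object: find the parent of the focal node in tree$edge, take all children of that parent, and drop the focal node. This is a hypothetical helper (sisters), not the phytools code, and the component names nodes and tips in the label mode are our own choice. The example tree is built by hand so that ape is not required.

```r
# Hypothetical helper, not phytools' getSisters: sisters of a tip or node
# in an ape-style "phylo" object (tips 1..n, internal nodes n+1..).
sisters <- function(tree, node, mode = c("number", "label")) {
  mode <- match.arg(mode)
  ntip <- length(tree$tip.label)
  if (is.character(node)) {
    # accept a tip label, or a node label if the tree has them
    num <- match(node, tree$tip.label)
    if (is.na(num) && !is.null(tree$node.label))
      num <- match(node, tree$node.label) + ntip
    if (is.na(num)) stop("label not found in tree")
    node <- num
  }
  parent <- tree$edge[tree$edge[, 2] == node, 1]
  if (length(parent) == 0) return(NULL)          # the root has no sisters
  sis <- setdiff(tree$edge[tree$edge[, 1] == parent, 2], node)
  if (mode == "number") return(sis)
  out <- list()
  nodes <- sis[sis > ntip]
  tips <- sis[sis <= ntip]
  if (length(nodes) > 0) {
    # node labels if available, otherwise node numbers
    out$nodes <- if (is.null(tree$node.label)) nodes else tree$node.label[nodes - ntip]
  }
  if (length(tips) > 0) out$tips <- tree$tip.label[tips]
  out
}

# Example tree ((A,B),(C,D)) with tips 1-4, root 5, internal nodes 6 and 7
tree <- list(edge = matrix(c(5, 6,  6, 1,  6, 2,
                             5, 7,  7, 3,  7, 4), ncol = 2, byrow = TRUE),
             tip.label = c("A", "B", "C", "D"), Nnode = 3L)
class(tree) <- "phylo"
sisters(tree, 1)                    # 2
sisters(tree, "A", mode = "label")  # list(tips = "B")
sisters(tree, 6, mode = "label")    # list(nodes = 7), since there are no node labels
```

Because sisters are read straight from the edge matrix, the same code returns more than one sister when the parent is a polytomy.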

The code for the function is here, but it is also in the most recent build of phytools: phytools 0.2-18.

Here's a quick demo:

> require(phytools)
> tree<-rtree(n=12)
> plotTree(tree,node.numbers=TRUE)