Quantcast
Channel: Phylogenetic Tools for Comparative Biology
Viewing all 801 articles
Browse latest View live

Second video demo of tree.drawer in phytools

$
0
0

I'm getting a kick out of my new function tree.drawerto extract a phylogeny from an image of a plotted tree. Consequently, I couldn't resist posting another demo. This one is based on Figure 1 of Song et al. (2012).

If you'd like to try it yourself, just copy & past the following lines & click away:

library(phytools)
library(jpeg)
source("https://raw.githubusercontent.com/liamrevell/phytools/master/R/tree.drawer.R")
download.file("http://www.pnas.org/content/109/37/14942/F1.large.jpg",
"F1.large.jpg",mode="wb")
eutheria<-tree.drawer("F1.large.jpg")

The video also demonstrates the option GOBACK to remove the most recently added leaf from the tree & go back.

The result is as follows:

plotTree(eutheria)

plot of chunk unnamed-chunk-2

The video demo can also be seen embedded below:


Testing whether the rate of evolution of a continuous trait is different in different trees

$
0
0

I was recently asked by a colleague if I could help develop a method to test if the rate of evolution of a continuous character was the same or different in different trees.

This is closely related to the censored approach to test for changes in the rate of evolution developed by Brian O'Meara & colleagues in 2006. (Note that this should also be related to a method published by Dean Adams in 2013.) Here, though, rather than supply a single tree & specifying rate regimes within that tree, we can supply multiple trees (as an object of class "multiPhylo") and a list of corresponding vectors of trait values for each tree.

First, here is our function:

ratebytree<-function(trees,x,...){
N<-length(trees)
## first, fit multi-rate model
lik.multi<-function(theta,trees,x){
n<-sapply(trees,Ntip)
N<-length(trees)
sig<-theta[1:N]
a<-theta[(N+1):(2*N)]
C<-lapply(trees,vcv)
V<-mapply("*",C,sig)
logL<-0
for(i in 1:N)
logL<-logL-t(x[[i]]-a[i])%*%solve(V[[i]])%*%(x[[i]]-
a[i])/2-n[i]*log(2*pi)/2-determinant(V[[i]])$modulus[1]/2
-logL
}
foo<-function(tree,x){
obj<-phyl.vcv(as.matrix(x),vcv(tree),1)
c(obj$R[1,1],obj$a[1,1])
}
obj<-mapply(foo,trees,x)
p<-as.vector(t(obj))
fit.multi<-optim(p,lik.multi,trees=trees,x=x,method="L-BFGS-B",
lower=c(rep(0,N),rep(-Inf,N)),upper=c(rep(Inf,N),rep(Inf,N)))
## now fit single-rate model
lik.onerate<-function(theta,trees,x){
n<-sapply(trees,Ntip)
N<-length(trees)
sig<-theta[1]
a<-theta[1:N+1]
C<-lapply(trees,vcv)
V<-lapply(C,"*",sig)
logL<-0
for(i in 1:N)
logL<-logL-t(x[[i]]-a[i])%*%solve(V[[i]])%*%(x[[i]]-
a[i])/2-n[i]*log(2*pi)/2-determinant(V[[i]])$modulus[1]/2
-logL
}
p<-c(mean(fit.multi$par[1:N]),fit.multi$par[(N+1):(2*N)])
fit.onerate<-optim(p,lik.onerate,trees=trees,x=x,method="L-BFGS-B",
lower=c(0,rep(-Inf,N)),upper=c(Inf,rep(Inf,N)))
## compare models:
LR<-2*(-fit.multi$value+fit.onerate$value)
P.chisq<-pchisq(LR,df=N-1,lower.tail=FALSE)
obj<-list(
multi.rate.model=list(sig2=fit.multi$par[1:N],
a=fit.multi$par[(N+1):(2*N)],
k=2*N,
logL=-fit.multi$value,
counts=fit.multi$counts,convergence=fit.multi$convergence,
message=fit.multi$message),
common.rate.model=list(sig2=fit.onerate$par[1],
a=fit.onerate$par[1:N+1],
k=N+1,
logL=-fit.onerate$value,
counts=fit.onerate$counts,convergence=fit.onerate$convergence,
message=fit.onerate$message),
N=N,likelihood.ratio=LR,P.chisq=P.chisq)
class(obj)<-"ratebytree"
obj
}

## S3 print method
print.ratebytree<-function(x,...){
if(hasArg(digits)) digits<-list(...)$digits
else digits<-4
N<-x$N
cat("ML common-rate model:\n")
cat(paste("\ts^2\t",paste(paste("a[",1:N,"]",sep=""),collapse="\t")),
"\tk\tlogL\n")
cat(paste("value",round(x$common.rate.model$sig2,digits),
paste(round(x$common.rate.model$a,digits),collapse="\t"),
x$common.rate.model$k,round(x$common.rate.model$logL,digits),
"\n\n",sep="\t"))
cat("ML multi-rate model:\n")
cat(paste("\t",paste(paste("s^2[",1:N,"]",sep=""),collapse="\t"),"\t",
paste(paste("a[",1:N,"]",sep=""),collapse="\t")),
"\tk\tlogL\n")
cat(paste("value",paste(round(x$multi.rate.model$sig2,digits),collapse="\t"),
paste(round(x$multi.rate.model$a,digits),collapse="\t"),
x$multi.rate.model$k,round(x$multi.rate.model$logL,digits),
"\n\n",sep="\t"))
cat(paste("Likelihood ratio:",round(x$likelihood.ratio,digits),"\n"))
cat(paste("P-value (based on X^2):",round(x$P.chisq,digits),"\n\n"))
if(x$multi.rate.model$convergence==0&&x$common.rate.model$convergence==0)
cat("R thinks it has found the ML solution.\n\n")
else cat("One or the other optimization may not have converged.\n\n")
}

Now let's try it with some simulated data.

First, here's a visualization of our data:

ylim<-range(c(x,y,z))
par(mfrow=c(1,3))
phenogram(t1,x,ylim=ylim,spread.cost=c(1,0))
phenogram(t2,y,ylim=ylim,spread.cost=c(1,0))
## Optimizing the positions of the tip labels...
phenogram(t3,z,ylim=ylim,spread.cost=c(1,0))
## Optimizing the positions of the tip labels...

plot of chunk unnamed-chunk-2

It's hard to tell from this whether the rates of each clade are the same or different - especially because the trees all have different total depths.

Now we're ready to fit our models

library(phytools)
fit<-ratebytree(c(t1,t2,t3),list(x,y,z))
fit
## ML common-rate model:
## s^2 a[1] a[2] a[3] k logL
## value 1.4387 0.3648 1.5135 1.3195 4 -189.3164
##
## ML multi-rate model:
## s^2[1] s^2[2] s^2[3] a[1] a[2] a[3] k logL
## value 1.3265 1.9868 0.8394 0.3648 1.5135 1.3195 6 -184.4735
##
## Likelihood ratio: 9.6859
## P-value (based on X^2): 0.0079
##
## R thinks it has found the ML solution.

Note that tree #1 and tree #3 have very comparable estimated rates (actually tree #1 is a little higher), even though the range of values is substantially greater on tree #3. Neat.

For the record, these data were simulated as follows:

t1<-pbtree(n=26)
t2<-pbtree(n=60)
t3<-pbtree(n=50)
x<-fastBM(t1,sig2=1)
y<-fastBM(t2,sig2=2)
z<-fastBM(t3,sig2=1)

Type I error & power of method to compare rates among trees

$
0
0

I just addedthe new function I posted last night to the phytools package.

This function takes as input two or more trees in an object of class "multiPhylo" and two or more continuous trait vectors in a list and proceeds to fit two different models: one in which the rate is constant across all of the trees; and a second, more parameter rich, model in which the rates are permitted to be different among trees. The method then also compares these two fitted models using a likelihood-ratio test.

As I mentioned yesterday, this method is basically equivalent to the censored approach described by O'Meara et al. (2006) and should be closely related to the method of Adams (2012).

In the following, I thought I would explore the statistical properties and power of the method for the case of two trees each with 50 taxa.

First type I error:

## we'll use the following custom function
typeIerror<-function(i,N1,N2,max){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2)
obj<-ratebytree(c(t1,t2),list(x,y))
if(i==1) cat("|.") else cat(".")
if(i==max) cat("|")
if(i%%50==0) cat("\n ")
flush.console()
obj$P.chisq
}
library(phytools)
packageVersion("phytools")
## [1] '0.5.72'
nrep<-500
P<-sapply(1:nrep,typeIerror,N1=50,N2=50,max=nrep)
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
h<-hist(P,breaks=seq(0,1,by=0.1),col="grey",
main="P (under null hypothesis)",freq=FALSE)
abline(h=1,lwd=2,col="red",lty="dashed")

plot of chunk unnamed-chunk-1

Note that this density should be more or less uniform on the interval 0,1 (which is just about what we see), and the type I error rate should be around 0.05:

mean(P<=0.05)
## [1] 0.046

Now let's investigate power for various differences in rate between trees (σ2[2]/σ2[1]):

## we'll use the custom function as follows:
power<-function(sig2,N1,N2){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2,sig2=sig2)
obj<-ratebytree(c(t1,t2),list(x,y))
obj$P.chisq
}
foo<-function(sig2,n){
p<-setNames(replicate(n,power(sig2,50,50)),
paste("rep",1:n))
cat(paste("Done sig2[2]/sig2[1] =",sig2,"\n"))
flush.console()
p
}
sig2<-seq(1,4,by=0.2)
P.power<-sapply(sig2,foo,n=100)
## Done sig2[2]/sig2[1] = 1 
## Done sig2[2]/sig2[1] = 1.2
## Done sig2[2]/sig2[1] = 1.4
## Done sig2[2]/sig2[1] = 1.6
## Done sig2[2]/sig2[1] = 1.8
## Done sig2[2]/sig2[1] = 2
## Done sig2[2]/sig2[1] = 2.2
## Done sig2[2]/sig2[1] = 2.4
## Done sig2[2]/sig2[1] = 2.6
## Done sig2[2]/sig2[1] = 2.8
## Done sig2[2]/sig2[1] = 3
## Done sig2[2]/sig2[1] = 3.2
## Done sig2[2]/sig2[1] = 3.4
## Done sig2[2]/sig2[1] = 3.6
## Done sig2[2]/sig2[1] = 3.8
## Done sig2[2]/sig2[1] = 4
colnames(P.power)<-paste("s2[2]=",sig2,sep="")
plot(sig2,apply(P.power,2,function(x) mean(x<=0.05)),type="b",
xlab=expression(sigma[2]^2/sigma[1]^2),ylab="Power",
ylim=c(0,1),main="Power to detect a difference in rate",
lwd=2)
abline(h=0.05,lty="dashed")
abline(h=1,lty="dashed")
text(x=4,y=0.05,"0.05",pos=3)
text(x=1,y=1,"1.00",pos=1)

plot of chunk unnamed-chunk-3

Again, we hope that the power to detect a difference in rate should increase with σ2212 - that is, the difference in rate between trees. Once again, this is more or less what we see.

Cool.

More user control of optimization in ratebytree

$
0
0

I just made some updatesto the new phytools function ratebytree which compares the rate of continuous character evolution between two or more trees (e.g., 1, 2). The method is essentially the same as the censored approach of O'Meara et al. (2006).

The updates consist of adding the two option arguments: init& trace.

init, the initial conditions of the optimization, needs to be supplied as a list with elements corresponding to the different parameters of the common- and multi-rate models. The names for these elements should be sigc& ac, for the rate & ancestral states for the common-rate model; and sigm& sigm for the initial values of the multi-rate model.

The argument trace is a logical value that tells the function to trace the progress of the optimization.

So, for instance:

library(phytools)
packageVersion("phytools")
## [1] '0.5.73'
t1
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
x
##          A          B          C          D          E          F 
## -0.1397195 -0.1996443 -0.5827639 -0.3396014 -1.9785818 -2.2452530
## G H I J K L
## 1.5195967 1.6068486 0.5919871 1.6156654 1.2425888 -0.5789243
## M N O P Q R
## -1.1352832 -0.5962740 0.8352019 1.1925811 0.1434599 -0.1649340
## S T U V W X
## -0.2620791 -0.8017158 -0.8558019 -1.2876484 1.1789504 -0.4589235
## Y Z
## 0.7054512 0.4988762
t2
## 
## Phylogenetic tree with 40 tips and 39 internal nodes.
##
## Tip labels:
## t4, t8, t9, t1, t25, t26, ...
##
## Rooted; includes branch lengths.
y
##          t4          t8          t9          t1         t25         t26 
## -0.50044880 -0.33976877 -0.22155547 -1.66246432 0.29014161 0.50776830
## t27 t28 t11 t19 t20 t5
## 0.12476121 0.29866183 1.84948814 -0.63734157 0.70780071 -2.29781879
## t13 t14 t12 t33 t34 t22
## 2.54743880 3.51220356 2.56567363 2.06829938 1.85838926 0.82279334
## t39 t40 t2 t3 t31 t32
## 0.22717955 0.20327362 1.87257199 2.97804036 4.62874247 5.10710664
## t35 t36 t7 t23 t24 t6
## 5.36174612 5.16948164 -0.21639416 -1.44359129 0.05346218 -0.93556170
## t37 t38 t17 t18 t10 t15
## -0.02599676 -0.29860201 -0.35398678 -1.42592744 -0.50920027 3.03128846
## t16 t21 t29 t30
## 2.65445859 -0.41292712 -0.69210806 -0.78368040
t3
## 
## Phylogenetic tree with 20 tips and 19 internal nodes.
##
## Tip labels:
## 1, 2, 3, 4, 5, 6, ...
##
## Rooted; includes branch lengths.
z
##           1           2           3           4           5           6 
## -0.01050003 -0.48649665 -0.34190675 0.37566749 1.75862664 1.58447168
## 7 8 9 10 11 12
## 1.67743120 0.28316167 0.22031546 1.07939448 -0.69795686 1.63538094
## 13 14 15 16 17 18
## -0.01273937 0.48345033 1.44212231 -0.09246575 0.05411407 1.53395177
## 19 20
## 1.85086405 4.48540981
fit1<-ratebytree(c(t1,t2,t3),list(x,y,z),trace=TRUE,digits=7)
## 
## Optimizing multi-rate model....
## sig[1] sig[2] sig[3] logL
## 0.5988996 1.5493864 1.0621323 -109.4408658
## 0.5998996 1.5493864 1.0621323 -109.4417174
## 0.5978996 1.5493864 1.0621323 -109.4400477
## 0.5988996 1.5503864 1.0621323 -109.4411925
## 0.5988996 1.5483864 1.0621323 -109.4405471
## 0.5988996 1.5493864 1.0631323 -109.4413405
## 0.5988996 1.5493864 1.0611323 -109.440399
## 0.5988996 1.5493864 1.0621323 -109.4408677
## 0.5988996 1.5493864 1.0621323 -109.4408677
## 0.5988996 1.5493864 1.0621323 -109.4408661
## 0.5988996 1.5493864 1.0621323 -109.4408661
## 0.5988996 1.5493864 1.0621323 -109.4408666
## 0.5988996 1.5493864 1.0621323 -109.4408666
## 0 1.2266815 0.5913863 -748624418.916421
## 0.001 1.2266815 0.5913863 -7502.1407998
## 0 1.2266815 0.5913863 -748624418.916421
## 0 1.2276815 0.5913863 -748624418.912656
## 0 1.2256815 0.5913863 -748624418.920205
## 0 1.2266815 0.5923863 -748624418.904514
## 0 1.2266815 0.5903863 -748624418.928397
## 0 1.2266815 0.5913863 -748624530.012055
## 0 1.2266815 0.5913863 -748624530.012055
## 0 1.2266815 0.5913863 -748624418.916421
## 0 1.2266815 0.5913863 -748624418.916421
## 0 1.2266815 0.5913863 -748624418.916422
## 0 1.2266815 0.5913863 -748624418.916422
## 0.3996009 1.4419983 0.9054799 -110.4784866
## 0.4006009 1.4419983 0.9054799 -110.4642129
## 0.3986009 1.4419983 0.9054799 -110.4929135
## 0.3996009 1.4429983 0.9054799 -110.4778315
## 0.3996009 1.4409983 0.9054799 -110.4791522
## 0.3996009 1.4419983 0.9064799 -110.4772312
## 0.3996009 1.4419983 0.9044799 -110.479757
## 0.3996009 1.4419983 0.9054799 -110.4784894
## 0.3996009 1.4419983 0.9054799 -110.4784894
## 0.3996009 1.4419983 0.9054799 -110.4784869
## 0.3996009 1.4419983 0.9054799 -110.4784869
## 0.3996009 1.4419983 0.9054799 -110.4784875
## 0.3996009 1.4419983 0.9054799 -110.4784875
## 0.5701688 1.5339054 1.0395494 -109.4190706
## 0.5711688 1.5339054 1.0395494 -109.4188632
## 0.5691688 1.5339054 1.0395494 -109.4193188
## 0.5701688 1.5349054 1.0395494 -109.4192724
## 0.5701688 1.5329054 1.0395494 -109.4188771
## 0.5701688 1.5339054 1.0405494 -109.4193574
## 0.5701688 1.5339054 1.0385494 -109.4187925
## 0.5701688 1.5339054 1.0395494 -109.4190726
## 0.5701688 1.5339054 1.0395494 -109.4190726
## 0.5701688 1.5339054 1.0395494 -109.4190709
## 0.5701688 1.5339054 1.0395494 -109.4190709
## 0.5701688 1.5339054 1.0395494 -109.4190714
## 0.5701688 1.5339054 1.0395494 -109.4190714
## 0.571337 1.5234701 1.0245243 -109.4139814
## 0.572337 1.5234701 1.0245243 -109.4138212
## 0.570337 1.5234701 1.0245243 -109.414182
## 0.571337 1.5244701 1.0245243 -109.4140961
## 0.571337 1.5224701 1.0245243 -109.4138752
## 0.571337 1.5234701 1.0255243 -109.4141336
## 0.571337 1.5234701 1.0235243 -109.4138383
## 0.571337 1.5234701 1.0245243 -109.4139833
## 0.571337 1.5234701 1.0245243 -109.4139833
## 0.571337 1.5234701 1.0245243 -109.4139816
## 0.571337 1.5234701 1.0245243 -109.4139816
## 0.571337 1.5234701 1.0245243 -109.4139822
## 0.571337 1.5234701 1.0245243 -109.4139822
## 0.5761806 1.5104399 1.0080879 -109.4117139
## 0.5771806 1.5104399 1.0080879 -109.4117458
## 0.5751806 1.5104399 1.0080879 -109.4117211
## 0.5761806 1.5114399 1.0080879 -109.4117164
## 0.5761806 1.5094399 1.0080879 -109.4117201
## 0.5761806 1.5104399 1.0090879 -109.4117096
## 0.5761806 1.5104399 1.0070879 -109.4117281
## 0.5761806 1.5104399 1.0080879 -109.4117158
## 0.5761806 1.5104399 1.0080879 -109.4117158
## 0.5761806 1.5104399 1.0080879 -109.4117142
## 0.5761806 1.5104399 1.0080879 -109.4117142
## 0.5761806 1.5104399 1.0080879 -109.4117147
## 0.5761806 1.5104399 1.0080879 -109.4117147
## 0.5758686 1.5107979 1.0089505 -109.4117076
## 0.5768686 1.5107979 1.0089505 -109.4117272
## 0.5748686 1.5107979 1.0089505 -109.4117271
## 0.5758686 1.5117979 1.0089505 -109.4117132
## 0.5758686 1.5097979 1.0089505 -109.4117107
## 0.5758686 1.5107979 1.0099505 -109.4117117
## 0.5758686 1.5107979 1.0079505 -109.4117132
## 0.5758686 1.5107979 1.0089505 -109.4117095
## 0.5758686 1.5107979 1.0089505 -109.4117095
## 0.5758686 1.5107979 1.0089505 -109.4117078
## 0.5758686 1.5107979 1.0089505 -109.4117078
## 0.5758686 1.5107979 1.0089505 -109.4117083
## 0.5758686 1.5107979 1.0089505 -109.4117083
## 0.5758622 1.5107246 1.0089937 -109.4117075
## 0.5768622 1.5107246 1.0089937 -109.4117269
## 0.5748622 1.5107246 1.0089937 -109.4117272
## 0.5758622 1.5117246 1.0089937 -109.4117125
## 0.5758622 1.5097246 1.0089937 -109.4117112
## 0.5758622 1.5107246 1.0099937 -109.411712
## 0.5758622 1.5107246 1.0079937 -109.4117127
## 0.5758622 1.5107246 1.0089937 -109.4117094
## 0.5758622 1.5107246 1.0089937 -109.4117094
## 0.5758622 1.5107246 1.0089937 -109.4117077
## 0.5758622 1.5107246 1.0089937 -109.4117077
## 0.5758622 1.5107246 1.0089937 -109.4117083
## 0.5758622 1.5107246 1.0089937 -109.4117083
##
## Optimizing common-rate model....
## sig logL
## 1.0318602 -112.90808
## 1.0328602 -112.9048916
## 1.0308602 -112.9113149
## 1.0318602 -112.908081
## 1.0318602 -112.908081
## 1.0318602 -112.9080803
## 1.0318602 -112.9080803
## 1.0318602 -112.9080807
## 1.0318602 -112.9080807
## 2.0318602 -119.2505125
## 2.0328602 -119.2601002
## 2.0308602 -119.2409257
## 2.0318602 -119.250513
## 2.0318602 -119.250513
## 2.0318602 -119.2505127
## 2.0318602 -119.2505127
## 2.0318602 -119.2505129
## 2.0318602 -119.2505129
## 1.1400755 -112.800431
## 1.1410755 -112.8013959
## 1.1390755 -112.7994976
## 1.1400755 -112.800432
## 1.1400755 -112.800432
## 1.1400755 -112.8004314
## 1.1400755 -112.8004314
## 1.1400755 -112.8004317
## 1.1400755 -112.8004317
## 1.1153898 -112.7868598
## 1.1163898 -112.7870153
## 1.1143898 -112.7867385
## 1.1153898 -112.7868608
## 1.1153898 -112.7868608
## 1.1153898 -112.7868601
## 1.1153898 -112.7868601
## 1.1153898 -112.7868605
## 1.1153898 -112.7868605
## 1.1111754 -112.7865826
## 1.1121754 -112.7865928
## 1.1101754 -112.7866074
## 1.1111754 -112.7865836
## 1.1111754 -112.7865836
## 1.1111754 -112.786583
## 1.1111754 -112.786583
## 1.1111754 -112.7865834
## 1.1111754 -112.7865834
## 1.1113867 -112.7865819
## 1.1123867 -112.7865993
## 1.1103867 -112.7865992
## 1.1113867 -112.7865829
## 1.1113867 -112.7865829
## 1.1113867 -112.7865822
## 1.1113867 -112.7865822
## 1.1113867 -112.7865826
## 1.1113867 -112.7865826
## 1.1113852 -112.7865819
## 1.1123852 -112.7865993
## 1.1103852 -112.7865993
## 1.1113852 -112.7865829
## 1.1113852 -112.7865829
## 1.1113852 -112.7865822
## 1.1113852 -112.7865822
## 1.1113852 -112.7865826
## 1.1113852 -112.7865826
fit1
## ML common-rate model:
## s^2 a[1] a[2] a[3] k logL
## value 1.1114 -0.2344 0.9677 1.7855 4 -112.7866
##
## ML multi-rate model:
## s^2[1] s^2[2] s^2[3] a[1] a[2] a[3] k logL
## value 0.5759 1.5107 1.009 -0.2344 0.9677 1.7855 6 -109.4117
##
## Likelihood ratio: 6.7497
## P-value (based on X^2): 0.0342
##
## R thinks it has found the ML solution.

Now, how about using different values to commence the optimization:

fit2<-ratebytree(c(t1,t2,t3),list(x,y,z),trace=TRUE,digits=7,
init=list(sigc=2,sigm=rep(2,3)))
## 
## Optimizing multi-rate model....
## sig[1] sig[2] sig[3] logL
## 2 2 2 -118.9455945
## 2.001 2 2 -118.9502223
## 1.999 2 2 -118.9409654
## 2 2.001 2 -118.9480425
## 2 1.999 2 -118.9431491
## 2 2 2.001 -118.948072
## 2 2 1.999 -118.9431171
## 2 2 2 -118.9455951
## 2 2 2 -118.9455951
## 2 2 2 -118.9455947
## 2 2 2 -118.9455947
## 2 2 2 -118.9455949
## 2 2 2 -118.9455949
## 1.4226497 1.4226497 1.4226497 -113.9958273
## 1.4236497 1.4226497 1.4226497 -114.0012657
## 1.4216497 1.4226497 1.4226497 -113.9903876
## 1.4226497 1.4236497 1.4226497 -113.9949632
## 1.4226497 1.4216497 1.4226497 -113.9967024
## 1.4226497 1.4226497 1.4236497 -113.997872
## 1.4226497 1.4226497 1.4216497 -113.9937846
## 1.4226497 1.4226497 1.4226497 -113.995828
## 1.4226497 1.4226497 1.4226497 -113.995828
## 1.4226497 1.4226497 1.4226497 -113.9958276
## 1.4226497 1.4226497 1.4226497 -113.9958276
## 1.4226497 1.4226497 1.4226497 -113.9958278
## 1.4226497 1.4226497 1.4226497 -113.9958278
## 0 1.421147 0.7341634 -748624417.333641
## 0.001 1.421147 0.7341634 -7500.5580194
## 0 1.421147 0.7341634 -748624417.333641
## 0 1.422147 0.7341634 -748624417.33276
## 0 1.420147 0.7341634 -748624417.334533
## 0 1.421147 0.7351634 -748624417.328558
## 0 1.421147 0.7331634 -748624417.338757
## 0 1.421147 0.7341634 -748624528.429275
## 0 1.421147 0.7341634 -748624528.429274
## 0 1.421147 0.7341634 -748624417.333641
## 0 1.421147 0.7341634 -748624417.333641
## 0 1.421147 0.7341634 -748624417.333642
## 0 1.421147 0.7341634 -748624417.333642
## 0.948767 1.4221492 1.1933158 -110.963321
## 0.949767 1.4221492 1.1933158 -110.968708
## 0.947767 1.4221492 1.1933158 -110.9579372
## 0.948767 1.4231492 1.1933158 -110.9624514
## 0.948767 1.4211492 1.1933158 -110.9642018
## 0.948767 1.4221492 1.1943158 -110.9646176
## 0.948767 1.4221492 1.1923158 -110.9620293
## 0.948767 1.4221492 1.1933158 -110.9633222
## 0.948767 1.4221492 1.1933158 -110.9633222
## 0.948767 1.4221492 1.1933158 -110.9633213
## 0.948767 1.4221492 1.1933158 -110.9633213
## 0.948767 1.4221492 1.1933158 -110.9633217
## 0.948767 1.4221492 1.1933158 -110.9633217
## 0.4743835 1.4216481 0.9637396 -109.7209006
## 0.4753835 1.4216481 0.9637396 -109.7150794
## 0.4733835 1.4216481 0.9637396 -109.7268043
## 0.4743835 1.4226481 0.9637396 -109.7200254
## 0.4743835 1.4206481 0.9637396 -109.7217869
## 0.4743835 1.4216481 0.9647396 -109.7204189
## 0.4743835 1.4216481 0.9627396 -109.7213941
## 0.4743835 1.4216481 0.9637396 -109.7209029
## 0.4743835 1.4216481 0.9637396 -109.7209029
## 0.4743835 1.4216481 0.9637396 -109.7209009
## 0.4743835 1.4216481 0.9637396 -109.7209009
## 0.4743835 1.4216481 0.9637396 -109.7209014
## 0.4743835 1.4216481 0.9637396 -109.7209014
## 0.7133097 1.4219005 1.0793671 -109.7489584
## 0.7143097 1.4219005 1.0793671 -109.7524779
## 0.7123097 1.4219005 1.0793671 -109.7454546
## 0.7133097 1.4229005 1.0793671 -109.748086
## 0.7133097 1.4209005 1.0793671 -109.7498419
## 0.7133097 1.4219005 1.0803671 -109.7495659
## 0.7133097 1.4219005 1.0783671 -109.7483583
## 0.7133097 1.4219005 1.0793671 -109.7489599
## 0.7133097 1.4219005 1.0793671 -109.7489599
## 0.7133097 1.4219005 1.0793671 -109.7489587
## 0.7133097 1.4219005 1.0793671 -109.7489587
## 0.7133097 1.4219005 1.0793671 -109.7489591
## 0.7133097 1.4219005 1.0793671 -109.7489591
## 0 1.6531789 1.0249276 -748624416.811883
## 0.001 1.6531789 1.0249276 -7500.0362621
## 0 1.6531789 1.0249276 -748624416.811883
## 0 1.6541789 1.0249276 -748624416.812929
## 0 1.6521789 1.0249276 -748624416.810843
## 0 1.6531789 1.0259276 -748624416.812039
## 0 1.6531789 1.0239276 -748624416.811736
## 0 1.6531789 1.0249276 -748624527.907516
## 0 1.6531789 1.0249276 -748624527.907518
## 0 1.6531789 1.0249276 -748624416.811883
## 0 1.6531789 1.0249276 -748624416.811883
## 0 1.6531789 1.0249276 -748624416.811884
## 0 1.6531789 1.0249276 -748624416.811884
## 0.4758741 1.4988849 1.0612462 -109.6770468
## 0.4768741 1.4988849 1.0612462 -109.6713474
## 0.4748741 1.4988849 1.0612462 -109.6828278
## 0.4758741 1.4998849 1.0612462 -109.6769466
## 0.4758741 1.4978849 1.0612462 -109.6771561
## 0.4758741 1.4988849 1.0622462 -109.6775145
## 0.4758741 1.4988849 1.0602462 -109.6765872
## 0.4758741 1.4988849 1.0612462 -109.6770492
## 0.4758741 1.4988849 1.0612462 -109.6770492
## 0.4758741 1.4988849 1.0612462 -109.6770471
## 0.4758741 1.4988849 1.0612462 -109.6770471
## 0.4758741 1.4988849 1.0612462 -109.6770476
## 0.4758741 1.4988849 1.0612462 -109.6770476
## 0.6174096 1.4529944 1.0720481 -109.4758583
## 0.6184096 1.4529944 1.0720481 -109.4772898
## 0.6164096 1.4529944 1.0720481 -109.4744562
## 0.6174096 1.4539944 1.0720481 -109.4753172
## 0.6174096 1.4519944 1.0720481 -109.4764096
## 0.6174096 1.4529944 1.0730481 -109.4764105
## 0.6174096 1.4529944 1.0710481 -109.4753137
## 0.6174096 1.4529944 1.0720481 -109.4758601
## 0.6174096 1.4529944 1.0720481 -109.4758601
## 0.6174096 1.4529944 1.0720481 -109.4758585
## 0.6174096 1.4529944 1.0720481 -109.4758585
## 0.6174096 1.4529944 1.0720481 -109.475859
## 0.6174096 1.4529944 1.0720481 -109.475859
## 0.5490137 1.4881159 1.0371591 -109.4327838
## 0.5500137 1.4881159 1.0371591 -109.4316493
## 0.5480137 1.4881159 1.0371591 -109.4339656
## 0.5490137 1.4891159 1.0371591 -109.4325849
## 0.5490137 1.4871159 1.0371591 -109.432992
## 0.5490137 1.4881159 1.0381591 -109.4330497
## 0.5490137 1.4881159 1.0361591 -109.4325267
## 0.5490137 1.4881159 1.0371591 -109.4327858
## 0.5490137 1.4881159 1.0371591 -109.4327858
## 0.5490137 1.4881159 1.0371591 -109.4327841
## 0.5490137 1.4881159 1.0371591 -109.4327841
## 0.5490137 1.4881159 1.0371591 -109.4327846
## 0.5490137 1.4881159 1.0371591 -109.4327846
## 0.5756508 1.5094148 1.0429111 -109.4171106
## 0.5766508 1.5094148 1.0429111 -109.4171218
## 0.5746508 1.5094148 1.0429111 -109.4171387
## 0.5756508 1.5104148 1.0429111 -109.4171042
## 0.5756508 1.5084148 1.0429111 -109.4171259
## 0.5756508 1.5094148 1.0439111 -109.4174265
## 0.5756508 1.5094148 1.0419111 -109.4168034
## 0.5756508 1.5094148 1.0429111 -109.4171125
## 0.5756508 1.5094148 1.0429111 -109.4171125
## 0.5756508 1.5094148 1.0429111 -109.4171109
## 0.5756508 1.5094148 1.0429111 -109.4171109
## 0.5756508 1.5094148 1.0429111 -109.4171114
## 0.5756508 1.5094148 1.0429111 -109.4171114
## 0.578788 1.4910265 1.0149255 -109.4137608
## 0.579788 1.4910265 1.0149255 -109.4138934
## 0.577788 1.4910265 1.0149255 -109.4136666
## 0.578788 1.4920265 1.0149255 -109.4135888
## 0.578788 1.4900265 1.0149255 -109.413942
## 0.578788 1.4910265 1.0159255 -109.4138229
## 0.578788 1.4910265 1.0139255 -109.4137083
## 0.578788 1.4910265 1.0149255 -109.4137627
## 0.578788 1.4910265 1.0149255 -109.4137627
## 0.578788 1.4910265 1.0149255 -109.4137611
## 0.578788 1.4910265 1.0149255 -109.4137611
## 0.578788 1.4910265 1.0149255 -109.4137616
## 0.578788 1.4910265 1.0149255 -109.4137616
## 0.5759546 1.5111591 1.0083125 -109.4117112
## 0.5769546 1.5111591 1.0083125 -109.4117343
## 0.5749546 1.5111591 1.0083125 -109.4117273
## 0.5759546 1.5121591 1.0083125 -109.41172
## 0.5759546 1.5101591 1.0083125 -109.4117112
## 0.5759546 1.5111591 1.0093125 -109.4117091
## 0.5759546 1.5111591 1.0073125 -109.4117232
## 0.5759546 1.5111591 1.0083125 -109.4117131
## 0.5759546 1.5111591 1.0083125 -109.4117131
## 0.5759546 1.5111591 1.0083125 -109.4117115
## 0.5759546 1.5111591 1.0083125 -109.4117115
## 0.5759546 1.5111591 1.0083125 -109.411712
## 0.5759546 1.5111591 1.0083125 -109.411712
## 0.5758751 1.5106832 1.0090684 -109.4117074
## 0.5768751 1.5106832 1.0090684 -109.4117274
## 0.5748751 1.5106832 1.0090684 -109.4117267
## 0.5758751 1.5116832 1.0090684 -109.4117121
## 0.5758751 1.5096832 1.0090684 -109.4117116
## 0.5758751 1.5106832 1.0100684 -109.4117128
## 0.5758751 1.5106832 1.0080684 -109.4117119
## 0.5758751 1.5106832 1.0090684 -109.4117094
## 0.5758751 1.5106832 1.0090684 -109.4117094
## 0.5758751 1.5106832 1.0090684 -109.4117077
## 0.5758751 1.5106832 1.0090684 -109.4117077
## 0.5758751 1.5106832 1.0090684 -109.4117082
## 0.5758751 1.5106832 1.0090684 -109.4117082
## 0.5758675 1.5106518 1.0090267 -109.4117074
## 0.5768675 1.5106518 1.0090267 -109.4117271
## 0.5748675 1.5106518 1.0090267 -109.411727
## 0.5758675 1.5116518 1.0090267 -109.4117118
## 0.5758675 1.5096518 1.0090267 -109.4117118
## 0.5758675 1.5106518 1.0100267 -109.4117123
## 0.5758675 1.5106518 1.0080267 -109.4117123
## 0.5758675 1.5106518 1.0090267 -109.4117094
## 0.5758675 1.5106518 1.0090267 -109.4117094
## 0.5758675 1.5106518 1.0090267 -109.4117077
## 0.5758675 1.5106518 1.0090267 -109.4117077
## 0.5758675 1.5106518 1.0090267 -109.4117082
## 0.5758675 1.5106518 1.0090267 -109.4117082
##
## Optimizing common-rate model....
## sig logL
## 2 -118.9455945
## 2.001 -118.9551477
## 1.999 -118.9360425
## 2 -118.9455951
## 2 -118.9455951
## 2 -118.9455947
## 2 -118.9455947
## 2 -118.9455949
## 2 -118.9455949
## 1 -113.0350348
## 1.001 -113.0302715
## 0.999 -113.0398507
## 1 -113.0350359
## 1 -113.0350359
## 1 -113.0350352
## 1 -113.0350352
## 1 -113.0350356
## 1 -113.0350356
## 1.3339499 -113.4612924
## 1.3349499 -113.4666788
## 1.3329499 -113.4559222
## 1.3339499 -113.4612933
## 1.3339499 -113.4612933
## 1.3339499 -113.4612928
## 1.3339499 -113.4612928
## 1.3339499 -113.4612931
## 1.3339499 -113.4612931
## 1.1138276 -112.7866855
## 1.1148276 -112.7867874
## 1.1128276 -112.7866181
## 1.1138276 -112.7866865
## 1.1138276 -112.7866865
## 1.1138276 -112.7866858
## 1.1138276 -112.7866858
## 1.1138276 -112.7866862
## 1.1138276 -112.7866862
## 1.1118507 -112.7865857
## 1.1128507 -112.7866192
## 1.1108507 -112.7865868
## 1.1118507 -112.7865867
## 1.1118507 -112.7865867
## 1.1118507 -112.786586
## 1.1118507 -112.786586
## 1.1118507 -112.7865864
## 1.1118507 -112.7865864
## 1.1113832 -112.7865819
## 1.1123832 -112.7865992
## 1.1103832 -112.7865994
## 1.1113832 -112.7865829
## 1.1113832 -112.7865829
## 1.1113832 -112.7865822
## 1.1113832 -112.7865822
## 1.1113832 -112.7865826
## 1.1113832 -112.7865826
## 1.1113852 -112.7865819
## 1.1123852 -112.7865993
## 1.1103852 -112.7865993
## 1.1113852 -112.7865829
## 1.1113852 -112.7865829
## 1.1113852 -112.7865822
## 1.1113852 -112.7865822
## 1.1113852 -112.7865826
## 1.1113852 -112.7865826
fit2
## ML common-rate model:
## s^2 a[1] a[2] a[3] k logL
## value 1.1114 -0.2344 0.9677 1.7855 4 -112.7866
##
## ML multi-rate model:
## s^2[1] s^2[2] s^2[3] a[1] a[2] a[3] k logL
## value 0.5759 1.5107 1.009 -0.2344 0.9677 1.7855 6 -109.4117
##
## Likelihood ratio: 6.7497
## P-value (based on X^2): 0.0342
##
## R thinks it has found the ML solution.

Neat.

Here's a visualization of the evolution of our tree different trees as I showed last time:

ylim<-range(c(x,y,z))
par(mfrow=c(1,3))
phenogram(t1,x,ylim=ylim,spread.cost=c(1,0),ftype="i")
phenogram(t2,y,ylim=ylim,spread.cost=c(1,0),ftype="i")
## Optimizing the positions of the tip labels...
phenogram(t3,z,ylim=ylim,spread.cost=c(1,0),ftype="i")

plot of chunk unnamed-chunk-3

New package for creating phylogenies 'free-hand' in R: physketch

$
0
0

I have started working on a new mini-project, called physketch with the intention of creating a variety of “free-hand” functions that will allow the user to draw trees on the screen in R which then will be turned into objects of class "phylo" for use in other functions or methods.

The inspiration for this comes from the function that I wroteabout a week ago to draw a phylogeny on top of a raster image of a plotted tree, along with some other related methods that I have in mind.

The package - though consisting so far of only this one function (renamed phylo.tracer) can already be downloaded & installed from GitHub using devtools. I added a couple of features to the function. In particular, it can now take either a file name of an image in .jpg format or a raster image as input (consequentially depending on the package jpeg, and it now by default overlays grid lines on the plotted raster image to help the user more easily identify the branching points of common ancestors on the tree.

Here is a quick demo - first of the installation from GitHub:

library(devtools)
install_github("liamrevell/physketch")

Now a demo (similar to what I showed before: 1, 2) of this method with its new features:

library(physketch)
download.file("http://www.nature.com/nature/journal/v477/n7365/images/nature10382-f4.2.jpg",
"nature10382-f4.2.jpg",mode="wb")
mollusca<-phylo.tracer(file="nature10382-f4.2.jpg")

and, of course, the result:

library(phytools)
plotTree(mollusca,lwd=3,fsize=1.2)

plot of chunk unnamed-chunk-3

Note that the tip order changes depending on the the order in which tips are added to the tree - but, rest assured, the topology & branch lengths are correct.

Finally, this example comes from Kocot et al. (2011).

Additional function for interactive drawing of a cladogram or ultrametric phylogram

$
0
0

I just wrote another function designed to permit users to interactively draw phylogenies in the R environment.

This one is based closely on a demo I postedto my blog a couple of weeks ago showing how to draw a tree using phytools::bind.tip. This one uses bind.tip in a couple of different ways - either interactively (as in the case for method="cladogram" or non-interactively, but via the interactive phytools (mostly internal - although now exported to the namespace) function get.treepos.

The following is a quick demo:

library(phytools)

tips<-c("lemur","robin","whale","coelecanth","bat","cow",
"goldfish","pig","iguana","human")
tips
##  [1] "lemur"      "robin"      "whale"      "coelecanth" "bat"       
## [6] "cow" "goldfish" "pig" "iguana" "human"
outgroup<-"shark"
outgroup
## [1] "shark"
draw.ultrametric<-function(ingroup,outgroup=NULL,depth=1.0,
method=c("phylogram","cladogram")){
method<-method[1]
if(is.null(outgroup)) out<-"OUTGROUP"
else out<-outgroup
tree<-pbtree(n=2,tip.label=c(ingroup[1],out),scale=depth)
if(method=="cladogram"){
for(i in 2:length(ingroup))
tree<-bind.tip(tree,ingroup[i],interactive=TRUE)
tree$edge.length<-NULL
} else if(method=="phylogram"){
dev.hold()
plotTree(tree,mar=c(0.1,0.1,3.1,0.1))
v<-seq(0,depth,by=depth/10)
axis(3,at=v)
abline(v=v,lty="dashed",col=make.transparent("grey",0.7))
dev.flush()
cat(paste("Click where you would like to bind the tip \"",
ingroup[2],"\"\n",sep=""))
flush.console()
for(i in 2:length(ingroup)){
obj<-get.treepos(message=FALSE)
tree<-bind.tip(tree,ingroup[i],where=obj$where,
pos=obj$pos)
dev.hold()
plotTree(tree,mar=c(0.1,0.1,3.1,0.1))
axis(3,at=v)
abline(v=v,lty="dashed",col=make.transparent("grey",0.7))
dev.flush()
if(i<length(ingroup))
cat(paste("Click where you would like to bind the tip \"",
ingroup[i+1],"\"\n",sep=""))
flush.console()
}
}
tree
}
vertebrates<-draw.ultrametric(tips,outgroup)
plotTree(vertebrates)

plot of chunk unnamed-chunk-3

[The above example is based on one I use in undergrad classes in which I try to get them to come up with a 'phylogeny of vertebrate' from the following slide:

from which they are supposed to draw a tree.]

This function will most likely go in the physketch package.

That's all.

Plotting up- & down-facing phylograms in phytools::plotSimmap

$
0
0

I just pushed a significant updateto the workhorse phytools plotting function plotSimmap to permit plotting in both up- and down-facing square phylograms, in addition to the right & left phylograms and the circular fan trees already available.

When I call plotSimmap a “workhorse,” what I mean is that it is used internally by a whole bunch of other functions & methods in the package. For instance, in addition to the S3 plot method for "simmap" and "multiSimmap" object classes, it is also used by bind.tip (in interactive mode), contMap (& plot.contMap), densityMap (& plot.densityMap), densityTree, dotTree (for only one character), fancyTree (for various methods), ltt, plot.describe.simmap, plotTree, plotTree.wBars (along with both plotTree.barplot, and plotTree.boxplot), reroot (for interactive mode), and treeSlice (in prompt=TRUE mode), among other things. Adding the up- & down-facing directions to this method will eventually allow me to add this capability to many of these functions as well.

plotSimmap is also used inside the new functions of the physketch package, such as in the phylo.tracer function to draw a "phylo" object on top of the image of a plotted tree. When I extend the direction argument to this function it will be possible to trace a phylogeny plotted in any direction.

Here's a quick demo of up- & down-facing trees:

library(phytools)
packageVersion("phytools")
## [1] '0.5.75'
plotTree(tree,direction="upwards",mar=c(0.1,4.1,0.1,0.1))
axis(side=2,at=seq(0,100,by=20))
title(ylab="time since the root")

plot of chunk unnamed-chunk-1

We can flip the direction of the axis - that is, plotting time from the present instead of time since the root - by plotting our tree downwards, but then switch ylim{1]& ylim{2].

For instance:

obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
ylim<-obj$y.lim[2:1]-(obj$y.lim[2]-max(nodeHeights(tree)))
plotTree(tree,direction="downwards",mar=c(0.1,4.1,0.1,0.1),ylim=ylim)
axis(side=2,at=seq(0,100,by=20))
title(ylab="time since the present")

plot of chunk unnamed-chunk-2

One (surprising?) challenge with tree plotting is ensuring that there is enough, but not too much, space left for tip labels. The reason this is such a pain is because we don't know how much space the labels will take up in the units of our tree until we've already plotted the tree! Let's make one crazy long label to see if I got it right this time:

tree$tip.label[1]<-"The quick brown fox jumped over the lazy dog"
plotTree(tree,direction="upwards")

plot of chunk unnamed-chunk-3

There are still some bugs to be worked it before I can add this option to all the other functions listed above - so please be patient.

That's it.

Plotting up- and down-facing "contMap"&"densityMap" trees

$
0
0

Yesterday, I posted on plotting upward & downward facing trees with plotSimmap (and the wrapping function plotTree).

The reason I programmed this is to eventually extend this functionality to the methods that use plotSimmap internally - a long listthat includes such functions as plot.contMap (& contMap) and plot.densityMap (& densityMap). These functions seem to be quite popularso I thought I'd do them first.

The biggest pain in the butt for these methods was not getting the phylogeny oriented correctly - after yesterday's phytools update, that part was easy. The hard part was getting the color-bar legend correctly positioned & sized. The reason this turned out to be challenging was because our plotting region is not square - meaning that x& y dimensions can have different scales. When our legend & branch lengths point in the same direction (right-to-left or left-to-right) we have nothing to worry about; however if we want to plot our tree upward or downward, but our legend (for the vertical branches) horizontally, then we start to have problems. To make this happen, we have to take advantage of par("pin") which gives us the relative dimensions of our plotting area in x& y - but in inches, not in our plotting units (these are given by par("usr"), by the way).

The updates can be seen on GitHub here.

Let's try a demo. Here I have simulated a simple tree, but added one longish tip label to make sure that the sizing of our plotting area to accomodate tip labels is working properly. (As mentioned in an earlier post, this is one of the other challenging thing about plotting phylogenies.)

library(phytools)
packageVersion("phytools")
## [1] '0.5.76'
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A longish species name, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
x
## A longish species name                      B                      C 
## -1.3444974 -2.7182556 -3.6415576
## D E F
## -2.7367692 -2.9094996 1.0659075
## G H I
## 0.4134942 0.6560451 0.9139865
## J K L
## 1.3646047 1.3643494 0.4296533
## M N O
## 0.9882680 0.8843809 0.7174877
## P Q R
## 1.0307320 1.3304640 2.8291197
## S T U
## 0.6508031 0.1046013 0.7843597
## V W X
## 0.8382973 0.4644528 0.5338497
## Y Z
## 2.5244863 2.2260855
obj<-contMap(tree,x,plot=FALSE)
obj
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
##
## (2) A mapped continuous trait on the range (-3.641558, 2.82912).
plot(obj,direction="upwards")

plot of chunk unnamed-chunk-1

Next, using densityMap:

trees<-make.simmap(tree,y,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## a b
## a -0.9922819 0.9922819
## b 0.9922819 -0.9922819
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## a b
## 0.5 0.5
## Done.
obj<-densityMap(trees,plot=FALSE)
## sorry - this might take a while; please be patient
obj
## Object of class "densityMap" containing:
##
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
##
## (2) The mapped posterior density of a discrete binary character with states (a, b).
plot(obj,direction="downwards",lwd=5,outline=TRUE)

plot of chunk unnamed-chunk-2

obj<-setMap(obj,c("white","black"))
plot(obj,direction="upwards",lwd=5,outline=TRUE)

plot of chunk unnamed-chunk-2

Neat.

The data for this exercise were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS)
tree$tip.label[1]<-"A longish species name"
x<-fastBM(tree)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
y<-sim.history(tree,Q)$states

Interactive node labeling using phytools

$
0
0

When it comes to labeling internal nodes on a phylogenetic tree in R, the function nodelabels in the ape package can pretty much do it all. Nonetheless, when I was contacted recently by a colleague I realized there was space for some additional functionality - specifically, in terms of allowing the user to interact with the plotting device to determine the location of said labels - something that is not currently possible using ape::nodelabels (so far as I can tell).

The code below consists of two different functions. The first returns the index of the closest node on the plotting device for a currently plotted phylogeny. The second writes a label to that node, with a few different options from nodelabels. Note that getnodecould easily be combined with ape::nodelabels to take advantage of all the functionality of nodelabels but in an interactive context.

getnode<-function(...){
if(hasArg(env)) env<-list(...)$env
else env<-get("last_plot.phylo",envir=.PlotPhyloEnv)
xy<-unlist(locator(n=1))
points(xy[1],xy[2])
d<-sqrt((xy[1]-env$xx)^2+(xy[2]-env$yy)^2)
ii<-which(d==min(d))[1]
ii
}

labelnodes<-function(text,node=NULL,interactive=TRUE,
shape=c("circle","ellipse","rect"),...){
shape<-shape[1]
if(hasArg(circle.exp)) circle.exp<-list(...)$circle.exp
else circle.exp<-1.3
if(hasArg(rect.exp)) rect.exp<-list(...)$rect.exp
else rect.exp<-1.6
if(hasArg(cex)) cex<-list(...)$cex
else cex<-1
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-cex*strheight("A")
w<-cex*strwidth(text)
rad<-circle.exp*h*diff(par()$usr[1:2])/diff(par()$usr[3:4])
if(is.null(node)){
if(!interactive){
cat("No nodes provided. Setting interactive mode to TRUE.\n")
interactive<-TRUE
}
node<-vector(length=length(text))
}
for(i in 1:length(text)){
if(interactive){
cat(paste("Click on the node you would like to label ",
text[i],".\n",sep=""))
flush.console()
ii<-getnode(env=obj)
node[i]<-ii
} else ii<-node[i]
if(shape=="circle")
draw.circle(obj$xx[ii],obj$yy[ii],rad,col="white")
else if(shape=="ellipse")
draw.ellipse(obj$xx[ii],obj$yy[ii],0.8*w[i],h,
col="white")
else if(shape=="rect")
rect(xleft=obj$xx[ii]-0.5*rect.exp*w[i],
ybottom=obj$yy[ii]-0.5*rect.exp*h,
xright=obj$xx[ii]+0.5*rect.exp*w[i],
ytop=obj$yy[ii]+0.5*rect.exp*h,col="white",
ljoin=1)
text(obj$xx[ii],obj$yy[ii],label=text[i],cex=cex)
}
invisible(node)
}
library(phytools)
library(plotrix)
text
vertebrates<-read.tree(text=text)
plotTree(vertebrates)
labels<-c("Cartilaginous fish",
"Ray-finned fish",
"Lobe-finned fish",
"Anurans",
"Reptiles (& birds)",
"Birds",
"Mammals",
"Eutherians")
labels
nodes<-labelnodes(text=labels,shape="ellipse",cex=0.8)

(Click for full screen version.)

plotTree(vertebrates,fsize=0.8)
labelnodes(node=nodes,text=labels,shape="ellipse",cex=0.7,interactive=FALSE)

plot of chunk unnamed-chunk-3

That's it.

More on labeling nodes using ape::nodelabels & phytools

$
0
0

The colleague question that inspired my recent poston interactively adding node labels to a tree was actually must simpler & is as follows:

“A student of mine has a 250 taxon tree that we want to number the nodes for on the figure, starting with "1” (preferably the root, but we can work with most any logical system) through 249. Is there an easy way in phytools?“

This is indeed pretty straightforward. I'm going to work with a small tree of 50 taxa - just to not overwhelm the size of plot window I can create on my blog - but the principle is identical.

First of all - we can easily do it with ape::nodelabels.

library(ape)
tree
## 
## Phylogenetic tree with 50 tips and 49 internal nodes.
##
## Tip labels:
## t8, t9, t7, t3, t17, t27, ...
##
## Rooted; includes branch lengths.
plot(tree,no.margin=TRUE,edge.width=2,cex=0.7)
nodelabels(text=1:tree$Nnode,node=1:tree$Nnode+Ntip(tree))

plot of chunk unnamed-chunk-1

There are many different options with nodelabels. For instance, we can plot our tree in "fan" style, and our node labels within circles:

plot(tree,no.margin=TRUE,edge.width=2,type="fan",cex=0.9)
nodelabels(text=1:tree$Nnode,node=1:tree$Nnode+Ntip(tree),frame="circle")

plot of chunk unnamed-chunk-2

We don't need to use frames around our labels. We can instead offset them from the plotted edges:

plot(tree,no.margin=TRUE,edge.width=2,cex=0.7)
nodelabels(text=1:tree$Nnode,node=1:tree$Nnode+Ntip(tree),
frame="none",adj=c(1.1,-0.4))

plot of chunk unnamed-chunk-3

Equally well, we could employ the new function that I have just added to the phytools package. For instance:

library(phytools)
packageVersion("phytools")
## [1] '0.5.77'
plotTree(tree,type="fan",fsize=0.9,ftype="i")
labelnodes(text=1:tree$Nnode,node=1:tree$Nnode+Ntip(tree),
interactive=FALSE,circle.exp=0.9,cex=0.8)

plot of chunk unnamed-chunk-4

I like that.

Testing hypotheses about Pagel's λ in phyl.RMA

$
0
0

Yesterday I received the following question by email:

“I submitted a paper for publication using your phylogenetic RMA code [phyl.RMA(x, y, tree, method="lambda")] and received a reviewer comment that they would like me to test whether my λ values are significantly different from zero or not. I am no R guru and have scoured the internet R forums looking for help on this topic to no avail. Do you have suggestions how to modify the code to test if λ is significantly different from 0 or not?”

This turns out to be quite straightforward now. There is no logLik S3 method for the "phyl.RMA" object class; however the likelihood is printed & it is straightforward to pull it out. The following example illustrates how we can go about doing this.

library(phytools)

Here's our data & tree:

tree
## 
## Phylogenetic tree with 30 tips and 29 internal nodes.
##
## Tip labels:
## t20, t21, t18, t19, t22, t23, ...
##
## Rooted; includes branch lengths.
x
##         t20         t21         t18         t19         t22         t23 
## 1.74081803 0.33676433 -2.50478160 0.89857946 1.76875257 3.08699370
## t11 t6 t3 t14 t15 t12
## 0.15106911 0.66324229 -3.41883563 -0.04319447 1.68108703 0.64884875
## t4 t13 t27 t28 t10 t16
## -0.44859035 4.86755607 5.47266102 4.87278387 1.89486242 0.77032544
## t17 t7 t8 t2 t1 t29
## -1.28424131 0.35679219 0.54643104 1.66794272 0.67532856 0.69751983
## t30 t26 t9 t5 t24 t25
## 2.47659363 1.85573177 0.99397157 1.88833299 -3.51120262 0.00644626
y
##        t20        t21        t18        t19        t22        t23 
## -1.4604984 -1.1234784 -1.8955693 -1.1046771 0.5031821 0.7232508
## t11 t6 t3 t14 t15 t12
## -1.3182090 0.7771434 -5.0391041 -0.7060600 0.1528692 -2.3452736
## t4 t13 t27 t28 t10 t16
## -0.4259814 5.0943036 4.5370231 3.8153191 2.4929304 1.4140231
## t17 t7 t8 t2 t1 t29
## -0.2126191 -1.3148316 0.1247963 1.7954273 1.6049011 -0.5283464
## t30 t26 t9 t5 t24 t25
## 0.6998745 0.8378205 0.7286870 1.0226437 -3.5451315 0.4713685

Now let's fit our phylogenetic RMA:

## ML value of lambda
fit.ml<-phyl.RMA(x,y,tree,method="lambda")
plot(fit.ml)

plot of chunk unnamed-chunk-3

fit.ml
## 
## Coefficients:
## (Intercept) x
## -0.5950063 0.9436907
##
## VCV matrix:
## x y
## x 1.606109 1.267578
## y 1.267578 1.430324
##
## Model for the covariance structure of the error is "lambda"
##
## Estimates (or set values):
## lambda log(L)
## 0.6244482 -104.6795675
##
## Hypothesis test based on Clarke (1980; Biometrika):
## r2 T df P
## 0.699423 0.559378 22.745176 0.581371
##
## Note that the null hypothesis test is h0 = 1
## fixed lambda=0
fit.lambda0<-phyl.RMA(x,y,tree,method="lambda",
lambda=0,fixed=TRUE)
plot(fit.lambda0)

plot of chunk unnamed-chunk-3

fit.lambda0
## 
## Coefficients:
## (Intercept) x
## -0.8051448 1.0389307
##
## VCV matrix:
## x y
## x 1.300902 1.157591
## y 1.157591 1.404163
##
## Model for the covariance structure of the error is "lambda"
##
## Estimates (or set values):
## lambda log(L)
## 0.0000 -109.1883
##
## Hypothesis test based on Clarke (1980; Biometrika):
## r2 T df P
## 0.733581 0.391534 22.485945 0.699089
##
## Note that the null hypothesis test is h0 = 1

We can see that the ML(λ) model has a higher log-likelihood (it must have an equal or better likelihood since λ=0 is a special case); however this comparison does not tell us which model to prefer.

One option for picking the better model is through the use of a likelihood-ratio test. This is conducted as follows:

LR<-2*(fit.ml$logL-fit.lambda0$logL)
P.chisq<-pchisq(LR,df=1,lower.tail=FALSE)
P.chisq
## [1] 0.002674072

This suggests that our more parameter-rich model fits significantly better than our simpler λ=0 model.

We can also similarly compare our ML(λ) model to a simpler Brownian model (equivalent to fixing λ=1).

fit.BM<-phyl.RMA(x,y,tree)
fit.BM
## 
## Coefficients:
## (Intercept) x
## -0.5049189 0.8606773
##
## VCV matrix:
## x y
## x 7.754927 5.955602
## y 5.955602 5.744582
##
## Model for the covariance structure of the error is "BM"
##
## Estimates (or set values):
## lambda log(L)
## 1.0000 -117.5431
##
## Hypothesis test based on Clarke (1980; Biometrika):
## r2 T df P
## 0.796187 1.758561 22.027273 0.092539
##
## Note that the null hypothesis test is h0 = 1
LR.bm<-2*(fit.ml$logL-fit.BM$logL)
P.bm<-pchisq(LR.bm,df=1,lower.tail=FALSE)
P.bm
## [1] 3.932632e-07

This shows that even though our ML value of λ is closer to 1 than 0, the former is much more strongly rejected than the latter. This can be seen even more clearly if we plot our likelihood surface for λ:

lambda<-seq(0,1,by=0.01)
logL<-sapply(lambda,function(l,x,y,tree) phyl.RMA(x,y,tree,method="lambda",
lambda=l,fixed=TRUE)$logL,x=x,y=y,tree=tree)
plot(lambda,logL,type="l",ylab="log(L)",xlab=expression(lambda))
abline(v=fit.ml$lambda,lty="dashed",col="grey",lwd=2)

plot of chunk unnamed-chunk-6

That's it.

BTW, the data & tree for this example were simulated as follows:

library(phytools)
tree<-pbtree(n=30)
xy<-fastBM(phytools:::lambdaTree(tree,0.6))
x<-xy+fastBM(phytools:::lambdaTree(tree,0.6),sig2=0.4)
y<-xy+fastBM(phytools:::lambdaTree(tree,0.6),sig2=0.4)

Adding colorful error bars to internal node values using phytools::contMap

$
0
0

Today a R-sig-phylo member posted the following question:

“I was wondering about plotting the output of an ancestral state 'reconstruction' of a continuous trait while incorporating at least some of the uncertainty around the estimates.

One approach I thought of was to map the ASR onto a tree in a standard way, then at each node have essentially a mini-legend that is of a length reflecting the width of the confidence interval of the estimate at that node, and is coloured on the same colour-scale as the overall tree legend. For instance, if the colour scheme for the tree goes from blue through yellow to red as the value increases, then a node with a relatively precise and high estimate will have a short bar only ranging through different shades of red, whereas a highly uncertain low estimate will have a wider bar coloured from (say) dark blue to orange/light red. I hope that description makes sense.”

This is cannot be done automatically in phytools, but it is relatively straightforward to accomplish. For the record, I thought this would look terrible - but it actually looks better than I expected.

One key trick is that if we want our 'error bars' to show the 95% CIs for our ancestral values, we have to first find the total range of all such intervals and then use those to specify the range (argument lims) in our contMap calculation:

## load our package
library(phytools)
## here's our tree & data
tree
## 
## Phylogenetic tree with 12 tips and 11 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
x
##          A          B          C          D          E          F 
## 1.6716660 0.8798575 1.5227959 1.1046503 1.3339632 -0.9191693
## G H I J K L
## -0.9052098 -0.4977580 -0.6099311 -1.3487360 -1.6726186 -0.5795939
## get ancestral states with CIs
aa<-fastAnc(tree,x,CI=TRUE)
xlim<-range(aa$CI95) ## our range for the "contMap" object
## compute our "contMap" object
obj<-contMap(tree,x,lims=xlim,plot=FALSE)
obj
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
##
## (2) A mapped continuous trait on the range (-1.657191, 2.200442).
plot(obj,xlim=c(-0.12,2.2)) ## to accommodate the root node label
d<-diff(obj$lims)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
cols<-obj$cols[ii:jj]
add.color.bar(leg=0.05*h,cols=cols,prompt=FALSE,
x=lastPP$xx[i+Ntip(tree)]-0.025*h,
y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
lwd=14)
}

plot of chunk unnamed-chunk-1

That's it. If we want to change our color scheme, that is also no problem:

obj<-setMap(obj,colors=c("black","white"))
plot(obj,xlim=c(-0.12,2.2))
d<-diff(obj$lims)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
cols<-obj$cols[ii:jj]
add.color.bar(leg=0.05*h,cols=cols,prompt=FALSE,
x=lastPP$xx[i+Ntip(tree)]-0.025*h,
y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
lwd=14)
}

plot of chunk unnamed-chunk-2

Neat.

Obviously, any of the parameters of the error bars - their length (here set to 5% of the total tree length), width, etc., can easily be changed by modifying the code above.

The tree & data for this exercise were simulated as follows:

tree<-pbtree(n=12,tip.label=LETTERS[1:12])
x<-fastBM(tree)

That's all.

More on adding error bars to a "contMap" plot

$
0
0

Earlier today I responded to a question about plotting color error bars at the internal nodes of a plotted "contMap" object. The original poster then also added that he would like to scale the length of the error bars by the degree of uncertainty in each estimated node state. This is fairly straightforward. Using the same tree & data we employed earlier, I would go about doing this as follows:

library(phytools)
aa<-fastAnc(tree,x,CI=TRUE)
xlim<-range(aa$CI95)
obj<-contMap(tree,x,lims=xlim,plot=FALSE)
## for fun, let's change the color scheme:
obj<-setMap(obj,colors=c("blue","purple","red"))
plot(obj,xlim=c(-0.12,2.2))
d<-diff(obj$lims)
v<-aa$CI95[,2]-aa$CI95[,1]
v<-v/max(v)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
cols<-obj$cols[ii:jj]
add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
lwd=14)
}

plot of chunk unnamed-chunk-1

A second poster asked if it were possible to show multiple CIs in this way on a single tree. Of course - anything is possible, so why not give this a try too?

Here, I'll imagine I have the same tree as before, but three continuous characters in the columns of a data matrix X as follows:

X
##           v1          v2         v3
## A -2.7149209 -1.60824200 -0.1530299
## B -2.6918371 -0.56868787 -0.4045761
## C -2.2214286 0.74691204 -0.7665039
## D -0.7471508 -0.07388221 -1.8163244
## E -0.2516396 -1.59142223 -0.7799235
## F -1.0164357 -0.77471225 -1.2468231
## G -0.2909680 -1.22294794 -1.2467781
## H -0.5319452 -0.10286165 -1.3781947
## I 1.0340376 -0.95934999 -0.9955973
## J 0.4739410 -0.49495658 -1.2955523
## K 1.1069054 -0.49403236 -1.1990746
## L -0.1095284 0.58439209 -1.3297114

First, we need to repeat our analyses of above - but now across all three characters:

AA<-apply(X,2,fastAnc,tree=tree,CI=TRUE)
lims<-range(sapply(AA,function(x) range(x$CI95)))
objs<-apply(X,2,contMap,tree=tree,lims=lims,plot=FALSE)
objs
## $v1
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
##
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).
##
##
## $v2
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
##
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).
##
##
## $v3
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
##
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).

Let's set our "contMap" objects to grey scale:

objs<-lapply(objs,setMap,colors=c("white","black"))

For reasons that will become clear below, I'll also find the biggest CI on any node state across all three of the characters:

D<-sapply(AA,function(x) x$CI95[,2]-x$CI95[,1])
max.v<-max(D)

Now, since we have three characters, maybe it makes more sense to plot the original tree & then just the CIs at nodes - along with our legend:

plotTree(tree,lwd=3,xlim=c(-0.12,2.16),ylim=c(-0.8,12))
add.color.bar(leg=0.5*max(nodeHeights(tree)),cols=objs[[1]]$cols,title="trait value",
lims=objs[[1]]$lims,prompt=FALSE,x=0,y=-0.4)
foo<-function(aa,obj,tree,y.shift,max.v){
d<-diff(obj$lims)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
v<-aa$CI95[,2]-aa$CI95[,1]
v<-v/max.v
for(i in 1:tree$Nnode){
ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
cols<-obj$cols[ii:jj]
add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
y=lastPP$yy[i+Ntip(tree)]+y.shift,
title="",subtitle="",lims=NULL,lwd=10)
}
}
for(i in 1:length(objs)) foo(AA[[i]],objs[[i]],tree,y.shift=(-i+2)/3,max.v=max.v)

plot of chunk unnamed-chunk-6

That's it.

Function to add error bars to a contMap plot

$
0
0

I just wrote a function to add error bars to a "contMap" plot, as discussed today on R-sig-phyloand my blog.

The function looks as follows:

errorbar.contMap<-function(obj,...){
if(hasArg(x)) x<-list(...)$x
else x<-setNames(sapply(1:Ntip(obj$tree),function(x,obj){
ii<-which(obj$tree$edge[,2]==x)
ss<-names(obj$tree$maps[[ii]][length(obj$tree$maps[[ii]])])
obj$lims[1]+as.numeric(ss)/(length(obj$cols)-1)*diff(obj$lims)
},obj=obj),obj$tree$tip.label)
if(hasArg(scale.by.ci)) scale.by.ci<-list(...)$scale.by.ci
else scale.by.ci<-FALSE
tree<-obj$tree
aa<-fastAnc(tree,x,CI=TRUE)
xlim<-range(aa$CI95)
if(xlim[2]>obj$lims[2]||xlim[1]<obj$lims[1]){
cat(paste(" -----\n The range of the contMap object, presently (",
round(obj$lims[1],4),",",
round(obj$lims[2],4),
"), should be equal to\n or greater than the range of the CIs on ancestral states: (",
round(xlim[1],4),",",round(xlim[2],4),").\n",sep=""))
cat(paste(" To ensure that your error bars are correctly plotted, please recompute your\n",
" contMap object and increase lims.\n -----\n",sep=""))
}
d<-diff(obj$lims)
if(scale.by.ci){
v<-aa$CI95[,2]-aa$CI95[,1]
v<-v/max(v)
} else v<-rep(0.5,tree$Nnode)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
cols<-obj$cols[ii:jj]
add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
lwd=14)
}
}

One element of the function

x<-setNames(sapply(1:Ntip(obj$tree),function(x,obj){
ii<-which(obj$tree$edge[,2]==x)
ss<-names(obj$tree$maps[[ii]][length(obj$tree$maps[[ii]])])
obj$lims[1]+as.numeric(ss)/(length(obj$cols)-1)*diff(obj$lims)
},obj=obj),obj$tree$tip.label)

is a complicated piece of code that is only used if the optional argument x (the same as x in contMap is not supplied. What this line does is pull the values of x instead from the mapped object.

OK, let's try it:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
x
##           A           B           C           D           E           F 
## 2.40577324 -2.09590799 -0.03881523 -0.32857686 -1.71533031 -1.07523748
## G H I J K L
## -0.99890853 3.07436198 4.00842163 -0.27299915 2.41135389 2.61494639
## M N O P Q R
## 2.97710788 3.56872630 2.96244691 1.92234400 3.85672218 5.70984852
## S T U V W X
## 0.15119803 0.50825014 3.26386662 2.21301397 1.52734360 1.93244087
## Y Z
## 1.51311417 4.41672993
y
##           A           B           C           D           E           F 
## -0.52075890 -3.22896754 -1.99922219 -2.99064026 -3.85137945 0.92981725
## G H I J K L
## -1.77583525 -5.53973459 -1.39568877 -6.03400572 -5.50545828 -2.04112670
## M N O P Q R
## -3.45201728 -2.87475079 -4.97170923 -1.41198149 -1.58924448 0.07361102
## S T U V W X
## -0.37121255 1.66386064 0.38763598 -3.65979548 -2.99468975 -1.57039289
## Y Z
## -0.27934165 0.97554201

First with x:

obj<-contMap(tree,x,plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-4

Now with y:

obj<-contMap(tree,y,plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-5

##   -----
## The range of the contMap object, presently (-6.034,1.6639), should be equal to
## or greater than the range of the CIs on ancestral states: (-5.9555,2.7099).
## To ensure that your error bars are correctly plotted, please recompute your
## contMap object and increase lims.
## -----

This tells us that the range of the CIs on the ancestral nodes for y exceeds the limits of the "contMap" object. This is why some of the plotted CIs (most noticeably, the root) have black regions. This should be easy to fix as follows:

obj<-contMap(tree,y,lims=c(-6,2.8),plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-6

Some of the kinks are still being worked out. Please pardon any bugs!

Error bars on divergence times on a phylogeny plotted in R

$
0
0

Today a colleague at Universidad de los Andes (where I'm currently on sabbatical, BTW) asked me how to add error bars for divergence times to the nodes of a plotted phylogeny in R.

Well, as is the case with many things, I bet that there is a ton of different ways to do this; however, the following is one very simple technique.

In the following, I have a "phylo" object tree; and I also have a matrix (CI) containing the times before the present for the lower & upper confidence limits on each estimated divergence time corresponding to the nodes of my tree. Note that these CIs have been simulated to be intentionally assymetric - as such confidence intervals often are:

library(phytools)
## here are the data:
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## Z, Y, X, W, V, U, ...
##
## Rooted; includes branch lengths.
CI
##    lower (MYBP) upper (MYBP)
## 27 111.80101 92.599420
## 28 90.13474 64.370663
## 29 48.45811 28.394226
## 30 42.77937 22.359999
## 31 83.01736 61.962733
## 32 41.48123 18.614973
## 33 26.10956 0.000000
## 34 95.60433 75.950823
## 35 88.41807 68.551308
## 36 93.32274 69.118385
## 37 84.96369 66.249093
## 38 29.45826 6.303606
## 39 72.61971 54.372320
## 40 73.15644 45.563312
## 41 65.70458 37.262456
## 42 35.86271 18.184414
## 43 34.17755 12.586219
## 44 47.11305 26.530968
## 45 50.74606 24.070106
## 46 41.64973 23.878326
## 47 58.47533 35.912961
## 48 13.19029 0.000000
## 49 35.42824 11.007959
## 50 27.73721 6.913533
## 51 15.31440 0.000000

Now let's plot it:

plotTree(tree,xlim=c(110,-5),direction="leftwards",
mar=c(4.1,1.1,1.1,1.1),ftype="i")
abline(v=seq(0,120,by=10),lty="dashed",
col=make.transparent("grey",0.5))
axis(1,at=seq(0,120,by=20))
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
for(i in 1:tree$Nnode+Ntip(tree))
lines(x=c(CI[i-Ntip(tree),1],CI[i-Ntip(tree),2]),
y=rep(obj$yy[i],2),lwd=11,lend=0,
col=make.transparent("blue",0.4))
points(obj$xx[1:tree$Nnode+Ntip(tree)],
obj$yy[1:tree$Nnode+Ntip(tree)],pch=19,col="blue",
cex=1.8)

plot of chunk unnamed-chunk-2

(I added a few embellishments, but it should be straightforward to pick these out.)

I like it.

As with most such plots, it will tend to look better if expored as a PDF.

Note that because my CIs are in time for the present, I decide to plot my tree "leftwards", but then flip my x axis to run from higher to lesser values using the argument value xlim=c(110,-5).

Note also that in the example above I have assumed that the order of the rows of CI matches the order of the node indices in the tree.

These were simulated data, obviously. The code used to simulate these values was as follows:

tree<-pbtree(n=26,tip.label=LETTERS[26:1],scale=100)
h<-sapply(1:tree$Nnode+Ntip(tree),nodeheight,tree=tree)
CI<-cbind(h-runif(n=length(h),min=10,max=20),h+runif(n=length(h),
min=5,max=10))
CI[CI>max(nodeHeights(tree))]<-max(nodeHeights(tree))
CI<-max(nodeHeights(tree))-CI
rownames(CI)<-1:tree$Nnode+Ntip(tree)
colnames(CI)<-paste(c("lower","upper"),"(MYBP)")

Function to plot a tree with error bars on divergence dates in R

$
0
0

YesterdayI posted about plotting error bars around divergence times onto internal nodes of the tree.

The following (plotTree.errorbars) is a function to automate this procedure, to be shortly added to phytools.

## plot tree with error bars around divergence times at nodes
## written by Liam J. Revell 2017
plotTree.errorbars<-function(tree,CI,...){
args<-list(...)
if(!is.null(args$gridlines)){
gridlines<-args$gridlines
args$gridlines<-NULL
} else gridlines<-TRUE
if(is.null(args$mar)) args$mar<-c(4.1,1.1,1.1,1.1)
if(is.null(args$ftype)) args$ftype<-"i"
fsize<-if(!is.null(args$fsize)) args$fsize else 1
if(is.null(args$direction)) args$direction<-"leftwards"
if(!is.null(args$bar.width)){
bar.width<-args$bar.width
args$bar.width<-NULL
} else bar.width<-11
if(!is.null(args$cex)){
cex<-args$cex
args$cex<-NULL
} else cex<-1.2
if(!is.null(args$bar.col)){
bar.col<-args$bar.col
args$bar.col<-NULL
} else bar.col<-"blue"
par(mar=args$mar)
plot.new()
th<-max(nodeHeights(tree))
h<-max(th,max(CI))
if(is.null(args$xlim)){
m<-min(min(nodeHeights(tree)),min(CI))
d<-diff(c(m,h))
pp<-par("pin")[1]
sw<-fsize*(max(strwidth(tree$tip.label,units="inches")))+
1.37*fsize*strwidth("W",units="inches")
alp<-optimize(function(a,d,sw,pp) (a*1.04*d+sw-pp)^2,
d=d,sw=sw,pp=pp,
interval=c(0,1e6))$minimum
args$xlim<-if(args$direction=="leftwards") c(h,m-sw/alp) else
c(m,h+sw/alp)
}
if(is.null(args$at)) at<-seq(0,h,by=h/5)
else {
at<-args$at
args$at<-NULL
}
args$tree<-tree
args$add<-TRUE
do.call(plotTree,args=args)
if(gridlines) abline(v=at,lty="dashed",
col=make.transparent("grey",0.5))
axis(1,at=at,labels=signif(at,3))
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
for(i in 1:tree$Nnode+Ntip(tree))
lines(x=c(CI[i-Ntip(tree),1],CI[i-Ntip(tree),2]),
y=rep(obj$yy[i],2),lwd=bar.width,lend=0,
col=make.transparent(bar.col,0.4))
points(obj$xx[1:tree$Nnode+Ntip(tree)],
obj$yy[1:tree$Nnode+Ntip(tree)],pch=19,col=bar.col,
cex=cex)
}

As I have mentioned in an earlier post - one of the most difficult components of plotting a tree in R is leaving enough space in the plotting window for the tip labels. This is because we normally don't know how much space a particular string will occupy (in user units) until we have already opened our plotting device & set our xlim and ylimvalues. Consequently, the example will use a tree with tip labels B through Z, plus one really long tip label, to make sure that enough space has been allocated. BTW, my solution to this problem (above) is adapted from plot.phylo in the ape package.

OK, now let's test it out:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## Z, Y, X, W, V, U, ...
##
## Rooted; includes branch lengths.
CI
##        lower     upper
## 27 2.8414221 2.2855101
## 28 2.4025799 1.7851208
## 29 1.1518398 0.6932472
## 30 1.1016629 0.5915718
## 31 1.5786906 0.9097206
## 32 1.2866289 0.6838407
## 33 0.4025250 0.0000000
## 34 0.7735031 0.2052436
## 35 0.6981268 0.2212514
## 36 2.7675974 2.1440829
## 37 2.3925982 1.8320681
## 38 1.5706797 0.9821414
## 39 1.2980457 0.9041184
## 40 1.1321683 0.5452341
## 41 0.4097613 0.0000000
## 42 0.4676039 0.0000000
## 43 0.7283202 0.2276139
## 44 0.4388847 0.0000000
## 45 1.2948954 0.7909557
## 46 0.6314838 0.1167126
## 47 1.2049871 0.6604358
## 48 1.0396180 0.5733247
## 49 0.5778952 0.1890986
## 50 0.3843571 0.0000000
## 51 1.0729443 0.5512091
plotTree.errorbars(tree,CI)

plot of chunk unnamed-chunk-2

Note that there is a lot we can do to modify this plot. Firstly, any argument of plotTree can be passed internally to that function. For instance:

plotTree.errorbars(tree,CI,lwd=5,color="navy",lend=0)

plot of chunk unnamed-chunk-3

However, it is also possible to modify the specify attributes of the plot. For instance:

plotTree.errorbars(tree,CI,lwd=1,bar.col="red",bar.width=7,cex=1,
at=seq(0,3,by=0.5))

plot of chunk unnamed-chunk-4

Pretty neat.

The tree & CIs for this example were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS[26:1])
tree$tip.label[26]<-"Some really long tip label."
h<-sapply(1:tree$Nnode+Ntip(tree),nodeheight,tree=tree)
d<-max(nodeHeights(tree))
CI<-cbind(h-runif(n=length(h),min=.10*d,max=.20*d),h+runif(n=length(h),
min=.05*d,max=.10*d))
CI[CI>max(nodeHeights(tree))]<-max(nodeHeights(tree))
CI<-max(nodeHeights(tree))-CI
rownames(CI)<-1:tree$Nnode+Ntip(tree)
colnames(CI)<-c("lower","upper")

More on PGLS with error in y

$
0
0

Nearly two years ago I posted about fitting a PGLS model taking into account sampling variance (uncertainty) in y.

Inspired by a student dataset & problem here at los Andes, the following is a slightly differently implemented solution that should lead to the same result:

pgls.SEy<-function(model,data,corClass=corBrownian,tree=tree,
se=NULL,method=c("REML","ML"),interval=c(0,1000),...){
corfunc<-corClass
## preliminaries
data<-data[tree$tip.label,]
if(is.null(se)) se<-setNames(rep(0,Ntip(tree),
tree$tip.label))
## likelihood function
lk<-function(sig2e,data,tree,model,ve){
tree$edge.length<-tree$edge.length*sig2e
ii<-sapply(1:Ntip(tree),function(x,e) which(e==x),
e=tree$edge[,2])
tree$edge.length[ii]<-tree$edge.length[ii]+
ve[tree$tip.label]
v<-diag(vcv(tree))
vf<-varFixed(~v)
COR<-corfunc(1,tree,...)
fit<-gls(model,data=data,correlation=COR,weights=vf)
-logLik(fit)
}
## estimate sig2[e]
fit<-optimize(lk,interval=interval,
data=data,tree=tree,model=model,ve=se^2)
tree$edge.length<-tree$edge.length*fit$minimum
ii<-sapply(1:Ntip(tree),function(x,e) which(e==x),
e=tree$edge[,2])
tree$edge.length[ii]<-tree$edge.length[ii]+
se[tree$tip.label]^2
v<-diag(vcv(tree))
vf<-varFixed(~v)
## fit & return model
gls(model,data,correlation=corfunc(1,tree),weights=vf,
method=method)
}

Now let's try to apply it:

library(phytools)
library(nlme)
## here's our data
tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
##
## Tip labels:
## t65, t66, t29, t30, t13, t78, ...
##
## Rooted; includes branch lengths.
X
##               ye           x1          x2
## t65 -0.77900192 -0.933177258 -0.50289849
## t66 0.84646891 -0.360779073 -0.67343194
## t29 1.95846040 -0.004324650 -0.86464732
## t30 0.24373967 -0.283674892 -0.05950573
## t13 -5.54444003 -0.260222231 1.69823520
## t78 1.88453970 -0.030577008 -0.12238217
## t79 -0.59317590 -0.242023433 0.32537283
## t5 1.17182511 -0.351473922 -0.86081240
## t26 3.36099814 -0.417929491 -0.87860040
## t67 -0.21357307 -0.494481915 -1.04212209
## t68 0.73039688 -0.588127345 -0.71373910
## t86 2.81142943 -0.045597241 -1.28701218
## t87 4.89158144 -0.284624727 -1.19177925
## t25 2.69692032 -0.099326903 -1.26492190
## t16 1.27290355 -0.329391279 -0.69189543
## t47 -0.57529243 -0.998675854 -0.60213264
## t48 -4.91449264 -1.135659972 -0.90384083
## t10 2.88459685 -0.170786099 -0.84404497
## t20 1.11432889 -0.116176393 -1.43439903
## t63 0.67056966 -0.271788410 -0.48803237
## t64 2.25646332 -0.122474607 -0.87832126
## t53 2.52828710 -0.188851979 -0.48504631
## t21 1.78891253 -0.666761834 -1.27216994
## t23 1.45124819 -0.263860837 -1.16042313
## t49 1.03438725 -0.143231501 -1.19002853
## t50 0.61071781 0.258295208 -0.60080140
## t80 1.00845570 -0.326915114 0.10754248
## t81 -1.01437643 -0.507894799 0.26522058
## t19 -1.71820534 -0.811681993 -0.26318414
## t84 0.71358338 -0.737986864 -1.02763382
## t85 1.56004661 -0.712068874 -0.77728970
## t40 -0.38524799 -0.933452671 -0.83696879
## t41 0.49296392 -0.229120475 -1.49848945
## t18 2.76362994 -0.204012732 -1.64088635
## t76 -0.38900308 -0.117917709 -0.77909630
## t77 0.16193894 0.287088622 -0.72342580
## t27 2.64224382 -0.358620911 -1.71171019
## t92 -0.08353593 -0.337589308 -1.45006236
## t93 0.31220124 -0.312388755 -1.60509872
## t8 -1.04234066 -0.676186538 -0.41457179
## t2 -0.24958117 -1.382336906 -0.60476399
## t95 0.30844124 0.071824226 -0.04136211
## t96 0.17344643 -0.000369423 0.04677993
## t6 -4.30672086 -1.228130839 1.24440860
## t44 -0.39925053 -0.778840190 -0.10432506
## t51 -0.06845018 -1.529691824 -0.63523463
## t52 -0.33920397 -0.671563524 0.25110920
## t72 -0.22233595 -0.208175485 0.21888791
## t73 -0.74246626 -0.280504466 -0.05425123
## t35 0.58660363 -0.432133697 -0.60337432
## t38 -0.02995503 -1.599070451 -1.12646600
## t39 -0.99818963 -1.019397388 -0.14879313
## t99 2.02180211 0.819938089 0.10457809
## t100 3.03396613 0.797765277 0.09518138
## t24 -2.28364797 0.132350015 1.25951182
## t11 -1.03843807 -0.338010923 0.63895729
## t36 3.08690868 -0.109222940 -0.84517257
## t37 1.66001718 -0.783798640 0.25056370
## t28 -0.42294326 -0.624356825 -0.18617947
## t7 1.63649995 -0.995914763 -0.82088749
## t12 0.30240267 -0.430696448 -0.37673980
## t33 -0.90310597 -0.459191936 0.14792529
## t34 -0.46084029 -1.369205857 0.15734648
## t15 -0.95017011 -0.668597155 -0.19397849
## t42 -0.40629518 -1.006046065 -0.59102696
## t43 -1.45997398 -0.657881494 -0.76035426
## t22 -0.06754584 -0.549687736 -0.49300322
## t1 -0.52189221 -0.735847227 -0.10188643
## t3 1.42350778 -1.255254006 -1.36243767
## t4 -1.66965072 0.036571658 0.12260330
## t74 3.03252085 1.178218512 -0.76133067
## t75 2.48876980 1.064090251 -0.95706527
## t31 -0.97972153 -0.435236571 0.54331428
## t59 -3.89650669 -0.567740108 0.91285849
## t60 -1.92222453 -0.516435916 0.64042685
## t9 3.53788280 1.375046920 -1.24425619
## t90 6.09565095 0.878034624 -2.01772447
## t91 6.88955565 0.935005458 -2.09172115
## t56 -0.35774725 -0.517446554 0.30596671
## t57 1.23478325 -0.116159889 0.05991413
## t14 -1.27430128 -0.906819465 0.46607917
## t61 -4.01746299 -1.312009985 1.32878564
## t62 -2.40971394 -1.612880369 1.08221968
## t69 -2.04466742 -0.242584724 0.76102366
## t70 -4.84522907 -0.396631201 1.07417172
## t88 -0.17336430 0.237257899 0.41246936
## t89 -2.05031514 0.434875538 0.30737619
## t54 -0.80195450 -0.569749931 1.07419352
## t55 -1.48926859 -0.152194258 1.08008287
## t32 -0.87454629 0.602680886 0.75156851
## t45 -3.25099532 -1.438964529 2.01395563
## t46 -2.70322247 -1.404376125 1.86329858
## t17 0.38924791 -0.217652443 0.32900077
## t58 -3.40005156 -1.498640558 1.17233451
## t82 -1.89445105 -1.348265012 1.08151593
## t83 -2.05781804 -1.381394718 1.22167672
## t71 -1.44719125 -1.170708866 1.02083131
## t94 -2.16041746 -1.200085947 1.03456095
## t97 -1.44175507 -1.213089703 1.18276260
## t98 -3.88229471 -1.402582088 1.24727944
SE
##       t65       t66       t29       t30       t13       t78       t79 
## 1.5378419 0.6315825 0.8879493 0.5928441 1.0271921 1.0799337 0.2015944
## t5 t26 t67 t68 t86 t87 t25
## 1.4215427 1.6140998 1.1763704 0.3116548 0.3712812 2.1127712 1.7325671
## t16 t47 t48 t10 t20 t63 t64
## 0.6036392 0.2305401 2.1632606 0.5891664 1.6319491 0.1739720 0.2086460
## t53 t21 t23 t49 t50 t80 t81
## 1.4821123 0.4555341 0.3351531 1.2322628 0.3969949 0.8365659 0.2839710
## t19 t84 t85 t40 t41 t18 t76
## 0.3487848 1.1665896 1.5049262 0.5291481 0.5503178 0.7437542 0.9244183
## t77 t27 t92 t93 t8 t2 t95
## 0.4384969 0.6420107 0.6988273 0.7246152 0.8660270 1.5637886 1.8151181
## t96 t6 t44 t51 t52 t72 t73
## 1.3655987 0.9416576 0.7521436 0.4052548 0.7403313 0.3570909 1.2056779
## t35 t38 t39 t99 t100 t24 t11
## 0.8792995 0.8277867 1.3258279 0.1751977 0.8815140 0.4549706 0.1941065
## t36 t37 t28 t7 t12 t33 t34
## 1.4607861 1.1021203 0.6038450 0.9676804 0.2473128 0.1840373 1.6307541
## t15 t42 t43 t22 t1 t3 t4
## 0.7664964 1.1059112 1.1145843 0.9188596 1.5650796 0.4641816 0.6615129
## t74 t75 t31 t59 t60 t9 t90
## 0.9735615 1.0394457 1.5125566 1.8766065 0.5621988 0.9335404 0.2087170
## t91 t56 t57 t14 t61 t62 t69
## 0.7721703 1.8066518 0.4828995 0.8829816 0.7459172 2.1854232 0.7337223
## t70 t88 t89 t54 t55 t32 t45
## 1.4301866 0.2787410 1.0083834 0.5330130 1.1466571 1.7584170 0.3040079
## t46 t17 t58 t82 t83 t71 t94
## 0.8741375 0.8283463 0.5284757 1.1277393 0.3540923 0.3703903 1.0910862
## t97 t98
## 0.9403271 1.3459290
fit<-pgls.SEy(ye~x1+x2,data=X,se=SE,tree=tree,method="ML")
fit
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: data
## Log-likelihood: -166.1253
##
## Coefficients:
## (Intercept) x1 x2
## 0.6837662 1.5101552 -1.6872891
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~v
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 1.774718
summary(fit)
## Generalized least squares fit by maximum likelihood
## Model: model
## Data: data
## AIC BIC logLik
## 340.2505 350.6712 -166.1253
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~v
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.6837662 0.1650265 4.143373 1e-04
## x1 1.5101552 0.1365145 11.062231 0e+00
## x2 -1.6872891 0.1153723 -14.624738 0e+00
##
## Correlation:
## (Intr) x1
## x1 0.122
## x2 0.078 0.376
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.7961712 -0.8297315 -0.2312799 0.4027741 3.0330635
##
## Residual standard error: 1.774718
## Degrees of freedom: 100 total; 97 residual

In a separate post, I'm going to look at power & type I error when SEs are ignored vs. taken into account.

Note that also taking uncertainty in the xs is account is considerably more complicated. For that we would need to employ an approach similar to that of Ives et al. (2007).

The data & tree for this exercise were simulated as follows:

## simulate tree & data
set.seed(999)
tree<-pbtree(n=100,scale=1)
X<-fastBM(tree,nsim=2)
colnames(X)<-c("x1","x2")
beta<-c(1,1,-2)
y<-cbind(rep(1,Ntip(tree)),X)%*%beta+fastBM(tree)
v<-setNames(rexp(n=Ntip(tree)),tree$tip.label)
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
rownames(y))
X<-as.data.frame(cbind(ye,X))
SE<-sqrt(v)

Statistical behavior of PGLS taking account sampling error of y

$
0
0

Earlier todayI posted on taking sampling error in y into account during phylogenetic regression or ANOVA.

In the following, I'm going to do a small investigation into type I error & power of this method. I will compare type I error & power to phylogenetic regression ignoring error in y.

First, type I error using a simple bivariate regression.

library(phytools)
## Loading required package: ape
## Loading required package: maps
library(nlme)
packageVersion("phytools")
## [1] '0.5.80'
set.seed(1)
## simulate trees
trees<-pbtree(n=100,nsim=200)
## simulate true values of x & y
x<-lapply(trees,fastBM)
y<-lapply(trees,fastBM)
## simulate sampling variances for y
v<-lapply(trees, function(t) setNames(rexp(n=Ntip(t)),t$tip.label))
ye<-mapply(function(y,v)
setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
names(y)),y=y,v=v,SIMPLIFY=FALSE)
## fit PGLS ignoring sampling error in y
fitGLS<-function(t,x,y) gls(y~x,data.frame(x,y),
correlation=corBrownian(1,t),method="ML")
fits.gls<-mapply(fitGLS,trees,x,ye,SIMPLIFY=FALSE)
## now taking it into account
fitSEy<-function(t,x,y,v) pgls.SEy(y~x,data.frame(x,y),
tree=t,se=sqrt(v),method="ML")
fits.SEy<-mapply(fitSEy,trees,x,ye,v=v,SIMPLIFY=FALSE)
P.gls<-sapply(fits.gls,function(x) anova(x)$"p-value"[2])
mean(P.gls<=0.05) ## type I error method 1
## [1] 0.04
P.sey<-sapply(fits.SEy,function(x) anova(x)$"p-value"[2])
mean(P.sey<=0.05) ## type I error method 2
## [1] 0.05

So we should see straight away that there is no effect on type I error of ignoring sampling error in the estimation of species values for y. How about an effect on our estimated parameter, β1?

beta.gls<-sapply(fits.gls,function(x) coefficients(x)[2])
beta.sey<-sapply(fits.SEy,function(x) coefficients(x)[2])
d.gls<-density(beta.gls,bw=0.1)
d.sey<-density(beta.sey,bw=0.1)
plot(d.gls$x,d.gls$y,type="l",xlim=range(c(d.gls$x,d.sey$x)),
ylim=range(c(d.gls$y,d.sey$y)),ylab="density",
xlab=expression(beta[1]),lwd=2)
lines(d.sey$x,d.sey$y,lwd=2,lty="dotted")
text(d.gls$x[which(d.gls$y==max(d.gls$y))],
max(d.gls$y),label="GLS (no error)",pos=4)
text(d.sey$x[which(d.sey$y==max(d.sey$y))],
max(d.sey$y),label="GLS (with error)",pos=4)
abline(v=0,lty="dashed",col="grey")

plot of chunk unnamed-chunk-2

What's interesting here is that the type I error of both methods is basically identical; however the variance of the estimator that takes into count sampling error in yis much lower than the other.

Now, let's go a bit a farther & investigate the performance of each method for varying values of β1, our regression coefficient. We can do this as follows:

b1<-c(0,0.2,0.5,1,1.5,2)
P.gls<-P.sey<-beta.gls<-beta.sey<-matrix(NA,length(trees),length(b1),
dimnames=list(NULL,b1))
for(i in 1:length(b1)){
x<-lapply(trees,fastBM)
y<-mapply(function(t,x,b) b*x+fastBM(t),t=trees,x=x,
MoreArgs=list(b=b1[i]),SIMPLIFY=FALSE)
v<-lapply(trees, function(t) setNames(rexp(n=Ntip(t)),t$tip.label))
ye<-mapply(function(y,v)
setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
names(y)),y=y,v=v,SIMPLIFY=FALSE)
fitGLS<-function(t,x,y) gls(y~x,data.frame(x,y),
correlation=corBrownian(1,t),method="ML")
fits.gls<-mapply(fitGLS,trees,x,ye,SIMPLIFY=FALSE)
fitSEy<-function(t,x,y,v) pgls.SEy(y~x,data.frame(x,y),
tree=t,se=sqrt(v),method="ML")
fits.SEy<-mapply(fitSEy,trees,x,ye,v=v,SIMPLIFY=FALSE)
P.gls[,i]<-sapply(fits.gls,function(x) anova(x)$"p-value"[2])
P.sey[,i]<-sapply(fits.SEy,function(x) anova(x)$"p-value"[2])
beta.gls[,i]<-sapply(fits.gls,function(x) coefficients(x)[2])
beta.sey[,i]<-sapply(fits.SEy,function(x) coefficients(x)[2])
}

First, we can look the mean parameter estimates of each method:

par(mfrow=c(1,2))
boxplot(beta.gls,ylim=c(-0.5,3),main="GLS (no error)")
abline(h=b1,col="grey",lty="dashed")
boxplot(beta.sey,ylim=c(-0.5,3),main="GLS (with error)")
abline(h=b1,col="grey",lty="dashed")

plot of chunk unnamed-chunk-4

We can easily see that (although both methods are unbiased), variability among simulations is substantially higher when error in y is not taken into consideration.

We would expect this to affect our power to reject the null hypothesis of β1 = 1, and it does:

plot(b1,colMeans(P.gls<=0.05),type="b",lwd=2,xlim=c(0,2),
ylim=c(0,1),xlab=expression(beta[1]),ylab="type I error / power")
lines(b1,colMeans(P.sey<=0.05),type="b",lwd=2,lty="dotted")
text(b1[3],colMeans(P.gls<=0.05)[3],label="GLS (no error)",
pos=4)
text(b1[3],colMeans(P.sey<=0.05)[3],label="GLS (with error)",
pos=2)
abline(h=0.05,lty="dashed",col="grey")
text(x=2,y=0.05,"0.05",pos=3)

plot of chunk unnamed-chunk-5

Neat.

Bug fixes for non-flat prior on the root node in make.simmap (and for MCMC method)

$
0
0

A couple of different phytools users have reported a bug (e.g., here) in make.simmap for values of the prior distribution on the root node other than the default.

This bug was one that I evidently accidentally introduced into make.simmapwith the best intention of allowing users to send optional arguments to the internally used function fitMk for optimizing the transition model or computing the likelihood.

I believe that I have now fixedthis bug; however I await user reports to the contrary (should they be necessary).

I also detected and (hopefully) fixed (1, 2) a second bug in make.simmap(...,Q="mcmc") that I believe I must have introduced at the same time.

Just to mention a little bit about the difference between the default method (Q="empirical") and Q="mcmc". The default method first maximizes the likelihood of the transition matrix, Q and then uses this “empirical” value of Q to sample character histories for the discrete trait from their posterior distribution. By contrast, Q="mcmc" is essentially a fully heirarchical Bayesian method in which (first) nsim values of Q are sampled from their posterior probability distribution using Bayesian MCMC. and then (next) nsim character histories are obtained via stochastic mapping. In the event that multiple trees are input, an MCMC will be run on each tree.

Here's a quick demo on simulated data. (Simulation conditions are given at the end.

First, let's try the Q="empirical" (default) method; however I will also impose a prior on the root node to have state "a" with probability 0.90:

library(phytools)
packageVersion("phytools")
## [1] '0.5.81'
tree
## 
## Phylogenetic tree with 64 tips and 63 internal nodes.
##
## Tip labels:
## t63, t64, t50, t51, t13, t45, ...
##
## Rooted; includes branch lengths.
x
## t63 t64 t50 t51 t13 t45 t46 t20 t48 t49 t47 t30 t56 t57 t15 t12 t21 t22 
## "a" "a" "a" "a" "b" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
## t8 t1 t26 t27 t52 t53 t23 t28 t29 t10 t11 t34 t35 t33 t54 t55 t14 t37
## "a" "b" "b" "b" "a" "a" "a" "a" "a" "a" "a" "b" "b" "a" "b" "b" "b" "b"
## t43 t44 t2 t24 t25 t18 t19 t58 t59 t7 t9 t36 t41 t42 t5 t16 t17 t3
## "b" "b" "b" "a" "a" "b" "b" "b" "b" "b" "a" "b" "b" "b" "a" "a" "a" "a"
## t4 t6 t31 t32 t61 t62 t60 t38 t39 t40
## "a" "a" "b" "a" "a" "a" "a" "b" "a" "a"
## prior on root node
pi<-setNames(c(0.9,0.1),c("a","b"))
pi
##   a   b 
## 0.9 0.1
empirical<-make.simmap(tree,x,nsim=100,pi=pi,Q="empirical")
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## a b
## a -1.136101 1.136101
## b 1.136101 -1.136101
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## a b
## 0.9 0.1
## Done.
sum.emp<-summary(empirical)
sum.emp
## 100 trees with a mapped discrete character with states:
## a, b
##
## trees have 18.31 changes between states on average
##
## changes are of the following types:
## a,b b,a
## x->y 11.54 6.77
##
## mean total time spent in each state is:
## a b total
## raw 11.0705364 4.9707401 16.04128
## prop 0.6901281 0.3098719 1.00000
cols<-setNames(c("red","blue"),c("a","b"))
plot(sum.emp,colors=cols,fsize=0.8,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=3)

plot of chunk unnamed-chunk-1

Now we can compare the Q="mcmc" method as follows:

mcmc<-make.simmap(tree,x,nsim=100,pi=pi,Q="mcmc")
## Running MCMC burn-in. Please wait....
## Running 10000 generations of MCMC, sampling every 100 generations.
## Please wait....
##
## make.simmap is simulating with a sample of Q from
## the posterior distribution
##
## Mean Q from the posterior is
## Q =
## a b
## a -1.135897 1.135897
## b 1.135897 -1.135897
## and (mean) root node prior probabilities
## pi =
## a b
## 0.9 0.1
## Done.
sum.mcmc<-summary(mcmc)
sum.mcmc
## 100 trees with a mapped discrete character with states:
## a, b
##
## trees have 18.49 changes between states on average
##
## changes are of the following types:
## a,b b,a
## x->y 11.73 6.76
##
## mean total time spent in each state is:
## a b total
## raw 11.0470643 4.9942122 16.04128
## prop 0.6886649 0.3113351 1.00000
cols<-setNames(c("red","blue"),c("a","b"))
plot(sum.mcmc,colors=cols,fsize=0.8,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=3)

plot of chunk unnamed-chunk-2

Most likely, at least under the defaults for both methods, our posterior probabilities at all nodes should be similar:

plot(sum.emp$ace,sum.mcmc$ace,xlab="empirical Q",ylab="Q sampled using MCMC")
lines(c(0,1),c(0,1),lty="dashed",col="red")

plot of chunk unnamed-chunk-3

The tree & data for this demo were simulated as follows:

Q<-matrix(c(-1,1,1,-1),2,2,dimnames=list(letters[1:2],letters[1:2]))
x<-getStates(sim.history(tree<-pbtree(n=64,scale=1),Q,
anc="a"),"tips")

Difference equation approximation OU simulation function

$
0
0

For reasons I won't get into in detail, I just wrote a simple function to conduct Ornstein-Uhlenbeck simulations by finely discretizing time and then using a simple difference equation version of the OU model to simulate as follows:

multiOU<-function(tree,alpha,sig2,theta=NULL,a0=NULL,internal=TRUE,...){
if(hasArg(dt)) dt<-list(...)$dt
else dt<-1/1000*max(nodeHeights(tree))
ss<-sort(unique(c(getStates(tree,"tips"),getStates(tree,"nodes"))))
if(is.null(theta)) theta<-setNames(rep(0,length(ss)),ss)
if(is.null(a0)) a0<-0
tree<-reorder(tree,"cladewise")
root<-Ntip(tree)+1
S<-matrix(NA,nrow(tree$edge),2)
S[which(tree$edge[,1]==root),1]<-a0
for(i in 1:nrow(tree$edge)){
x1<-S[i,1]
for(j in 1:length(tree$maps[[i]])){
t<-tree$maps[[i]][j]
ALPHA<-alpha[names(t)]
SIG2<-sig2[names(t)]
THETA<-theta[names(t)]
t<-c(rep(dt,floor(t/dt)),t%%dt)
for(k in 1:length(t))
x1<-x1+ALPHA*(THETA-x1)*t[k]+
rnorm(n=1,sd=sqrt(SIG2*t[k]))
}
S[i,2]<-x1
if(any(tree$edge[,1]==tree$edge[i,2]))
S[which(tree$edge[,1]==tree$edge[i,2]),1]<-S[i,2]
}
x<-setNames(c(S[1,1],S[,2]),c(tree$edge[1,1],
tree$edge[,2]))[as.character(1:(Ntip(tree)+tree$Nnode))]
names(x)[1:Ntip(tree)]<-tree$tip.label
if(!internal) x<-x[tree$tip.label]
x
}

In the following, I'll use it to simulate:

library(phytools)
set.seed(116)
cols<-setNames(c("black","blue","red"),letters[1:3])
tree<-pbtree(n=100,scale=2)
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3,
dimnames=list(letters[1:3],letters[1:3]))
tree<-sim.history(tree,Q,anc="a")
## Done simulation(s).
sig2<-setNames(c(1,2,1),letters[1:3])
alpha<-setNames(c(2,2,2),letters[1:3])
theta<-setNames(c(10,0,-10),letters[1:3])
x<-multiOU(tree,alpha,sig2,theta)
phenogram(tree,x,ftype="off",colors=cols,ylim=c(-11,11),
xlim=c(0,2))
abline(h=c(-10,0,10),lty="dashed",col=sapply(c("red","blue",
"black"),make.transparent,alpha=0.5),lwd=3,lend=2)
add.simmap.legend(colors=cols,prompt=FALSE,x=0,y=-5)

plot of chunk unnamed-chunk-2

I'm going to add (importantly) that the same simulation is done without the discrete difference equation approximation by OUwie.simin the powerful OUwiepackage by Jeremy Beaulieu & Brian O'Meara.

Finally, let's fit the appropriate 'OUwie' (that is, flexible OU) model to these data:

library(OUwie)
data<-data.frame(Genus_species=tree$tip.label,
Reg=getStates(tree,"tips"),X=as.numeric(x[tree$tip.label]))
OUwie(tree,data,model="OUMV",simmap.tree=TRUE)
## Initializing... 
## Finished. Begin thorough search...
## Finished. Summarizing results.
## 
## Fit
## lnL AIC AICc model ntax
## -153.169 320.3379 321.5553 OUMV 100
##
##
## Rates
## a b c
## alpha 1.6410055 1.641006 1.641006
## sigma.sq 0.8331012 12.695801 73.799702
##
## Optima
## a b c
## estimate 9.5391941 -2.5029839 -14.22271
## se 0.1269122 0.8906037 2.51031
##
## Arrived at a reliable solution

One of our σ2 estimates is way off - but aside from that, we seem to have more or less captured the generating process. Cool.

Viewing all 801 articles
Browse latest View live