Quantcast
Channel: Phylogenetic Tools for Comparative Biology
Viewing all 802 articles
Browse latest View live

Update to plotSimmap, fix to cladelabels, & plotting a tree across multiple pages

$
0
0

I have been working on producing a tree for the appendix of a manuscript in review at Evolution by my Ph.D. student Kristin Winchell. The tree is fairly big (433 tips), & yet it is necessary for the tip labels & symbols to be fairly legible in print so as not to obscure interpretation.

The demo below requires new versions of plotSimmap, as well as the function split.plotTree (not to be confused with splitplotTree), not presently in phytools, but described beforeon this blog. (Code here.)

foo<-function(){ 
par(fg="gray48")
nodelabels(node=uu,pie=matrix(rep(1,length(uu)),length(uu),1),
piecol="gray48",cex=0.2)
par(fg="green")
nodelabels(node=nn,pie=matrix(rep(1,length(uu)),length(uu),1),
piecol="green",cex=0.2)
par(fg="black")
nodelabels(node=mm,pie=matrix(rep(1,length(mm)),length(mm),1),
piecol="white",cex=0.2)
cladelabels(text="Ponce",node=mrca.ponce,offset=2.3)
cladelabels(text="San Juan",node=mrca.sj1,offset=4,
orientation="horizontal")
cladelabels(text="Mayagüez",node=mrca.mayaguez,offset=1.8)
cladelabels(text="San Juan",node=mrca.sj2,offset=2.8)
add.simmap.legend(colors=setNames("gray48","urban"),
prompt=FALSE,x=0,y=438,shape="circle")
add.simmap.legend(colors=setNames("green","natural"),
prompt=FALSE,x=0.15*max(nodeHeights(tree)),y=438,
shape="circle")
add.simmap.legend(colors=setNames("white",
"GenBank sequence from Mayagüez, Ponce, or San Juan"),
prompt=FALSE,x=0,y=442,shape="circle")
}

tip.spacing<-c(1:146,149:295,298:437)
split.plotTree(tree,fsize=0.6,lwd=1,xlim=c(0,0.25),
split=c(0.33,0.668),fn=foo,tips=tip.spacing,y.lim=c(0,445))

plot of chunk unnamed-chunk-1plot of chunk unnamed-chunk-1plot of chunk unnamed-chunk-1

The key updates to cladelabels and plotSimmapwere the following:

1) In plotSimmap I made the small change to permit the vector giving the desired terminal edge vertical position (tips) to not have names, in which case it would be assumed to have the order given by the 1:Ntip(tree) node indices. This is because in its current iteration it fails if some tip labels are repeating (as they are in this tree).

2) In cladelabels the function (erroneously) assumed by default that the tips have y positions 1:Ntip(tree).

The purpose of the two updates was to allow the multiple for greater spacing of tip labels across the splits between pages to ensure that the edges, tip labels, or tip symbols plotted cleanly.

A higher quality (PDF) version can be seen:

pdf(file="cristatellus-split.plotTree.pdf",height=11,width=7)
split.plotTree(tree,fsize=0.6,lwd=1,xlim=c(0,0.25),
split=c(0.33,0.668),fn=foo,tips=tip.spacing,y.lim=c(0,445))
dev.off()
## png 
## 2

here.


Update to demarcating changes on a plotted "simmap" style tree using markChanges

$
0
0

I posted recentlyabout a new phytools function to mark reconstructed changes on the tree from stochastic mapping. I will be adding this function to phytools; however I have made a couple of changes. Foremost among these is that the function now returns (invisibly) a matrix containing the x& ycoordinates of the plotted changes on the tree, as well as the type of each change (e.g., a->b, b->c, etc. We can also turn off plotting so that only this matrix is computed & returned. This theoretically enables us to overlay any symbol that we want on the tree to mark changes.

Here's a quick demo of the function as presently written:

library(phytools)
data(anoletree)
## function to mark changes
markChanges<-function(tree,colors=NULL,cex=1,lwd=2,plot=TRUE){
states<-sort(unique(getStates(tree)))
if(is.null(colors)) colors<-setNames(palette()[1:length(states)],
states)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
nc<-sapply(tree$maps,length)-1
ii<-which(nc>0)
nc<-nc[ii]
xx<-yy<-vector()
for(i in 1:length(ii)){
for(j in 1:nc[i]){
ss<-names(tree$maps[[ii[i]]])[j+1]
mm<-tree$edge[ii[i],1]
dd<-tree$edge[ii[i],2]
x<-rep(obj$xx[mm]+cumsum(tree$maps[[ii[i]]])[j],2)
y<-c(obj$yy[dd]-0.5*mean(strheight(LETTERS)*cex),
obj$yy[dd]+0.5*mean(strheight(LETTERS)*cex))
if(plot) lines(x,y,lwd=lwd,col=colors[ss],lend=2)
xx<-c(xx,setNames(x[1],
paste(names(tree$maps[[ii[i]]])[j:(j+1)],
collapse="->")))
yy<-c(yy,mean(y))
}
}
XY<-cbind(xx,yy)
colnames(XY)<-c("x","y")
invisible(XY)
}
plotSimmap(anoletree,ftype="i",fsize=0.6,ylim=c(-1,Ntip(anoletree)))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
add.simmap.legend(x=0,y=-1,colors=
setNames(palette()[1:length(unique(getStates(anoletree,"tips")))],
sort(unique(getStates(anoletree,"tips")))),prompt=FALSE,vertical=FALSE)
obj<-markChanges(anoletree,lwd=2,cex=0.6)

plot of chunk unnamed-chunk-1

Now obj is a matrix containing the x& y coordinates of each change as recorded in this particular stochastic map of (here) ecomorph state on the tree. So, in this case:

obj
##                x        y
## TG->GB 4.8320281 8.00000
## TG->TC 3.9596523 15.25000
## TC->CG 4.9301502 14.00000
## TG->Tw 3.5809367 17.00000
## TG->TC 3.8002031 19.50000
## TG->GB 3.8962124 21.50000
## TG->GB 4.2372427 24.00000
## TG->Tr 2.0060270 29.93750
## TG->GB 2.9155935 33.25000
## TG->Tw 3.9550415 35.00000
## TG->CG 1.6044174 47.12500
## TG->Tw 0.4992279 59.88281
## Tw->TC 2.3342945 52.68750
## TC->Tw 3.1128996 54.00000
## Tw->Tr 5.0515733 54.00000
## Tw->TG 2.2167389 64.03125
## TG->GB 2.8428529 64.03125
## TG->CG 4.2711895 71.93750
## TG->GB 1.9843757 75.87500
## GB->CG 3.6874052 77.00000
## CG->Tw 5.6935007 77.00000
## TG->TC 2.3486612 80.12500
## TG->TC 1.5791488 82.00000
## TC->Tw 4.2869186 82.00000

We can use different symbols to plot these changes if we want to, for instance, simply:

plotSimmap(anoletree,ftype="i",fsize=0.6,ylim=c(-1,Ntip(anoletree)))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
obj<-markChanges(anoletree,plot=FALSE)
add.simmap.legend(x=0,y=-1,colors=
setNames(palette()[1:length(unique(getStates(anoletree,"tips")))],
sort(unique(getStates(anoletree,"tips")))),prompt=FALSE,vertical=FALSE)
points(obj,pch=19)

plot of chunk unnamed-chunk-3

## or
plotSimmap(anoletree,ftype="i",fsize=0.6,ylim=c(-1,Ntip(anoletree)))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
obj<-markChanges(anoletree,plot=FALSE)
add.simmap.legend(x=0,y=-1,colors=
setNames(palette()[1:length(unique(getStates(anoletree,"tips")))],
sort(unique(getStates(anoletree,"tips")))),prompt=FALSE,vertical=FALSE)
points(obj,pch=8)

plot of chunk unnamed-chunk-3

etc.

That's it for now.

Marking changes on a stochastic map tree, and some other updates to object classes in phytools

$
0
0

I have posted recently (e.g., 1, 2, 3) about a phytools function to mark changes on a plotted stochastic character map tree.

Well, one creative use of the latest update to this function might be, it occured to me, plotting arrows for changes of a particular type (say, decrease or increase in a discrete character trait on the tree, corresponding to 'up' or 'down' arrows, respectively). If a character tended to be lost but not regained, then, (for example, the number of digits) perhaps we would see losses but relatively few reaquisitions.

In the code below, I use a version of phytools that is not yet posted, but code for all the functions is already available on phytools.org.

Let's simulate data with a high rate of loss, & a low rate of reaquisition:

set.seed(872)
library(phytools)
Q<-matrix(c(-0.1,1,0,0,0,0,
0.1,-1.1,1,0,0,0,
0,0.1,-1.1,1,0,0,
0,0,0.1,-1.1,1,0,
0,0,0,0.1,-1.1,1,
0,0,0,0,0.1,-1),6,6,byrow=TRUE)
colnames(Q)<-rownames(Q)<-0:5
Q
##      0    1    2    3    4  5
## 0 -0.1 1.0 0.0 0.0 0.0 0
## 1 0.1 -1.1 1.0 0.0 0.0 0
## 2 0.0 0.1 -1.1 1.0 0.0 0
## 3 0.0 0.0 0.1 -1.1 1.0 0
## 4 0.0 0.0 0.0 0.1 -1.1 1
## 5 0.0 0.0 0.0 0.0 0.1 -1
tree<-sim.history(pbtree(n=50,scale=2),Q=Q,anc="5")
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).
colors<-setNames(grey(seq(1,0,-0.2)),0:5)
plotTree(tree,lwd=6,ftype="off",ylim=c(-1,Ntip(tree)))
plotSimmap(tree,colors=colors,ftype="off",add=TRUE,lwd=4,
ylim=c(-1,Ntip(tree)))
add.simmap.legend(x=0,y=0,prompt=FALSE,colors=colors,
vertical=FALSE)
obj<-markChanges(tree,plot=FALSE)
for(i in 1:nrow(obj)){
states<-as.numeric(strsplit(rownames(obj)[i],"->")[[1]])
up<-if(states[1]>states[2]) TRUE else FALSE
cex<-0.8
arrows(obj[i,1],obj[i,2]-0.8*mean(strheight(LETTERS)*cex),
obj[i,1],obj[i,2]+0.8*mean(strheight(LETTERS)*cex),
length=0.5*mean(strheight(LETTERS,units="inches")*cex),
lwd=3,col="red",ljoin=1,code=if(up) 1 else 2)
}

plot of chunk unnamed-chunk-1

x<-getStates(tree,"tips") ## our data from simulation
sort(unique(x))
## [1] "0" "1" "2" "3" "4" "5"

OK, this is the real history. Now, let's imagine we don't know the history, but we suppose that loss & gain occur at different rates:

model<-matrix(c(0,1,0,0,0,0,
2,0,1,0,0,0,
0,2,0,1,0,0,
0,0,2,0,1,0,
0,0,0,2,0,1,
0,0,0,0,2,0),6,6)
rownames(model)<-colnames(model)<-0:5
model
##   0 1 2 3 4 5
## 0 0 2 0 0 0 0
## 1 1 0 2 0 0 0
## 2 0 1 0 2 0 0
## 3 0 0 1 0 2 0
## 4 0 0 0 1 0 2
## 5 0 0 0 0 1 0
x<-to.matrix(x,as.character(0:5))
smapsI<-make.simmap(tree,x,nsim=100,model=model)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##           0         1         2         3         4         5
## 0 -0.540027 0.540027 0.000000 0.000000 0.000000 0.000000
## 1 1.274384 -1.814411 0.540027 0.000000 0.000000 0.000000
## 2 0.000000 1.274384 -1.814411 0.540027 0.000000 0.000000
## 3 0.000000 0.000000 1.274384 -1.814411 0.540027 0.000000
## 4 0.000000 0.000000 0.000000 1.274384 -1.814411 0.540027
## 5 0.000000 0.000000 0.000000 0.000000 1.274384 -1.274384
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##         0         1         2         3         4         5 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
smapsI
## 100 phylogenetic trees with mapped discrete characters
obj<-summary(smapsI)
obj
## 100 trees with a mapped discrete character with states:
## 0, 1, 2, 3, 4, 5
##
## trees have 60.19 changes between states on average
##
## changes are of the following types:
## 0,1 0,2 0,3 0,4 0,5 1,0 1,2 1,3 1,4 1,5 2,0 2,1 2,3 2,4 2,5 3,0
## x->y 2.66 0 0 0 0 5.94 1.33 0 0 0 0 6.11 1.74 0 0 0
## 3,1 3,2 3,4 3,5 4,0 4,1 4,2 4,3 4,5 5,0 5,1 5,2 5,3 5,4
## x->y 0 9.54 4.25 0 0 0 0 10.41 6.56 0 0 0 0 11.65
##
## mean total time spent in each state is:
## 0 1 2 3 4 5
## raw 5.1938181 4.5646540 3.83968729 5.9647057 10.4758124 9.2228060
## prop 0.1322879 0.1162629 0.09779781 0.1519226 0.2668216 0.2349072
## total
## raw 39.26148
## prop 1.00000
plot(obj,colors=colors,ylim=c(-1,Ntip(smapsI[[1]])),ftype="off")
add.simmap.legend(x=0,y=0,prompt=FALSE,colors=colors,
vertical=FALSE)

plot of chunk unnamed-chunk-2

We can similarly plot the location of reconstructed changes on any of stochastic mapped trees, for instance:

plotTree(smapsI[[1]],lwd=6,ftype="off",ylim=c(-1,Ntip(smapsI[[1]])))
plotSimmap(smapsI[[1]],colors=colors,ftype="off",add=TRUE,lwd=4,
ylim=c(-1,Ntip(smapsI[[1]])))
add.simmap.legend(x=0,y=0,prompt=FALSE,colors=colors,
vertical=FALSE)
obj<-markChanges(smapsI[[1]],plot=FALSE)
for(i in 1:nrow(obj)){
states<-as.numeric(strsplit(rownames(obj)[i],"->")[[1]])
up<-if(states[1]>states[2]) TRUE else FALSE
cex<-0.8
arrows(obj[i,1],obj[i,2]-0.8*mean(strheight(LETTERS)*cex),
obj[i,1],obj[i,2]+0.8*mean(strheight(LETTERS)*cex),
length=0.5*mean(strheight(LETTERS,units="inches")*cex),
lwd=3,col="red",ljoin=1,code=if(up) 1 else 2)
}

plot of chunk unnamed-chunk-3

In this case, showing a pattern that is broadly similar to the generating pattern.

Finally, if we (for instance), knew that the ancestral state should have a particular value a priori - for instance, in this case, "5", we code set a strong prior on the state at the root to be "5". E.g.:

prior<-setNames(c(rep(0,5),1),0:5)
prior
## 0 1 2 3 4 5 
## 0 0 0 0 0 1
smapsII<-make.simmap(tree,x,nsim=100,model=model,pi=prior)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##           0         1         2         3         4         5
## 0 -0.540027 0.540027 0.000000 0.000000 0.000000 0.000000
## 1 1.274384 -1.814411 0.540027 0.000000 0.000000 0.000000
## 2 0.000000 1.274384 -1.814411 0.540027 0.000000 0.000000
## 3 0.000000 0.000000 1.274384 -1.814411 0.540027 0.000000
## 4 0.000000 0.000000 0.000000 1.274384 -1.814411 0.540027
## 5 0.000000 0.000000 0.000000 0.000000 1.274384 -1.274384
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## 0 1 2 3 4 5 
## 0 0 0 0 0 1
## Done.
obj<-summary(smapsII)
obj
## 100 trees with a mapped discrete character with states:
## 0, 1, 2, 3, 4, 5
##
## trees have 59.41 changes between states on average
##
## changes are of the following types:
## 0,1 0,2 0,3 0,4 0,5 1,0 1,2 1,3 1,4 1,5 2,0 2,1 2,3 2,4 2,5 3,0
## x->y 2.26 0 0 0 0 5.58 1.14 0 0 0 0 5.98 1.48 0 0 0
## 3,1 3,2 3,4 3,5 4,0 4,1 4,2 4,3 4,5 5,0 5,1 5,2 5,3 5,4
## x->y 0 9.47 3.55 0 0 0 0 10.96 5.03 0 0 0 0 13.96
##
## mean total time spent in each state is:
## 0 1 2 3 4 5
## raw 5.1728761 4.5440518 3.68522497 5.1720941 9.6365903 11.0506463
## prop 0.1317545 0.1157382 0.09386362 0.1317346 0.2454464 0.2814628
## total
## raw 39.26148
## prop 1.00000
plot(obj,colors=colors,ylim=c(-1,Ntip(smapsII[[1]])),ftype="off")
add.simmap.legend(x=0,y=0,prompt=FALSE,colors=colors,
vertical=FALSE)

plot of chunk unnamed-chunk-4

NIMBioS Evolutionary Quantitative Genetic Tutorial

$
0
0

It is the final day of what was once the NESCent tutorial on Evolutionary Quantitative Genetics but has since become the NIMBioS tutorial on Evolutionary Quantitative Genetics - co-organized, in all its iterations, by Joe Felsenstein& Steve Arnold. For the past four years I have been a 'guest instructor' in this program. My material for this year's edition is now finished & can be viewed online here and can be seen by clicking through for the links to Lecture 5.2: Ancestral state reconstruction, Computer Exercise 5.2: Ancestral state reconstruction and visual display of states, and Computer exercise 6.1: Threshold characters and continuous characters. Check it out! I believe video of my lecture, and the lectures of all other workshop instructors, will also be posted online at some point in the future & should be linked from the course site.

BTW, inspired by my computer exercise on threshold characters, the figure above was produced using the phytools function bmPlot. For instance:

library(phytools)
tree<-pbtree(n=40,scale=1)
par(mfcol=c(1,2))
par(lwd=2)
par(mar=c(4.1,4.1,2.1,2.1))
obj<-bmPlot(tree,type="threshold",thresholds=c(0,1))
plotSimmap(obj$tree,colors=setNames(c("black","red","blue"),
letters[1:3]),ftype="off",lwd=4,mar=c(4.1,0.1,2.1,0.1))

plot of chunk unnamed-chunk-1

More updates associated with mapped discrete state ("simmap") object class

$
0
0

In the past few days I have been busy doing things like adding S3 methods for the object class "simmap" that I have now created for the myriad of functions in phytools that create or use modified "phylo" objects of this type. I have also posted a new version of phytools with these methods & other updates & bug-fixes associated with the new object class for a variety of functions in the phytools package. I'm hoping to get this version on CRAN & relatively bug-free in time for my workshop next week at Universidad Nacional Autónoma de Mexico (UNAM) in Mexico City next week.

Here is a little bit of a preview:

install.packages("http://www.phytools.org/nonstatic/phytools_0.4-99.tar.gz",
type="source",repos=NULL)
## Installing package into 'C:/Users/Liam/Documents/R/win-library/3.2'
## (as 'lib' is unspecified)
library(phytools)
packageVersion("phytools")
## [1] '0.4.99'
## load "simmap" style tree from package data
data(anoletree)
## S3 print method
anoletree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
## Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## The tree includes a mapped, 6-state discrete character with states:
## CG, GB, TC, TG, Tr, Tw
##
## Rooted; includes branch lengths.
## summary method for object of class "simmap"
summary(anoletree)
## 1 tree with a mapped discrete character with states:
## CG, GB, TC, TG, Tr, Tw
##
## tree has 24 changes between states
##
## changes are of the following types:
## CG GB TC TG Tr Tw
## CG 0 0 0 0 0 1
## GB 1 0 0 0 0 0
## TC 1 0 0 0 0 2
## TG 2 6 4 0 1 3
## Tr 0 0 0 0 0 0
## Tw 0 0 1 1 1 0
##
## mean total time spent in each state is:
## CG GB TC TG Tr Tw
## raw 14.49251700 40.1094769 31.1538539 80.9012968 11.83957979 27.1702360
## prop 0.07046595 0.1950215 0.1514772 0.3933607 0.05756676 0.1321079
## total
## raw 205.667
## prop 1.000
## plot method
plot(anoletree,fsize=0.6,ftype="i",xlim=c(-0.18*max(nodeHeights(anoletree)),
1.22*max(nodeHeights(anoletree))))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
states<-sort(unique(getStates(anoletree,"tips")))
add.simmap.legend(x=-0.18*max(nodeHeights(anoletree)),y=Ntip(anoletree),
colors=setNames(palette()[1:length(states)],states),prompt=FALSE)

plot of chunk unnamed-chunk-1

## do stochastic character mapping
x<-getStates(anoletree,"tips")
x
##            Anolis_ahli         Anolis_allogus     Anolis_rubribarbus 
## "TG" "TG" "TG"
## Anolis_imias Anolis_sagrei Anolis_bremeri
## "TG" "TG" "TG"
## Anolis_quadriocellifer Anolis_ophiolepis Anolis_mestrei
## "TG" "GB" "TG"
##
## ......
##
## Anolis_singularis Anolis_chlorocyanus Anolis_coelestinus
## "TC" "TC" "TC"
## Anolis_occultus
## "Tw"
maps<-make.simmap(anoletree,x,nsim=100,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##             CG          GB          TC          TG          Tr          Tw
## CG -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145
## GB 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145 0.02314145
## TC 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145 0.02314145
## TG 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145 0.02314145
## Tr 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723 0.02314145
## Tw 0.02314145 0.02314145 0.02314145 0.02314145 0.02314145 -0.11570723
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## Done.
maps
## 100 phylogenetic trees with mapped discrete characters
print(maps,details=TRUE)
## 100 phylogenetic trees with mapped discrete characters
## tree 1 : 82 tips, 6 mapped states
## tree 2 : 82 tips, 6 mapped states
## tree 3 : 82 tips, 6 mapped states
## tree 4 : 82 tips, 6 mapped states
## tree 5 : 82 tips, 6 mapped states
## tree 6 : 82 tips, 6 mapped states
## ....
## tree 97 : 82 tips, 6 mapped states
## tree 98 : 82 tips, 6 mapped states
## tree 99 : 82 tips, 6 mapped states
## tree 100 : 82 tips, 6 mapped states
obj<-summary(maps)
plot(obj,fsize=0.6,ftype="i",xlim=c(-0.18*max(nodeHeights(anoletree)),
1.22*max(nodeHeights(anoletree))))
add.simmap.legend(x=-0.18*max(nodeHeights(anoletree)),y=Ntip(anoletree),
colors=setNames(palette()[1:length(states)],states),prompt=FALSE)

plot of chunk unnamed-chunk-2

Or, alternatively:

plot(anoletree,fsize=0.6,ftype="i",xlim=c(-0.18*max(nodeHeights(anoletree)),
1.22*max(nodeHeights(anoletree))))
## no colors provided. using the following legend:
## CG GB TC TG Tr Tw
## "black" "red" "green3" "blue" "cyan" "magenta"
states<-sort(unique(getStates(anoletree,"tips")))
add.simmap.legend(x=-0.18*max(nodeHeights(anoletree)),y=Ntip(anoletree),
colors=setNames(palette()[1:length(states)],states),prompt=FALSE)
nodelabels(pie=obj$ace,piecol=setNames(palette()[1:length(states)],states),
cex=0.6)

plot of chunk unnamed-chunk-3

The same methods also apply to "simmap" class objects created by other methods. For instance:

tree<-pbtree(n=50,scale=100)
limits=c(0,25,50,75)
tree<-make.era.map(tree,limits)
tree
## 
## Phylogenetic tree with 50 tips and 49 internal nodes.
##
## Tip labels:
## t1, t33, t34, t20, t21, t16, ...
##
## The tree includes a mapped, 4-state discrete character with states:
## 1, 2, 3, 4
##
## Rooted; includes branch lengths.
plot(tree,fsize=0.7,mar=c(5.1,0.1,0.1,0.1))
## no colors provided. using the following legend:
## 1 2 3 4
## "black" "red" "green3" "blue"
axis(1)
title(xlab="time")
obj<-lapply(limits[2:length(limits)],function(x) lines(rep(x,2),
par()$usr[3:4],lty="dashed"))

plot of chunk unnamed-chunk-4

or:

tree<-pbtree(n=26,tip.label=LETTERS)
tree<-paintSubTree(tree,29,"b","a",stem=0.5)
tree<-paintSubTree(tree,35,"c",stem=0.5)
colors<-setNames(c("black","blue","red"),letters[1:3])
plot(tree,colors)
xy<-markChanges(tree,plot=FALSE)
points(xy,pch=8)

plot of chunk unnamed-chunk-5

or:

## simulate data with multiple rates
x<-sim.rates(tree,setNames(c(1,10,20),letters[1:3]))
obj<-contMap(tree,x,plot=FALSE)
obj ## object of class "contMap"
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
##
## (2) A mapped continuous trait on the range (-12.257807, 1.528703).
plot(obj,sig=2)
xy<-markChanges(tree,plot=FALSE)
points(xy,pch=21,cex=1.5,bg="grey")

plot of chunk unnamed-chunk-6

obj$tree ## internal object of class "simmap"
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## The tree includes a mapped, 44-state discrete character with states:
## 198, 200, 218, 338, 366, 382, ...
##
## Rooted; includes branch lengths.
plot(obj$tree,colors=obj$cols,lwd=4)

plot of chunk unnamed-chunk-6

More on this later!

Clarification & update to the phytools function treeSlice

$
0
0

Just to clarify how the function treeSlice, which bisects the tree at a particular height above the root and returns all (non-trivial or trivial & non-trivial) subtrees, is supposed to work, I thought I'd give the following quick example:

library(phytools)
set.seed(630)
tree<-rtree(n=26)
tree$tip.label<-LETTERS
plotTree(tree,mar=c(3.1,rep(0.1,3)))
axis(1)
slice<-2
## mark the places where the tree should be sliced
lines(rep(slice,2),par()$usr[3:4],lty="dashed")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
X<-cbind(obj$xx[obj$edge[,1]],obj$xx[obj$edge[,2]])
y<-obj$yy[obj$edge[,2]]
for(i in 1:nrow(X)) if(X[i,1]<slice&&X[i,2]>slice)
points(slice,y[i],pch=19,col="red")

plot of chunk unnamed-chunk-1

From this we can see that the tree has 9 subtrees, and these subtrees should containg 9, 2, 2, 1, 1, 2, 1, 2, and 1 taxa respectively. The important point to note is that because the tree is not ultrametric, some of the taxa in the original tree are left out of the extracted subtrees.

subtrees<-treeSlice(tree,slice,trivial=TRUE)
print(subtrees,details=TRUE)
## 9 phylogenetic trees
## tree 1 : 9 tips
## tree 2 : 2 tips
## tree 3 : 2 tips
## tree 4 : 1 tips
## tree 5 : 1 tips
## tree 6 : 2 tips
## tree 7 : 1 tips
## tree 8 : 2 tips
## tree 9 : 1 tips
## tip labels
null<-lapply(subtrees,function(x) print(x$tip.label))
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I"
## [1] "J" "K"
## [1] "L" "M"
## [1] "O"
## [1] "P"
## [1] "Q" "R"
## [1] "S"
## [1] "U" "V"
## [1] "Z"

For fun, we could try to make a version of treeSlice in which the height of the cut is (optionally) supplied interactively by the user. Here is what that might look like:

treeSlice<-function(tree,slice=NULL,trivial=FALSE,prompt=FALSE,...){
if(!inherits(tree,"phylo")) stop("tree should be an object of class \"phylo\".")
if(prompt){
plotTree(tree,mar=c(3.1,rep(0.1,3)),...)
axis(1)
cat("Click at the tree height where cutting is desired...\n")
flush.console()
xy<-unlist(locator(1))
slice<-xy[1]
cat(paste("Slice height is ",signif(slice,6),". Thank you!\n",sep=""))
flush.console()
lines(rep(slice,2),par()$usr[3:4],lty="dashed")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
X<-cbind(obj$xx[obj$edge[,1]],obj$xx[obj$edge[,2]])
y<-obj$yy[obj$edge[,2]]
if(trivial){
for(i in 1:nrow(X)) if(X[i,1]<slice&&X[i,2]>slice)
points(slice,y[i],pch=19)
} else {
for(i in 1:nrow(X))
if(X[i,1]<slice&&X[i,2]>slice&&obj$edge[i,2]>Ntip(tree))
points(slice,y[i],pch=19)
}
}
tree<-reorder(tree) # reorder cladewise
H<-nodeHeights(tree)
edges<-which(H[,2]>slice&H[,1]<slice)
nodes<-tree$edge[edges,2]
if(!trivial) nodes<-nodes[nodes>length(tree$tip)]
trees<-list()
class(trees)<-"multiPhylo"
for(i in 1:length(nodes)){
if(nodes[i]>Ntip(tree)){
trees[[i]]<-extract.clade(tree,node=nodes[i])
trees[[i]]$root.edge<-H[which(tree$edge[,2]==nodes[i]),2]-slice
} else {
z<-list(edge=matrix(c(2,1),1,2),
edge.length=H[which(tree$edge[,2]==nodes[i]),2]-slice,
tip.label=tree$tip.label[nodes[i]],Nnode=1L)
class(z)<-"phylo"
trees[[i]]<-z
}
}
return(trees)
}

Try it!

Logo for UNAM course in R

$
0
0

I will be teaching a 4-day workshop at Universidad Nacional Autónoma de Mexico (UNAM) in Mexico City with Alejandro Gonzalez-Voyer next week. Here is our logo design for the course (generated in R):

library(phytools)
## Loading required package: ape
## Loading required package: maps
set.seed(499)
tips1<-strsplit("Latin American","")[[1]]
tips2<-strsplit("Macroevolution Workshop","")[[1]]
tr1<-rtree(n=length(tips1))
tr1$tip.label<-tips1[length(tips1):1]
tr2<-rtree(n=length(tips2))
tr2$tip.label<-tips2[length(tips2):1]
par(bg="black")
par(fg="white")
assoc<-cbind(
c("L","t","e","A","r","c","c","m"),
c("a","c","v","l","u","W","i","p"))
plot.new()
par(font=2)
obj<-cophylo(tr1,tr2,assoc=assoc,rotate=FALSE)
layout(mat=matrix(c(1,2),2,1),heights=c(0.8,0.2))
plot(obj,fsize=1.4,lwd=3,ftype="b")
plot.new()
text(0.5,0.5,"Universidad Nacional Autónoma de México\nAugust 2015",
col="white",cex=1.6,font=2)

plot of chunk unnamed-chunk-1

That's it!

Fix to default colors in phenogram

$
0
0
https://raw.githubusercontent.com/liamrevell/phytools/master/R/phenogram.R

A phytools reader reported the following issue in using the phytools traitgram function, phenogram, to plot a tree with a mapped discrete character:

library(phytools)
## Loading required package: ape
## Loading required package: maps
tree<-pbtree(n=20,scale=2)
Q=matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-LETTERS[1:2]
x1<-sim.history(tree,Q,anc="A")$states
## Done simulation(s).
x1
## t14 t15  t9  t1 t19 t20 t18 t10 t12 t13 t11 t16 t17  t4  t2  t3  t5  t6 
## "A" "B" "B" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "A" "A"
## t7 t8
## "A" "A"
y<-fastBM(tree)
t1<-make.simmap(tree,x1)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            A          B
## A -0.2959494 0.2959494
## B 0.2959494 -0.2959494
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   A   B 
## 0.5 0.5
## Done.
phenogram(t1,y)

plot of chunk unnamed-chunk-1

By contrast, if we use a numerically indexed discrete character, we don't face the same issue:

x2<-sapply(x1,function(x) if(x=="A") "1" else "2")
t2<-make.simmap(tree,x2)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            1          2
## 1 -0.2959494 0.2959494
## 2 0.2959494 -0.2959494
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   1   2 
## 0.5 0.5
## Done.
phenogram(t2,y)

plot of chunk unnamed-chunk-2

The issue is not that deep. In fact, it can be circumvented if we are willing to just use the argument colors (as we would in plotSimmap) to set the colors of our mapped discrete trait. E.g.,

colors<-setNames(c("blue","red"),LETTERS[1:2])
colors
##      A      B 
## "blue" "red"
phenogram(t1,y,colors=colors)

plot of chunk unnamed-chunk-3

I have nonetheless fixed this issue so that we will get the mapped discrete state colored by default if no colors are specified.

source("https://raw.githubusercontent.com/liamrevell/phytools/master/R/phenogram.R")
phenogram(t1,y)

plot of chunk unnamed-chunk-4

phytools is now on GitHub, so in fact this update can be installed automatically in a fresh R session using the devtools package as follows:

## don't run
library(devtools)
install_github("liamrevell/phytools")

That's all folks.


phytools can now be installed from GitHub

$
0
0

phytools is now on GitHub and the development version can be installed using devtools.

library(devtools)
install_github("liamrevell/phytools")
## Downloading github repo liamrevell/phytools@master
## Installing phytools
## * installing *source* package 'phytools' ...
## ** R
## ** data
## ** inst
## ** preparing package for lazy loading
## ** help
## *** installing help indices
## ** building package indices
## ** testing if installed package can be loaded
## *** arch - i386
## *** arch - x64
## * DONE (phytools)
library(phytools)
## Loading required package: ape
## Loading required package: maps

That's all there is to it.

densityMap with non-binary trait data

$
0
0

I (not-so) recently go the following question about the phytools function densityMap:

Is it possible to make same as your method 1 'Visualizing the aggregate result of stochastic mapping, in your paper (2013) : “Two new graphical methods for mapping trait evolution on phylogenies”, for a discrete trait which is not a binary trait?

Presently densityMap, which creates a visualization of the posterior probability density from stochastic mapping of a binary character on the tree, does not work for any more than two characters. This is mostly because it is hard to envision visualizing color in more than one dimension! (One possibly for a three-state character would be to put the probabilities on a 3D RGB scale - but this adds only one dimension to the analysis anyway.)

The only solution right now is to use mergeMappedStates on the "simmap" trees & then plot "A" vs. "not A", "B" vs. "not B", "C" vs. "not C", etc.

Here is a quick demo of how to do that, which would have to be varied depending on the specific character states that were being modeled:

library(phytools)
packageVersion("phytools")
## [1] '0.4.99'
## simulate some data
Q<-matrix(c(-1,0.5,0.5,
0.5,-1,0.5,
0.5,0.5,-1),3,3)
rownames(Q)<-colnames(Q)<-LETTERS[1:3]
## true history
tree<-sim.history(pbtree(n=50,scale=1),Q,anc="A")
## Done simulation(s).
tree
## 
## Phylogenetic tree with 50 tips and 49 internal nodes.
##
## Tip labels:
## t32, t33, t23, t13, t14, t7, ...
##
## The tree includes a mapped, 3-state discrete character with states:
## A, B, C
##
## Rooted; includes branch lengths.
plot(tree,fsize=0.8)
## no colors provided. using the following legend:
## A B C
## "black" "red" "green3"

plot of chunk unnamed-chunk-1

## simulated tip data
x<-getStates(tree,"tips")
trees<-make.simmap(tree,x,nsim=100,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            A          B          C
## A -0.8584049 0.4292025 0.4292025
## B 0.4292025 -0.8584049 0.4292025
## C 0.4292025 0.4292025 -0.8584049
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##         A         B         C 
## 0.3333333 0.3333333 0.3333333
## Done.
trees
## 100 phylogenetic trees with mapped discrete characters
print(trees[1:10],details=TRUE)
## 10 phylogenetic trees with mapped discrete characters
## tree 1 : 50 tips, 3 mapped states
## tree 2 : 50 tips, 3 mapped states
## tree 3 : 50 tips, 3 mapped states
## tree 4 : 50 tips, 3 mapped states
## tree 5 : 50 tips, 3 mapped states
## tree 6 : 50 tips, 3 mapped states
## tree 7 : 50 tips, 3 mapped states
## tree 8 : 50 tips, 3 mapped states
## tree 9 : 50 tips, 3 mapped states
## tree 10 : 50 tips, 3 mapped states
## merge B & C
tBC<-lapply(trees,mergeMappedStates,c("B","C"),"B or C")
class(tBC)<-c("multiSimmap","multiPhylo")
print(tBC[1:10],details=TRUE)
## 10 phylogenetic trees with mapped discrete characters
## tree 1 : 50 tips, 2 mapped states
## tree 2 : 50 tips, 2 mapped states
## tree 3 : 50 tips, 2 mapped states
## tree 4 : 50 tips, 2 mapped states
## tree 5 : 50 tips, 2 mapped states
## tree 6 : 50 tips, 2 mapped states
## tree 7 : 50 tips, 2 mapped states
## tree 8 : 50 tips, 2 mapped states
## tree 9 : 50 tips, 2 mapped states
## tree 10 : 50 tips, 2 mapped states
obj<-densityMap(tBC,plot=FALSE,states=c("B or C","A"))
## sorry - this might take a while; please be patient
obj
## Object of class "densityMap" containing:
##
## (1) A phylogenetic tree with 50 tips and 49 internal nodes.
##
## (2) The mapped posterior density of a discrete binary character with states (B or C, A).
plot(obj,outline=TRUE,lwd=4,fsize=c(0.7,1))

plot of chunk unnamed-chunk-2

Providing the argument states=c("B or C","A") is an important undocumented detail here, because otherwise branches of high posterior probability would be high posterior probability of "B or C"!

Similarly:

tAC<-lapply(trees,mergeMappedStates,c("A","C"),"A or C")
class(tAC)<-c("multiSimmap","multiPhylo")
obj<-densityMap(tAC,plot=FALSE,states=c("A or C","B"))
## sorry - this might take a while; please be patient
obj
## Object of class "densityMap" containing:
##
## (1) A phylogenetic tree with 50 tips and 49 internal nodes.
##
## (2) The mapped posterior density of a discrete binary character with states (A or C, B).
plot(obj,outline=TRUE,lwd=4,fsize=c(0.7,1))

plot of chunk unnamed-chunk-3

tAB<-lapply(trees,mergeMappedStates,c("A","B"),"A or B")
class(tAB)<-c("multiSimmap","multiPhylo")
obj<-densityMap(tAB,plot=FALSE,states=c("A or B","C"))
## sorry - this might take a while; please be patient
obj
## Object of class "densityMap" containing:
##
## (1) A phylogenetic tree with 50 tips and 49 internal nodes.
##
## (2) The mapped posterior density of a discrete binary character with states (A or B, C).
plot(obj,outline=TRUE,lwd=4,fsize=c(0.7,1))

plot of chunk unnamed-chunk-3

Thanks for the question!

Removing node labels from a Newick string

$
0
0

Today on R-sig-phylo:

"I'm currently wanting to make some changes to some phylogenies in R by reading in the newick text as a string, rather than as a phylo object. The reason is that the trees are large (~10,000 tips and another set with ~5000 tips) and I must make changes to a complete set of these tree (i.e. 10,000 trees). Currently, these trees have node labels, which I'd like to remove.

"Essentially what I'd like to do is substitute ")?:" with "):" or simply delete "?", where "?" is any and all characters that occur between the 'close' parenthesis and the colon. So far I found I could use the function 'sub' but I'd like to make the replacements in one fell swoop, without knowing what the node labels are in advance. Also the sub function seems to only replace the first occurrence of the pattern rather than all matches in a string. Any suggestions would be greatly appreciated!"

Here's one solution using strsplit:

## generate a Newick string
library(phytools)
tree<-rtree(n=10)
## just so we can look at the Newick string more easily
tree$edge.length<-round(tree$edge.length,2)
tree$node.label<-paste("node",1:9,sep="")
tree
## 
## Phylogenetic tree with 10 tips and 9 internal nodes.
##
## Tip labels:
## t9, t3, t1, t7, t2, t6, ...
## Node labels:
## node1, node2, node3, node4, node5, node6, ...
##
## Rooted; includes branch lengths.
text<-write.tree(tree)
text
## [1] "((t9:0.13,t3:0.18)node2:0.13,((t1:0.96,(t7:0.91,t2:0.4)node5:0.88)node4:0.19,(((t6:0.24,(t5:0.03,t8:0.18)node9:0.48)node8:0.13,t10:0.71)node7:0.84,t4:0.7)node6:0.89)node3:0.74)node1;"
strip.nodelabels<-function(text){
obj<-strsplit(text,"")[[1]]
cp<-grep(")",obj)
csc<-c(grep(":",obj),length(obj))
exc<-cbind(cp,sapply(cp,function(x,y) y[which(y>x)[1]],y=csc))
exc<-exc[(exc[,2]-exc[,1])>1,]
inc<-rep(TRUE,length(obj))
if(nrow(exc)>0) for(i in 1:nrow(exc))
inc[(exc[i,1]+1):(exc[i,2]-1)]<-FALSE
paste(obj[inc],collapse="")
}
strip.nodelabels(text)
## [1] "((t9:0.13,t3:0.18):0.13,((t1:0.96,(t7:0.91,t2:0.4):0.88):0.19,(((t6:0.24,(t5:0.03,t8:0.18):0.48):0.13,t10:0.71):0.84,t4:0.7):0.89):0.74);"

It even works fine if some node labels are missing:

tree$node.label[c(2,4,6)]<-""
text<-write.tree(tree)
text
## [1] "((t9:0.13,t3:0.18):0.13,((t1:0.96,(t7:0.91,t2:0.4)node5:0.88):0.19,(((t6:0.24,(t5:0.03,t8:0.18)node9:0.48)node8:0.13,t10:0.71)node7:0.84,t4:0.7):0.89)node3:0.74)node1;"
strip.nodelabels(text)
## [1] "((t9:0.13,t3:0.18):0.13,((t1:0.96,(t7:0.91,t2:0.4):0.88):0.19,(((t6:0.24,(t5:0.03,t8:0.18):0.48):0.13,t10:0.71):0.84,t4:0.7):0.89):0.74);"

We can see how it does for large trees:

tree<-rtree(n=5000)
tree$node.label<-paste("node",1:4999,sep="")
tree
## 
## Phylogenetic tree with 5000 tips and 4999 internal nodes.
##
## Tip labels:
## t176, t515, t4374, t3812, t4823, t3087, ...
## Node labels:
## node1, node2, node3, node4, node5, node6, ...
##
## Rooted; includes branch lengths.
text<-write.tree(tree)
system.time(text<-strip.nodelabels(text))
##    user  system elapsed 
## 1.00 0.01 1.11

Not super fast. Compare it to reading the tree, setting node labels to NULL, and then writing back to text string:

foo<-function(text){
tree<-read.tree(text=text)
tree$node.label<-NULL
write.tree(tree)
}
text<-write.tree(tree)
system.time(text<-foo(text))
##    user  system elapsed 
## 1.69 0.04 1.82

So, it's about twice as fast or so.

Adding a tip to a sister taxon on the tree

$
0
0

Here's a quick primer on how to attach a tip not in our tree to it's putative sister taxon half-way along the terminal edge length to its sister taxon.

## load phytools & simulate 
library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-1

## add tip "A2" to sister taxon "A"
tip<-"A2"
sister<-"A"
tree<-bind.tip(tree,tip,where=which(tree$tip.label==sister),
position=0.5*tree$edge.length[which(tree$edge[,2]==
which(tree$tip.label==sister))])
## add tip "T2" to "T"
tip<-"T2"
sister<-"T"
tree<-bind.tip(tree,tip,where=which(tree$tip.label==sister),
position=0.5*tree$edge.length[which(tree$edge[,2]==
which(tree$tip.label==sister))])
plotTree(tree)

plot of chunk unnamed-chunk-2

This works with ultrametric trees. If our trees are not ultrametric, then we also have to include the terminal edge length that we want to have for our tip to be added in each case.

OK, we've done it!

New S3 plot method for rateshift

$
0
0

I just added a new S3 plotting method for objects of class "rateshift" created by the phytools function rateshift.

This function attempts to fit different Brownian rates of evolutionary change to a user specified number of different 'eras' across the history of the user tree, without specifying when each era should begin & end.

The method is quite simple - and probably interesting to some users - but as I have never published an article describing or using this approach, it probably ranks among the neglectedmethods of the phytools package.

The new plot method visualizes the different fitted eras on the tree, and includes a color gradient for the fitted rates.

Here's a quick (or, not so quick, if you try to run it) example of what I mean:

library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
## Loading required package: ape
## Loading required package: maps

First, let's simulate some data with strong rate shifts over time:

set.seed(99)
tree<-pbtree(n=50,scale=100)
x<-sim.rates(tree<-make.era.map(tree,c(0,60,85)),
setNames(c(5,20,1),1:3))
## these are our simulated regimes:
plot(tree,setNames(c("purple","red","blue"),1:3),fsize=0.8,ftype="i",
mar=c(3.1,0.1,0.1,0.1))
axis(1)

plot of chunk unnamed-chunk-2

Now let's fit our different models:

## one rate model (the first two models are easy to fit)
fit1<-rateshift(tree,x,niter=1)
## Optimizing. Please wait.
fit1
## ML 1-rate model:
## s^2(1) se(1) k logL
## value 7.4779 1.4956 2 -206.1193
##
## This is a one-rate model.
##
## R thinks it has found the ML solution.
plot(fit1,fsize=0.7,ftype="i",mar=c(3.1,0.1,0.1,0.1),lwd=3)
axis(1)

plot of chunk unnamed-chunk-3

## two rate model
fit2<-rateshift(tree,x,nrates=2,niter=1)
## Optimizing. Please wait.
fit2
## ML 2-rate model:
## s^2(1) se(1) s^2(2) se(2) k logL
## value 13.8709 3.3576 0.526 0.2121 4 -193.311
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2)
## value 88.4271 0.0436
##
## R thinks it has found the ML solution.
plot(fit2,fsize=0.7,ftype="i",mar=c(3.1,0.1,0.1,0.1),lwd=3)
axis(1)

plot of chunk unnamed-chunk-3

## three rate model (this was our generating model)
fit3<-rateshift(tree,x,nrates=3,niter=50)
## Optimization progress:
## |..................................................|
## Done.
fit3
## ML 3-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) k logL
## value 3e-04 NaN 17.1992 5.3099 0.5412 0.2269 6 -190.4806
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3)
## value 48.8427 11.3994 88.4255 0.0819
##
## Optimization may not have converged.
plot(fit3,fsize=0.7,ftype="i",mar=c(3.1,0.1,0.1,0.1),lwd=3)
axis(1)

plot of chunk unnamed-chunk-3

## four rate model
fit4<-rateshift(tree,x,nrates=4,niter=50)
## Optimization progress:
## |..................................................|
## Done.
fit4
## ML 4-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) s^2(4) se(4) k logL
## value 0.0021 NaN 29.1904 12.3032 6.4966 4.7628 0.5255 0.209 8 -189.4281
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3) 3|4 se(3|4)
## value 53.5568 6.0482 74.1195 0.2341 89.9531 3.9633
##
## Optimization may not have converged.
plot(fit4,fsize=0.7,ftype="i",mar=c(3.1,0.1,0.1,0.1),lwd=3)
axis(1)

plot of chunk unnamed-chunk-3

The parameter estimates for the three-rate model are quite close - at least in terms of the estimated shift points - to their generating values, which is cool.

We can also compute & compare AIC scores for each model:

aics<-setNames(c(AIC(fit1),AIC(fit2),AIC(fit3),AIC(fit4)),
paste(1:4,"-rate",sep=""))
aics
##   1-rate   2-rate   3-rate   4-rate 
## 416.2387 394.6220 392.9611 394.8562

We see that, in addition to having estimated shift points close to the simulated values, the generating, three-rate model has the best support - although it seems to be quite difficult to achieve convergence for the models with more shift point…. Since the S3 methods are so nice, maybe I should work a little harder at getting the optimization to actually work well!

Primer on estimating ancestral states (& tip values) when some tips are unknown

$
0
0

A couple of days ago I received the following question:

“Given a phylogeny and a single binary trait with missing values for some species, I want to predict the value of the missing species. How might you approach this? Thanks!”

Indeed, there are functions in phytools that will predict tip states for both continuous & discretely valued character data. Here is a quick primer.

First, for continuous characters. In this case, the estimates will just be the interpolated states along the edges for the missing tips (since the expected change under BM is zero):

## simulate tree & data
x<-fastBM(tree<-pbtree(n=26,tip.label=LETTERS,scale=1))
## simulate missing data
x.sampled<-sample(x,20)
x.sampled
##           A           R           K           E           L           Y 
## 0.09978808 0.04495263 -0.46726255 1.50418672 -0.43260754 -1.45159378
## B T V N O C
## 1.81665795 0.91084994 0.85232811 -0.13158690 -0.24637438 1.07818611
## S Z G W F J
## -1.09168692 -1.57777654 -0.16152377 1.10174020 0.79768254 1.08410617
## X H
## -1.83383452 1.69266010
## estimate ancestral states for missing data.
fit<-anc.ML(tree,x.sampled)
fit
## $sig2
##
## 0.6468956
##
## $ace
## 27 28 29 30 31 32
## 0.4417079 0.4843495 0.7462708 1.3442513 0.6777013 0.9243309
## 33 34 35 36 37 38
## 0.9868763 0.8782826 0.8732871 0.8705345 -0.2067894 -0.2468859
## 39 40 41 42 43 44
## -0.2728397 -0.3430832 -0.4432120 -0.1923593 -0.2617900 -0.2617974
## 45 46 47 48 49 50
## -0.3003078 -0.2182230 0.5498239 0.8936666 0.8909877 -1.2984536
## 51
## -1.5445927
##
## $logLik
## [1] -4.705934
##
## $counts
## function gradient
## 155 155
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $model
## [1] "BM"
##
## $missing.x
## D I M P Q U
## 0.9243847 0.8733192 -0.3431066 -0.2617996 -0.2617996 0.8936641
##
## attr(,"class")
## [1] "anc.ML"

Now, for discrete characters. In this case, a binary trait:

## simulate some data
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-0:1
y<-as.factor(sim.history(tree,Q)$states)
## Done simulation(s).
y.sampled<-sample(y,20)
y.sampled
## K W A T Q U Z H O B F G X P S Y N I R E 
## 0 0 1 0 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 1
## Levels: 0 1
## now create a matrix with flat prior probabilities for 
## unknown tips
Pr<-to.matrix(y.sampled,seq=levels(y.sampled))
tips<-setdiff(tree$tip.label,rownames(Pr)) ## missing tips
tips
## [1] "C" "D" "J" "L" "M" "V"
Pr<-rbind(Pr,matrix(1/length(levels(y.sampled)),
length(tips),2,dimnames=list(tips)))
Pr
##     0   1
## K 1.0 0.0
## W 1.0 0.0
## A 0.0 1.0
## T 1.0 0.0
## Q 0.0 1.0
## U 1.0 0.0
## Z 1.0 0.0
## H 1.0 0.0
## O 1.0 0.0
## B 0.0 1.0
## F 0.0 1.0
## G 0.0 1.0
## X 1.0 0.0
## P 0.0 1.0
## S 1.0 0.0
## Y 1.0 0.0
## N 1.0 0.0
## I 0.0 1.0
## R 1.0 0.0
## E 0.0 1.0
## C 0.5 0.5
## D 0.5 0.5
## J 0.5 0.5
## L 0.5 0.5
## M 0.5 0.5
## V 0.5 0.5
fit<-rerootingMethod(tree,Pr,model="ER")
plotTree(tree)
nodelabels(pie=fit$marginal.anc[as.character(1:tree$Nnode+Ntip(tree)),],
piecol=c("blue","red"),cex=0.6)
tiplabels(pie=fit$marginal.anc[tree$tip.label,],piecol=c("blue","red"),
cex=0.4)

plot of chunk unnamed-chunk-2

This only works for symmetrical models. We can also use make.simmap for symmetrical or assymetrical transition matrices:

trees<-make.simmap(tree,Pr,model="ARD",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##            0          1
## 0 -0.4833617 0.4833617
## 1 0.6311203 -0.6311203
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   0   1 
## 0.5 0.5
## Done.
obj<-summary(trees)
plot(obj,colors=setNames(c("blue","red"),levels(y.sampled)))

plot of chunk unnamed-chunk-3

That's it.

Bug fix in cophylo

$
0
0

A phytools user (Eliot Miller) reported some strange behavior in the function for co-phylogenetic plotting, cophylo, which can be replicated with the following example:

library(phytools)
tr1<-rtree(n=26,tip.label=LETTERS)
tr2<-rtree(n=26,tip.label=LETTERS)
obj<-cophylo(tr1,tr2)
## Rotating nodes to optimize matching...
## Done.
plot(obj) ## works

plot of chunk unnamed-chunk-1

## read & write to NEXUS tree file:
write.nexus(c(tr1,tr2),file="nexus.trees")
trees<-read.nexus(file="nexus.trees")
obj<-cophylo(trees[[1]],trees[[2]])
## Rotating nodes to optimize matching...
## Done.
plot(obj) ## fails

plot of chunk unnamed-chunk-1

This is because the NEXUS trees use a TRANSLATE tabel and consequently the order of the tip labels is given by a single vector for all trees, and is thus not consistenly 'cladewise' across trees (even if the rows of edge are themselves cladewise.

This can be seen using:

str(trees)
## Class "multiPhylo"
## List of 2
## $ UNTITLED:List of 3
## ..$ edge : int [1:50, 1:2] 27 28 29 29 30 31 31 30 28 32 ...
## ..$ edge.length: num [1:50] 0.0507 0.6907 0.4726 0.7195 0.6709 ...
## ..$ Nnode : int 25
## ..- attr(*, "class")= chr "phylo"
## ..- attr(*, "order")= chr "cladewise"
## $ UNTITLED:List of 3
## ..$ edge : int [1:50, 1:2] 27 27 28 29 30 30 29 31 31 32 ...
## ..$ edge.length: num [1:50] 0.0255 0.5955 0.0601 0.5403 0.294 ...
## ..$ Nnode : int 25
## ..- attr(*, "class")= chr "phylo"
## ..- attr(*, "order")= chr "cladewise"
## - attr(*, "TipLabel")= chr [1:26] "T" "Q" "Y" "R" ...

We can fix this using the (classic), if a bit inelegant, read.tree/ write.tree hack - now added to the cophylo function (details here).

source("../phytools/R/cophylo.R")
trees<-read.nexus(file="nexus.trees")
obj<-cophylo(trees[[1]],trees[[2]])
## Rotating nodes to optimize matching...
## Done.
plot(obj) ## works?

plot of chunk unnamed-chunk-3

This fixed version can be installed from GitHub using devtools.

That's it for now.


Bug fix in drop.tip.densityMap

$
0
0

A phytools user reported the following peculiar bug with the phytools function drop.tip.densityMap:

## simulate some data
library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2,dimnames=list(letters[1:2],letters[1:2]))
x<-as.factor(sim.history(tree,Q)$states)
## Done simulation(s).
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## b b b a a b a b b b b b a a a a b a a b b b a a a b
## Levels: a b
## now replicate the error
trees<-make.simmap(tree,x,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
##           a         b
## a -3.440867 3.440867
## b 3.440867 -3.440867
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   a   b 
## 0.5 0.5
## Done.
obj<-densityMap(trees,plot=FALSE)
## sorry - this might take a while; please be patient
plot(obj,lwd=4,outline=TRUE)

plot of chunk unnamed-chunk-1

tips<-sample(tree$tip.label,5)
tips
## [1] "I" "U" "N" "G" "V"
obj<-drop.tip.densityMap(obj,tips)
## x should be an object of class "contMap"

I'm not sure when I introduced this bug, but it is now fixedand this will be in the next version of phytools (or if you update phytools from GitHub).

obj<-densityMap(trees,plot=FALSE)
## sorry - this might take a while; please be patient
source("../phytools/R/densityMap.R")
obj<-drop.tip.densityMap(obj,tips)
plot(obj,lwd=4,outline=TRUE)

plot of chunk unnamed-chunk-2

That's it.

Plotting stochastic-map tree with an outline; splitting vertical lines in a plotted tree by the mapped states of the daughters

$
0
0

Today the following comment was posted on the phytools blog comments page:

“Thanks for providing such great resources. I have a couple of questions that I cannot seem to find answers for (and hope I have not missed anything obvious).

Is it possible to outline branches in plotSimmap? I would like to use black, grey and white for simplicity, but only if I can see the white branches…

Also, I notice that the colour at the nodes (vertical branches on a square phylogeny) are the colour of just one of the tipwards branches (the bottom branch). Is it possible for the nodes to be coloured equally by the two branches?”

First question (Is it possible to outline branches in plotSimmap? I would like to use black, grey and white for simplicity, but only if I can see the white branches…). This is not too difficult, in fact. Here is a quick demo:

set.seed(13)
library(phytools)
## simulate some data:
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2)/2,3,3,
dimnames=list(letters[1:3],letters[1:3]))
tree<-sim.history(pbtree(n=26,tip.label=LETTERS,scale=1),Q,anc="a")
## Done simulation(s).
plot(tree)
## no colors provided. using the following legend:
## a b c
## "black" "red" "green3"

plot of chunk unnamed-chunk-1

## now let's do it with an outline:
par(fg="transparent")
plotTree(tree,lwd=5,ylim=c(-1,Ntip(tree)))
par(fg="black")
colors=setNames(c("black","grey","white"),
sort(unique(getStates(tree,"tips"))))
colors
##       a       b       c 
## "black" "grey" "white"
plot(tree,lwd=3,colors=colors,add=TRUE,ylim=c(-1,Ntip(tree)))
add.simmap.legend(x=0.1*max(nodeHeights(tree)),y=0,colors=colors,
prompt=FALSE,vertical=FALSE)

plot of chunk unnamed-chunk-1

The second question was as follows: Also, I notice that the colour at the nodes (vertical branches on a square phylogeny) are the colour of just one of the tipwards branches (the bottom branch). Is it possible for the nodes to be coloured equally by the two branches?

Well, if the plotted tree is a stochastic map tree, then it is theoretically impossible that the character changes state precisely at a node. However, plotSimmap can be used to plot trees that are not stochastic map trees, including trees in which the edge states have been set using paintBranches and paintSubTree.

tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
plotTree(tree,node.numbers=TRUE)

plot of chunk unnamed-chunk-2

tree<-paintSubTree(tree,33,state="b",anc.state="a")
tree<-paintSubTree(tree,42,state="c",stem=TRUE)
colors<-setNames(c("black","blue","red"),letters[1:3])
plot(tree,colors,lwd=3)

plot of chunk unnamed-chunk-2

So here we see that the vertical lines connecting edges have one or the other state, rather than spltting between the two daughter edge.

The following is a custom function to split the edge color between the two daughters when (and only when) the state changes exactly at a node:

splitEdgeColor<-function(tree,colors,lwd=2){
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
for(i in 1:tree$Nnode+Ntip(tree)){
daughters<-tree$edge[which(tree$edge[,1]==i),2]
for(j in 1:length(daughters)){
jj<-which(tree$edge[,2]==daughters[j])
color<-if(tree$maps[[jj]][1]==0) colors[names(tree$maps[[jj]])[2]] else colors[names(tree$maps[[jj]])[1]]
lines(rep(obj$xx[i],2),obj$yy[c(i,daughters[j])],col=color,lwd=lwd)
}
}
}

Let's try it:

plot(tree,colors,lwd=3)
splitEdgeColor(tree,colors,lwd=3)

plot of chunk unnamed-chunk-4

We can also try it with paintBranches:

tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
b<-sample(tree$edge[,2],10)
c<-sample(setdiff(tree$edge[,2],b),10)
tree<-paintBranches(tree,b,"b","a")
tree<-paintBranches(tree,c,"c")
plot(tree,colors,lwd=3)
splitEdgeColor(tree,colors,lwd=3)

plot of chunk unnamed-chunk-5

Nice. It seems to work.

'Extracting' internal node values from an object of class "contMap"

$
0
0

A phytools user asked today if there is some way to 'extract' the reconstructed trait values from internal nodes in an object of class "contMap".

In fact, this is possible. (Although, as I'll explain below, the circumstances in which we want to do this are probably pretty limited.)

## load packages
library(phytools)
## simulate tree & data
x<-fastBM(tree<-pbtree(n=26,tip.label=LETTERS))
## build object of class "contMap"
obj<-contMap(tree,x,plot=FALSE)
obj
## Object of class "contMap" containing:
##
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
##
## (2) A mapped continuous trait on the range (-3.954532, 0.664747).
plot(obj)

plot of chunk unnamed-chunk-1

## now let's first get the indices in obj$cols corresponding
## to the *end* of each edge
ii<-c(as.numeric(names(obj$tree$maps[[1]])[1]),
sapply(obj$tree$maps,function(x) as.numeric(names(x)[length(x)])))
## find the trait values corresponding to each index
a<-setNames(ii/(length(obj$cols)-1)*diff(obj$lims)+obj$lims[1],
c(obj$tree$edge[1,1],obj$tree$edge[,2]))
a
##          27          28          29          30           1           2 
## -1.31692381 -1.54788773 -2.56412896 -2.25001804 -1.70494319 -1.81118659
## 31 32 3 4 5 33
## -2.85976278 -3.09072670 -3.68661360 -2.97062546 -3.94067391 -1.45088288
## 34 35 6 7 36 8
## -1.39083227 -0.47621515 -0.60093567 0.64165021 -2.03753123 -1.54326845
## 37 9 10 38 39 40
## -2.72580371 -2.73504226 -2.86900134 -1.76499381 -1.75113598 -1.49707567
## 41 11 42 12 13 14
## -1.27073103 -0.92890443 -0.98433577 -0.73951402 -0.95200082 -1.71418175
## 15 43 16 17 44 45
## -2.49022051 -1.82042515 -2.42555061 -2.03291196 -1.30768526 -1.21068041
## 46 18 47 19 20 48
## -0.62403206 -0.13438856 -0.53626577 -0.12976928 -0.92890443 -0.91966587
## 49 50 21 51 22 23
## -0.84113814 -0.98433577 -1.70956247 -0.61941278 -1.07672134 0.09657536
## 24 25 26
## 0.12891031 0.10119464 -3.00296041
## pull out only the node states
a<-a[as.character(1:tree$Nnode+Ntip(tree))]
a
##         27         28         29         30         31         32 
## -1.3169238 -1.5478877 -2.5641290 -2.2500180 -2.8597628 -3.0907267
## 33 34 35 36 37 38
## -1.4508829 -1.3908323 -0.4762152 -2.0375312 -2.7258037 -1.7649938
## 39 40 41 42 43 44
## -1.7511360 -1.4970757 -1.2707310 -0.9843358 -1.8204252 -1.3076853
## 45 46 47 48 49 50
## -1.2106804 -0.6240321 -0.5362658 -0.9196659 -0.8411381 -0.9843358
## 51
## -0.6194128

Now, the reason we don't really need to do this is because the node states are going to be the same as those estimated using functions such as fastAnc or anc.ML. (They will be slightly different just due to the fact that the color mapping is finely discretized across the edges & nodes of the tree.)

For instance:

plot(a,fastAnc(tree,x))

plot of chunk unnamed-chunk-2

More on splitting vertical line colors by the mapped states on daughter edges

$
0
0

I just postedan update to the phytools function splitEdgeColor which allows a user plotting an object of class "simmap" in which the mapped character changes state exactly at one more nodes, to split the color of the vertical edges in a square phylogram to split colors according to the differing states of the two daughter edges (if they differ).

This update tells R what to do in the case of multifurcating nodes. What I decided is that the line segment between each daughter edge should be the color of the daughter edge north of that edge for daughters north of the parent edge; and vice versa for daughters south of the parent edge.

To clarify what that means, we can try the following 'toy' example:

library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
tree<-read.newick(text="((A:1.0,B:1.0,C:1.0,D:1.0):1.0,E:2.0);")
tree<-paintBranches(tree,1,"b","a")
tree<-paintBranches(tree,3,"c","a")
tree<-paintBranches(tree,5,"b")
colors<-setNames(c("black","blue","red"),c("a","b","c"))
plot(tree,colors,split.vertical=TRUE,lwd=3)

plot of chunk unnamed-chunk-2

Here are some other examples of how it looks:

set.seed(100)
tree<-rtree(n=26,tip.label=LETTERS)
tree$edge.length[(tree$edge.length<0.25)*(tree$edge[,2]>Ntip(tree))==1]<-0
di2multi(tree)->tree
b<-sample(tree$edge[,2],10)
c<-sample(setdiff(tree$edge[,2],b),10)
tree<-paintBranches(tree,b,"b","a")
tree<-paintBranches(tree,c,"c")
plot(tree,colors,split.vertical=TRUE)

plot of chunk unnamed-chunk-3

Or when we have used paintSubTree:

tree<-rtree(n=26,tip.label=LETTERS)
tree$edge.length[(tree$edge.length<0.25)*(tree$edge[,2]>Ntip(tree))==1]<-0
di2multi(tree)->tree
plotTree(tree,node.numbers=TRUE)

plot of chunk unnamed-chunk-4

tree<-paintSubTree(tree,33,"b","a",stem=TRUE)
tree<-paintSubTree(tree,42,"c","a",stem=TRUE)
tree<-paintSubTree(tree,41,"b",stem=TRUE)
plot(tree,colors,split.vertical=TRUE)

plot of chunk unnamed-chunk-4

That's it.

Print method for fastAnc

$
0
0

I just added a print method for the ancestral state estimation function, fastAnc. Here is a quick demo of how it works:

library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
## Loading required package: ape
## Loading required package: maps

Simulate a tree & data:

x<-fastBM(tree<-rtree(n=12))
tree
## 
## Phylogenetic tree with 12 tips and 11 internal nodes.
##
## Tip labels:
## t5, t6, t8, t7, t3, t4, ...
##
## Rooted; includes branch lengths.
x
##          t5          t6          t8          t7          t3          t4 
## 0.08020791 0.90090545 -0.52120123 -0.70089226 0.60523627 0.32977174
## t10 t2 t12 t11 t1 t9
## -1.35329157 -0.41231935 -0.12192103 0.56163234 -1.40509098 -1.52915450
obj<-fastAnc(tree,x)
obj
## Ancestral character estimates using fastAnc:
## 13 14 15 16 17 18 19
## -0.613584 -0.333720 0.311369 -0.667771 -0.805572 -0.640828 0.133491
## 20 21 22 23
## -0.665823 -1.137190 -0.402258 0.504335
obj<-fastAnc(tree,x,CI=TRUE)
obj
## Ancestral character estimates using fastAnc:
## 13 14 15 16 17 18 19
## -0.613584 -0.333720 0.311369 -0.667771 -0.805572 -0.640828 0.133491
## 20 21 22 23
## -0.665823 -1.137190 -0.402258 0.504335
##
## Lower & upper 95% CIs:
## lower upper
## 13 -1.872864 0.645696
## 14 -1.649728 0.982288
## 15 -0.648504 1.271242
## 16 -1.773040 0.437499
## 17 -1.911716 0.300572
## 18 -1.048660 -0.232996
## 19 -0.697674 0.964656
## 20 -1.724715 0.393068
## 21 -1.890327 -0.384054
## 22 -1.274650 0.470134
## 23 0.242778 0.765892
print(obj,printlen=6)
## Ancestral character estimates using fastAnc:
## 13 14 15 16 17 18
## -0.613584 -0.33372 0.311369 -0.667771 -0.805572 -0.640828 ....
##
## Lower & upper 95% CIs:
## lower upper
## 13 -1.872864 0.645696
## 14 -1.649728 0.982288
## 15 -0.648504 1.271242
## 16 -1.77304 0.437499
## 17 -1.911716 0.300572
## 18 -1.04866 -0.232996
## .... ....

That's it.

Viewing all 802 articles
Browse latest View live