New di2multi function for stochastically mapped trees

December 18, 2013, 11:43 am

I just posted some code that collapses internal branches of zero length (or, more specifically, branches with length shorter than some arbitrarily specified value tol) to polytomies for trees with mapped discrete characters created using (for instance) make.simmap or read.simmap. Functionally, this is exactly the same as di2multi in the ape package (although I programmed it totally differently, hopefully not at the peril of users); however, it works for modified "phylo" objects with a mapped discrete trait.

The code to the function is here, but it uses internal phytools functions - so for it to work, you will probably have to install the latest version of phytools (here).

Here's a quick demo of the function at work:

> require(phytools)
> packageVersion("phytools")
[1] ‘0.3.81’
> ## first create a tree with some polytomies
> N<-26
> tree<-pbtree(n=N,tip.label=LETTERS[26:1],scale=1)
> tree$edge.length[tree$edge.length<0.1&tree$edge[,2]>N]<-0
> ## make the tree ultrametric
> d<-max(vcv(tree))-diag(vcv(tree))
> tree$edge.length[tree$edge[,2]<=N]<- tree$edge.length[tree$edge[,2]<=N]+d
> ## check is ultrametric
> is.ultrametric(tree)
[1] TRUE
> ## check is binary
> is.binary.tree(tree)
[1] TRUE
> ## simulate discrete character history on tree
> Q<-2*matrix(c(-1,1,1,-1),2,2)
> tree<-sim.history(tree,Q)
> plotSimmap(tree,colors=setNames(c("blue","red"),1:2), lwd=3)

> ## collapse branches of zero length
> tree<-di2multi.simmap(tree)
> ## check is binary
> is.binary.tree(tree)
[1] FALSE
> plotSimmap(tree,colors=setNames(c("blue","red"),1:2), lwd=3)

The main reason this might be important is that densityMap runs into problems when your tree has internal edges that are very short. This can be addressed by first collapsing all zero/small branches in each of the stochastically mapped trees, and then computing the density map on the collapsed stochastically mapped trees. Code for that would look like the following:

trees<-lapply(trees,di2multi.simmap)
class(trees)<-"multiPhylo"
densityMap(trees)
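
For readers unfamiliar with what collapsing does to a tree, here is a minimal sketch using di2multi from ape on a plain (unmapped) tree. The Newick string is a made-up example; di2multi.simmap is described above as doing the same job for trees that carry a mapped character.

```r
library(ape)

# Hypothetical four-taxon tree: the internal branch leading to (A,B)
# has length zero, so the tree is binary only in a formal sense.
tree <- read.tree(text = "(((A:1,B:1):0,C:1):1,D:2);")
is.binary(tree)            # binary before collapsing

# Collapse internal branches shorter than tol into polytomies.
collapsed <- di2multi(tree, tol = 1e-8)
is.binary(collapsed)       # no longer binary
collapsed$Nnode            # one fewer internal node than tree$Nnode
```

Raising tol collapses short (not just zero-length) internal branches as well.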

↧

More functions & fixes in Rphylip

December 21, 2013, 9:17 pm

My latest updates to Rphylip, an R interface for Felsenstein's phylogeny methods package PHYLIP, include Rdnainvar, an interface for dnainvar, a program for Lake's invariants from DNA sequences; and Rfitch, an interface for fitch, a program that does least-squares and minimum evolution phylogeny inference.

In addition to this, though, today I added the functions setPath & clearPath. The former can be used to set the path to the PHYLIP executables folder for the current R session. Normally, Rphylip tries to find the path to the PHYLIP executables using the internal Rphylip function findPath. However, this function only works if the PHYLIP folder is in a sensible location, such as in C:/Program Files/ (for Windows) or /Applications/ (for Mac OS). If not, the path can always be supplied to the function directly. What setPath does is enable the user to set a different place to look for PHYLIP executables for the current R session. It does this by setting the environmental variable "phylip.path". clearPath does the opposite: it removes the environmental variable "phylip.path", thus permitting Rphylip functions to search for PHYLIP executables in the regular places.
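
The general idea of a session-level path that overrides a default search can be sketched in a few lines of base R. This is only an illustration under my own assumptions: the helper names (set_path, clear_path, find_exe) and the option name toy.phylip.path are hypothetical, and the sketch stores the path with options(), which is not necessarily how Rphylip does it.

```r
# Hypothetical helpers; these are NOT the Rphylip functions.
set_path <- function(path) options(toy.phylip.path = path)
clear_path <- function() options(toy.phylip.path = NULL)

# Look for an executable: explicit argument first, then the session
# setting, then a list of conventional install locations.
find_exe <- function(exe, path = NULL,
                     defaults = c("C:/Program Files", "/Applications")) {
  candidates <- c(path, getOption("toy.phylip.path"), defaults)
  hit <- candidates[file.exists(file.path(candidates, exe))]
  if (length(hit) == 0)
    stop("No path provided and was not able to find path to ", exe)
  hit[1]
}
```

The lookup order matters: a path passed directly wins over the session setting, which in turn wins over the default locations.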

Here's a demo in which I have PHYLIP installed in C:/Users/Liam/Documents/Phylogeny_Programs/:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.13’
> data(primates)
> D<-Rdnadist(primates)
Error in Rdnadist(primates) :
No path provided and was not able to find path to dnadist
> setPath("C:/Users/Liam/Documents/Phylogeny_Programs/phylip-3.695/exe/")
> D<-Rdnadist(primates)

...

Nucleic acid sequence Distance Matrix program, version 3.695

...

> tree<-Rfitch(D)

...

Fitch-Margoliash method version 3.695

...

12 Populations

Fitch-Margoliash method version 3.695

                 __ __             2
                 \  \   (Obs - Exp)
Sum of squares = /_ /_  ------------
                                2
                  i  j      Obs

Negative branch lengths not allowed

global optimization

   +-----------7
+--------2
! !   +-------6
! +---1
! ! +-4
   +--------3 +--8
   ! !    +----5
   ! !
   ! ! +-----------------8
   ! +------9
   !    !   +-------------9
+--------------------7    +---5
! !    ! +-------10
! !    +----6
! ! ! +------12
! ! +-4
! ! +-------11
! !
! +----------------------------3
!
10--------------------------------------2
!
+-------------------------------------1

remember: this is an unrooted tree!

Sum of squares =    0.68048

Average percent standard deviation =    7.23497

...

Translation table
-----------------
 1    Lemur
 2    Tarsier
 3    Sq.Monkey
 4    J.Macaque
 5    R.Macaque
 6    E.Macaque
 7    B.Macaque
 8    Gibbon
 9    Orangutan
10    Gorilla
11    Chimp
12    Human

> plot(tree)

↧

S3 print method for brownie.lite

December 24, 2013, 6:36 pm

Today I wrote an S3 print method for objects returned by the phytools function brownie.lite. This function implements the method of O'Meara et al. (2006) in which we fit different rates of evolution to different parts of a phylogeny (sometimes, say, determined by a mapped discrete character). Print methods are nice because they allow us to tell R how to summarize the content of a complicated object, like a phylogeny or the results of numerical optimization.
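
For anyone who has not written one, a print method is just a function named print.<class>. Here is a minimal sketch with a made-up class ("toyfit") and a made-up fitting function; it is not the brownie.lite code.

```r
# Toy fitting function: returns a list tagged with a (made-up) class.
toy_fit <- function(x) {
  obj <- list(mean = mean(x), var = var(x), n = length(x))
  class(obj) <- "toyfit"
  obj
}

# S3 print method: called automatically when the object is auto-printed.
print.toyfit <- function(x, digits = 4, ...) {
  cat("Toy fit on", x$n, "observations:\n")
  cat("  mean =", round(x$mean, digits), "\n")
  cat("  var  =", round(x$var, digits), "\n")
  invisible(x)
}

fit <- toy_fit(c(1.2, 0.7, 2.1, 1.6))
fit   # same as print(fit)
```

The underlying list is untouched, so fit$mean and friends remain available for further computation.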

The object returned by brownie.lite is unchanged, except that it is now assigned the class attribute "brownie.lite". Here's how it works:

> require(phytools)
> packageVersion("phytools")
[1] ‘0.3.83’
> ## first let's simulate some data
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-letters[1:2]
> tree<-sim.history(pbtree(n=100,scale=1),Q,anc="a")
> ## simulate with a rate difference among states
> x<-sim.rates(tree,setNames(c(1,2),letters[1:2]))
> ## simulate without a rate difference
> y<-fastBM(tree)
> plotSimmap(tree,setNames(c("blue","red"),letters[1:2]), lwd=3,ftype="off")

> fitx<-brownie.lite(tree,x)
> fitx # this is the same as print(fitx)
ML single-rate model:
          s^2     se      a        k  logL
parameter 1.5484  0.2191  -0.4682  2  -73.73

ML multi-rate model:
          s^2(a)  se(a)   s^2(b)  se(b)   a       k  logL
parameter 0.732   0.2243  1.8163  0.2955  -0.499  3  -70.93

P-value (based on X^2): 0.0179

R thinks it has found the ML solution.

> fity<-brownie.lite(tree,y)
> fity
ML single-rate model:
          s^2     se      a        k  logL
parameter 1.0646  0.1507  -0.7218  2  -55.00

ML multi-rate model:
          s^2(a)  se(a)   s^2(b)  se(b)   a        k  logL
parameter 0.8925  0.2661  1.1209  0.1841  -0.7271  3  -54.79

P-value (based on X^2): 0.5184

R thinks it has found the ML solution.

This version of phytools can be downloaded here and installed from source.

↧

Three years of blogging

December 26, 2013, 11:32 am

It's now been three years since I started blogging about phylogeny methods on blog.phytools.org (originally phytools.blogspot.com) - and so, in the tradition of 2011 and 2012, I thought I'd spend a few minutes talking about what I did this year in the phytools package and on the blog.

According to the (somewhat dubious, in my opinion) blogger.com page stats, the phytools blog received upwards of 150,000 page views in 2013. Even if 1/2 of these were by bots, that is still quite an impressive tally. Certainly, by any measure the popularity of the phytools blog as a free repository of information about phytools and phylogeny methods has increased over the past year.

Towards the end of 2012 and throughout 2013 I added considerably to the plotting capabilities of phytools (evidenced, in part, by my recent MEE paper on some new plotting methods). Consequently, it's no great surprise that two of the most viewed phytools blog posts of 2013 were the description of a new method to visualize uncertainty on a traitgram, and the description and illustration of the new plotting functions contMap and densityMap, including this description of a published use of both methods in the same figure (it probably didn't hurt that it was re-tweeted by @systbiol). Nestled also among the top three most popular blog posts of 2013 was this comment on why we don't normally expect the residuals from phylogenetic ANOVA or regression to be normally distributed. Also popular were a wide range of posts about stochastic mapping and ancestral state reconstruction, including information about a new method based on the threshold model from evolutionary quantitative genetics.

Towards the end of 2013, I started on a new project, Rphylip. The purpose of Rphylip is to create an R interface for all 30+ programs in the PHYLIP phylogeny method software package by Joe Felsenstein. This will hopefully allow the many functions of PHYLIP to be used seamlessly within an integrated R workflow. (Here's an example of Rphylip at work - created for my phylogeny methods class using knitr.) I'm about 50% of the way there, so I hope to get this done soon.

Finally, I recently learned that my CAREER proposal to do phylogeny method research in several new areas has been recommended for funding by the NSF DEB Systematic Biology program. (Prospective students & postdocs take notice - I will be hiring in 2014!) This comes exactly at the right time as my start-up is in its dying breaths right now. Interestingly, part of the original ulterior motive in developing this blog back towards the end of 2010 was as a supporting 'broader impact' for what was at that time my first attempt to acquire funding for phylogeny method research from the NSF. Thus, the NSF is in part responsible for this blog & phytools as a community resource even without having funded it! (Until now, that is.)

Happy 2014 & thanks for reading!

↧

New method to locate one or multiple rate shifts on a tree using likelihood

December 27, 2013, 9:57 pm

I just posted a new phytools function, rateshift, that fits a model in which there are one or multiple Brownian rate shifts in the tree at different heights above the root. The idea is that we don't need to specify the locations of the rate shifts a priori (as we can already do using brownie.lite); rather, we let the data determine where the rate shifts are located. Turns out that this isn't too hard: a model with p rates just requires p-1 additional parameters, one for the position of each rate shift.

It also occurred to me that it's entirely possible this method is already in the literature - in which case, please accept my apologies for having missed it!

In the simplest case, this would just be a model with two evolutionary rates: σ²(1) rootward of the shift, and σ²(2) tipward; and one rate shift. We then jointly maximize the likelihood of the rates & position of the rate shift.
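
To make the model concrete, here is a rough sketch of how the shift points partition a single branch among rate regimes, and how much Brownian variance that branch then accumulates (rate times time, summed over regimes). The function name era_segments is hypothetical and this is my own illustration, not the code inside rateshift or make.era.map.

```r
# Time that a branch running from height `start` to height `end`
# (both measured above the root) spends in each regime, where
# `limits` gives the height at which each regime begins.
era_segments <- function(start, end, limits) {
  upper <- c(limits[-1], Inf)   # each regime ends where the next begins
  pmax(0, pmin(end, upper) - pmax(start, limits))
}

limits <- c(0, 0.5, 0.8)   # regimes begin at these heights
sig2   <- c(1, 10, 1)      # one Brownian rate per regime

seg <- era_segments(0.3, 0.9, limits)
seg                 # time spent in regimes 1, 2, 3
sum(sig2 * seg)     # variance accumulated along this branch
```

Optimizing the model then amounts to searching over both sig2 and the interior values of limits.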

Here's a quick demo:

> require(phytools)
Loading required package: phytools
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.85’
> tree<-pbtree(n=100,scale=1)
> tree<-make.era.map(tree,c(0,0.5,0.8))
> x<-sim.rates(tree,c(1,10,1))
names absent from sig2: assuming same order as $mapped.edge
> fit<-rateshift(tree,x,nrates=3,print=TRUE,plot=TRUE, tol=1e-5)
Optimization progress:

s^2(1)  s^2(2)  s^2(3)  shift:1  shift:2  logL
2.623   2.623   2.623   0.3333   0.6667   -117.6615
2.624   2.623   2.623   0.3333   0.6667   -117.6616
....
2.1182  7.6728  0.7129  0.5497   0.8235   -100.7116
2.1182  7.6728  0.7109  0.5497   0.8235   -100.7163
2.1182  7.6728  0.7119  0.5507   0.8235   -100.7079
....
1.1796  8.6499  0.7848  0.5886   0.8116   -100.1251
1.1796  8.6499  0.7848  0.5896   0.8126   -100.1175
1.1796  8.6499  0.7848  0.5896   0.8106   -100.1275

> fit
ML 3-rate model:
      s^2(1)  se(1)  s^2(2)  se(2)   s^2(3)  se(3)   k  logL
value 1.1796  0.991  8.6499  2.3647  0.7848  0.1709  6  -100.12

Shift point(s) between regimes (height above root):
      1|2     se(1|2)  2|3     se(2|3)
value 0.5896  0.0173   0.8126  0.0173

R thinks it has found the ML solution.

The options plot & print slow down runtime; however, I included them for the purposes of debugging, and it is kind of neat to visualize the optimization of the locations of the rate shifts.

The implementation is a little buggy - however at least in this example it seems to do pretty well at finding our generating rate shift points (which were, remember, 0.5 & 0.8 units above the root), and rates (1, 10, & 1). Cool!

This function is in a new phytools build (phytools 0.3-85), which can be downloaded & installed from source. Please check it out.

↧

More updates to rateshift method: Testing for the presence of a rate shift

December 29, 2013, 11:31 am

I have made some updates to the function rateshift (first described here) to facilitate comparison of alternative models for rate shifts; as well as for testing the null hypothesis of no shift.

First, I fixed the function so it could fit a no-rate-shift model. That was broken in the previous version, but should work now. When we fit the nrates=1 model, the fitted model parameter value (σ²) and log-likelihood should be the same as from (say) fitContinuous in geiger or the one-rate model in brownie.lite.

Second, I created an S3 generic logLik method for the object of class "rateshift" returned by the function. This allows us to easily extract the log-likelihood & model parameterization; but it also allows us to use the generic AIC to compute the Akaike Information Criterion value for the fitted model.

Finally, third, I fixed a minor bug in which the tolerance (basically, the very small values we need to add or subtract from some quantities to make sure that the function does not attempt to evaluate the likelihood where it isn't defined) was sometimes inconsistent between rateshift and make.era.map, which is used internally. This required changes to both functions, so the wisest way to get this update is to install the latest non-CRAN version of phytools.
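
As an aside on the logLik method: the reason AIC works "for free" is that the default AIC method only needs logLik to return a value with a "df" attribute. A minimal sketch, using a made-up class ("toyshift") and illustrative numbers rather than the real "rateshift" object:

```r
# Toy stand-in for a fitted model (class name and values are made up).
fit <- structure(list(logL = -64.3827, k = 4), class = "toyshift")

# logLik method: return the log-likelihood with a "df" attribute.
logLik.toyshift <- function(object, ...) {
  structure(object$logL, df = object$k, class = "logLik")
}

logLik(fit)   # prints the log-likelihood and df
AIC(fit)      # -2 * logL + 2 * k, computed by the generic
```

The same "df" attribute is what a likelihood-ratio test would use to get its degrees of freedom.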

OK. Here's a demo:

> require(phytools)
Loading required package: phytools
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.86’
> ## simulate tree & data
> tree<-pbtree(n=100,scale=1)
> tree<-make.era.map(tree,c(0,0.5,0.8))
> x<-sim.rates(tree,c(1,10,1),internal=TRUE)
names absent from sig2: assuming same order as $mapped.edge
> ## here's a visual of our simulation
> phenogram(tree,x,ftype="off")

> ## peel off ancestral states
> x<-x[tree$tip.label]
>
> ## fit 1 rate model
> fit1<-rateshift(tree,x,nrates=1)
> fit1
ML 1-rate model:
      s^2(1)  se(1)   k  logL
value 1.7966  0.2542  2  -81.8257

This is a one-rate model.

R thinks it has found the ML solution.

> ## fit 2 rate model
> fit2<-rateshift(tree,x,nrates=2)
> fit2
ML 2-rate model:
      s^2(1)  se(1)   s^2(2)  se(2)   k  logL
value 6.6364  2.1013  0.785   0.1345  4  -64.3827

Shift point(s) between regimes (height above root):
      1|2    se(1|2)
value 0.806  0.02

R thinks it has found the ML solution.

> ## test 2 rates vs 1 rate
> P2vs1<-as.numeric(pchisq(2*(logLik(fit2)-logLik(fit1)), df=attr(logLik(fit2),"df")-attr(logLik(fit1),"df"), lower.tail=FALSE))
> P2vs1
[1] 2.658128e-08
> ## fit 3 rate model
> fit3<-rateshift(tree,x,nrates=3)
> fit3
ML 3-rate model:
      s^2(1)  se(1)  s^2(2)  se(2)   s^2(3)  se(3)   k  logL
value 1.7181  NaN    6.2815  1.8016  0.7716  0.1345  6  -64.017

Shift point(s) between regimes (height above root):
      1|2     se(1|2)  2|3     se(2|3)
value 0.3058  0.0265   0.8193  0.0141

R thinks it has found the ML solution.

> ## test 3 rates vs 2 rates
> P3vs2<-as.numeric(pchisq(2*(logLik(fit3)-logLik(fit2)),
df=attr(logLik(fit3),"df")-attr(logLik(fit2),"df"),
lower.tail=FALSE))
> P3vs2
[1] 0.6934852

This shows us that although the fitted shift points in our third fitted model are fairly close to the generating shift points, the fit isn't significantly better than our two-rate model. I suspect that, in general, it will probably be easier to find shift points that are closer to the tips of the tree, where there tend to be more edges.

Cool.

↧

Three more functions & some more methods in Rphylip

December 30, 2013, 12:35 pm

I just added a few more functions to the Rphylip project, my R interface for the PHYLIP package. The new interface functions are Rpars (for PARS), Rmix (for MIX), and Rpenny (for PENNY). All three of these are parsimony method programs: the first does heuristic MP search from unordered, multistate data; whereas the latter two do (Wagner, Camin-Sokal, or mixed method) MP searching using heuristic or branch-and-bound algorithms, respectively. More details on the programs can be found by referring to the PHYLIP documentation pages linked above.

I also created a new class of data object, "phylip.data", which just generalizes "proseq" (in Rphylip) and "DNAbin" (in ape), and is very simple.

Here's a quick demo using Rpenny. Note that branch-and-bound should generally not be used for more than a dozen or so taxa (it will become computationally prohibitive quickly).

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.14’
> data(primates.bin)
> primates.bin
12 character value sequences stored in a matrix.

All sequences of same length: 231

Labels: Lemur Tarsier Sq.Monkey J.Macaque R.Macaque E.Macaque ...

Trait value composition:
0    1
0.406 0.594
> tree<-Rpenny(primates.bin)

....

How many
trees looked                                       Approximate
at so far      Length of        How many           percentage
(multiples     shortest tree    trees this long    searched
of 100):       found so far     found so far       so far
----------     ------------     ------------       ------------
     1               -                0                0.00
     2               -                0                0.00
     3               -                0                0.00
     4           208.00000            3                0.00
     5           208.00000            6                0.00
     6           208.00000            6                0.14
     7           208.00000            6                1.90
     8           208.00000            6                6.67
     9           208.00000            6                9.33
    10           208.00000            6               14.00
    11           208.00000            6               37.78
    12           208.00000            6               53.33

Output written to file "outfile"

Trees also written onto file "outtree"

Press enter to quit.

Penny algorithm, version 3.695
branch-and-bound to find all most parsimonious trees

Wagner parsimony method

requires a total of 208.000

6 trees in all found

+--------------------------------1
!
! +-----------------------------2
! !
--1 !    +-----10
! ! +-10
! ! ! ! +--11
! !    +--6 +--8
! !    ! !    +--12
+--2    +-----------5 !
   !    !    ! +--------9
   !    !    !
   !    !    +-----------8
   ! +--4
   ! ! !    +-----6
   ! ! ! +-11
   ! ! ! ! ! +--5
   +--3 +--------------7 +--9
!    !    +--7
!    !
!    +--------4
!
+--------------------------3

remember: this is an unrooted tree!

....

Translation table
-----------------
 1    Lemur
 2    Tarsier
 3    Sq.Monkey
 4    J.Macaque
 5    R.Macaque
 6    E.Macaque
 7    B.Macaque
 8    Gibbon
 9    Orangutan
10    Gorilla
11    Chimp
12    Human

Rooted tree(s) with the outgroup
------------------------
Tarsier, Lemur

> require(phytools)
Loading required package: phytools
Loading required package: maps
Loading required package: rgl
> par(mfrow=c(3,2))
> plotTree(tree)
Waiting to confirm page change...

(These are the six equally most parsimonious trees found by PENNY.)
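
On the earlier note that branch-and-bound should not be used for more than a dozen or so taxa: a rough way to see why is to count the candidate topologies. The number of distinct unrooted, fully bifurcating trees for n taxa is (2n - 5)!!, a standard counting result. Branch-and-bound avoids visiting most of them, but the search space it has to prune grows at this rate. The function name below is my own.

```r
# Number of distinct unrooted, fully bifurcating trees for n >= 3 taxa:
# (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)
n_unrooted <- function(n) {
  stopifnot(n >= 3)
  prod(seq(1, 2 * n - 5, by = 2))
}

# Tree counts for 4, 8, 12, and 16 taxa.
sapply(c(4, 8, 12, 16), n_unrooted)
```

For the 12 primates used here that is already more than 650 million topologies.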

Cool. The latest version of Rphylip can be downloaded here, and is also on GitHub.

↧

Bug fix in phylANOVA

January 25, 2014, 7:28 pm

A phytools user recently reported discrepant results between phylANOVA in phytools and aov.phylo in geiger. Both functions conduct the simulation-based method of Garland et al. (1993). In theory, the only difference is that phylANOVA performs post-hoc comparison of means. It turns out, however, that phylANOVA contained the implicit assumption that y is in the order of tree$tip.label. This assumption is now only made if names(y) is NULL, in which case a warning is also issued. Updated code is here and in the latest version of phytools.
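
The kind of check involved can be sketched in base R as follows. The tip labels and trait values are made up, and this is only an illustration of the idea, not the phylANOVA source.

```r
# Hypothetical tip labels and a named trait vector in a different order.
tip.label <- c("A", "B", "C", "D")
y <- c(D = 4.1, B = 2.3, A = 1.0, C = 3.2)

# Reorder by name when names are available; otherwise assume tree order.
if (is.null(names(y))) {
  warning("y has no names: assuming same order as tip labels")
  names(y) <- tip.label
} else {
  y <- y[tip.label]
}
y
```

Indexing by tree$tip.label in this way is a cheap safeguard before passing data to any function that pairs a vector with the tips of a tree.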

↧

New version of ancThresh for λ model

January 29, 2014, 9:19 pm

The current version of ancThresh (for ancestral character estimation under the threshold model; see Revell In press) permits a Brownian or Ornstein-Uhlenbeck model for the evolution of the liabilities. I just added a 3rd model, the λ model of Pagel (1999). The code for this version is here; but since it uses some functions internally that have also been updated, the best thing to do is to update phytools to the latest version.

I'm not a huge fan of the λ model in general, since it is not clear what biological process it is meant to approximate; however under some circumstances it could be a useful model for traits that evolve on the tree (and thus have phylogenetic covariance), but are also affected by contemporary factors that are not necessarily phylogenetically correlated. This is how I am using this model in the empirical study for which I have added this feature to ancThresh.

At the moment it is largely untested - so if you run into any problems, please let me know.

↧

Function for midpoint rooting

January 30, 2014, 12:48 pm

Today Todd Oakley asked:

"Does anyone know of an existing midpoint rooting routine? I am displaying trees and would like to show them as midpoint rooted. I've been using the phangorn package, which does the midpoint rooting perfectly, but it has some dependencies that make it unstable on the linux machines I've been using. When I looked last, phangorn was the only package I could find with midpoint rooting."

Well, midpoint rooting is not theoretically very difficult. All one needs to do is find the longest path between any pair of tips & then locate the root midway along that path. I just posted code for this here (as well as a new phytools build, which can be downloaded & installed from source).

Here is a brief description of some of the tricks that I used:

(1) I used cophenetic.phylo from ape to get all the distances between tips, chose the longest one, and then identified the two species on either end of that path.

(2) I used reroot in phytools to re-root the tree immediately below one of these two tips. I did this so that I could use a new custom function getAncestors (which does the same thing as Ancestors in phangorn, but using no phangorn code) to find the set of all internal nodes ancestral to the other tip subtending the longest path. The new position of the root will be between two of these nodes.

(3) I used dist.nodes in ape to compute the distances between the tip of interest and all the internal nodes I had found in (2).

(4) Finally, I found the two nodes subtending the new root position & re-rooted the tree (using reroot from phytools) in the correct position between those nodes.
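
Step (1) above can be sketched in base R. To keep the example self-contained, a hand-written distance matrix stands in for the output of cophenetic.phylo; the values are made up but are consistent with a four-tip tree.

```r
# Hand-written patristic distances among four tips, standing in for
# the matrix that cophenetic() would return for a real tree.
tips <- c("A", "B", "C", "D")
D <- matrix(c(0, 3, 5, 8,
              3, 0, 6, 9,
              5, 6, 0, 5,
              8, 9, 5, 0), 4, 4, dimnames = list(tips, tips))

# Locate the most distant pair of tips.
ij <- which(D == max(D), arr.ind = TRUE)[1, ]
pair <- rownames(D)[ij]
pair

# The midpoint root sits half of that distance from either tip.
max(D) / 2
```

Steps (2) through (4) then amount to walking from one tip of this pair towards the other until the accumulated distance reaches max(D)/2.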

That's it. The function has not yet been thoroughly tested, so please give me feedback if it doesn't work as intended. Here's a quick demo:

> library(phytools)
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.89’
> tree<-rtree(n=12)
> mpt1<-midpoint.root(tree)
> plotTree(mpt1)

> require(phangorn)
Loading required package: phangorn
> mpt2<-midpoint(tree)
> plotTree(mpt2)

These trees may not look exactly the same, but they are (just with different rotations of internal nodes):

> all.equal.phylo(mpt1,mpt2)
[1] TRUE

I have not benchmarked this function against phangorn's midpoint, but if history is any indication, midpoint is probably faster & more elegantly programmed. Hopefully this will work for Todd's purposes.

↧

New version of threshDIC for comparing OU & λ models

February 1, 2014, 6:00 pm

I just posted a new version of the function threshDIC that allows users to use the deviance information criterion (DIC) to compare Brownian, OU, and λ models for evolution of the liability on the tree in ancThresh. I implemented threshDIC primarily to compare alternative sequences for the threshold characters on the liability axis - but (theoretically) the same approach could be useful in choosing among alternative models for the liability.

DIC is similar to the better known AIC, except that it can be used for Bayesian approaches in which we are unable to maximize the likelihood and all we have is a sample from the posterior distribution obtained by MCMC. DIC estimates the effective parameterization of our model by computing the difference between the mean of the likelihoods evaluated for each of our samples from the posterior and the likelihood evaluated at our mean parameter values. The logic is that this difference will increase with the effective number of parameters in the model (because when the number of parameters is large we tend to spend most of the time during MCMC far from the parameter values that maximize the likelihood).

This code is also in a new version of phytools here. (In fact, should you try to run it from source without updating phytools it will not work because it has an internal dependency only present in the last couple of non-CRAN versions.)
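
For reference, the usual textbook form of the DIC calculation (on the deviance scale, where deviance is -2 times the log-likelihood) can be sketched as below. The function name and the numbers are made up for illustration, and I am not claiming this is line-for-line what threshDIC does internally.

```r
# DIC from a posterior sample, following the usual definition:
#   Dbar = mean deviance over the sample, Dhat = deviance at the
#   posterior mean parameters, pD = Dbar - Dhat, DIC = Dbar + pD.
# `logL` holds the log-likelihood of each posterior sample and
# `logL_mean` the log-likelihood at the mean parameter values.
dic <- function(logL, logL_mean) {
  Dbar <- mean(-2 * logL)
  Dhat <- -2 * logL_mean
  pD <- Dbar - Dhat
  c(Dbar = Dbar, pD = pD, DIC = Dbar + pD)
}

# Made-up log-likelihoods, for illustration only.
dic(logL = c(-52.1, -50.8, -51.5, -53.0), logL_mean = -50.2)
```

As with AIC, lower values indicate the preferred model, and pD plays the role of the parameter-count penalty.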

↧

Small fix in evol.vcv and new S3 print method

February 11, 2014, 8:55 pm

A phytools user pointed out that there was a peculiar 'bug' in the phytools function evol.vcv due to an inconsistency between the variable names in the function definition for an internally used function and the names of the variables used inside this function. It is peculiar because (by a stroke of luck) the internally used variable name is already defined correctly within the main function which means that this bug can never cause a problem. I have fixed it anyway in a new function version and phytools build (here).

Since I already had the source code for this function open, I decided to add an S3 print method for objects returned by evol.vcv. This is in the style of what I'd already done for brownie.lite. This seems to work pretty well. Here's a quick demo.

> require(phytools)
Loading required package: phytools
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.91’
> tree<-pbtree(n=26,tip.label=LETTERS[26:1])
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-c("A","B")
> tree<-sim.history(tree,Q,anc="A")
> plotSimmap(tree,lwd=3)
no colors provided. using the following legend:
A B
"black" "red"

> V<-list(matrix(c(1,0,0,1),2,2),matrix(c(1,1.2,1.2,2),2,2))
> names(V)<-c("A","B")
> V
$A
     [,1] [,2]
[1,]    1    0
[2,]    0    1

$B
     [,1] [,2]
[1,]  1.0  1.2
[2,]  1.2  2.0

> X<-sim.corrs(tree,vcv=V)
> fit<-evol.vcv(tree,X)
> fit
ML single-matrix model:
       R[1,1]  R[1,2]   R[2,2]  k  log(L)
fitted 0.675   0.355    1.5944  5  -68.1635

ML multi-matrix model:
       R[1,1]  R[1,2]   R[2,2]  k  log(L)
A      0.7789  -0.3202  0.4378  8  -64.6364
B      0.6622  1.0262   2.7149

P-value (based on X^2): 0.0702

R thinks it has found the ML solution.

This is pretty much what we were going for. It might get a little messy if we have a lot of columns in X. That's it.

↧

New Rphylip function: Rdollop

February 16, 2014, 2:01 pm

After a hiatus to work on other stuff (including my winter-term tropical ecology course and various updates to phytools), I have finally returned to working on my R interface for Felsenstein's PHYLIP software package, Rphylip. The latest update is a new function, Rdollop, which wraps the PHYLIP program DOLLOP, a program for Dollo & polymorphism parsimony tree inference. For more information about DOLLOP, refer to its documentation page.

Here's a quick demo using a binary primate dataset packaged with Rphylip. Note that this is not a dataset that I suspect evolved under Dollo's law, thus the example is simply demonstrative in nature:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> data(primates.bin)
> tree<-Rdollop(primates.bin)

Dollo and polymorphism parsimony algorithm, version 3.695

...

Adding species:
   1. 7
   2. 12
   3. 6
   4. 1
   5. 3
   6. 4
   7. 10
   8. 5
   9. 2
10. 8
11. 11
12. 9

Doing global rearrangements
!-----------------------!
   .......................
   .......................

...

Dollo and polymorphism parsimony algorithm, version 3.695

Dollo parsimony method

   7 trees in all found

+--------------------------------2
!
! +-----------9
! !
! +-----------8    +-----12
! !    ! +-11
! !    ! ! ! +--11
! !    +--9 +-10
! ! !    +--10
--1 ! !
!    +--7 +--------8
!    ! !
!    ! !    +-----6
!    ! ! +--5
!    ! ! ! ! +--7
! +--3 +--------------4 +--6
! ! !    !    +--5
! ! !    !
+--2 !    +--------4
   ! !
   ! +--------------------------3
   !
   +-----------------------------1

requires a total of 154.000

...

Translation table
-----------------
 1    Lemur
 2    Tarsier
 3    Sq.Monkey
 4    J.Macaque
 5    R.Macaque
 6    E.Macaque
 7    B.Macaque
 8    Gibbon
 9    Orangutan
10    Gorilla
11    Chimp
12    Human

> tree
7 phylogenetic trees
> plot(tree,no.margin=TRUE)
Waiting to confirm page change...
...

... and obviously 6 more trees as well, in this case.

That's it. An Rphylip build containing this function can be downloaded from GitHub.

↧

New Rphylip functions: Rgendist & Rdolpenny

February 18, 2014, 2:48 pm

Rphylip, an R interface to the PHYLIP software package by Joe Felsenstein (1989, 2013), reached an entirely meaningless milestone today: it now contains R interfaces for exactly* 2/3 of the programs in the PHYLIP package. (*By some accounting. I have included the program THRESHML, which is not yet part of PHYLIP; and there are also a couple of programs that have been counted for which we will probably not write interfaces.) That means that Rphylip now has 24 different interface functions, as well as a number of other helper functions (including some that add new functionality, such as opt.Rdnaml). The two latest additions to the family are Rdolpenny (an interface for DOLPENNY) and Rgendist (an interface for GENDIST).

Here's a quick demo of the latter, Rgendist, which is for the calculation of genetic distances from gene frequency data. The data used here (in X) are from the test data in the GENDIST documentation:

> X
           locus 1 locus 2 locus 3 locus 4 locus 5 locus 6
European    0.2868  0.5684  0.4422  0.4286  0.3828  0.7285
African     0.1356  0.4840  0.0602  0.0397  0.5977  0.9675
Chinese     0.1628  0.5958  0.7298  1.0000  0.3811  0.7986
American    0.0144  0.6990  0.3280  0.7421  0.6606  0.8603
Australian  0.1211  0.2274  0.5821  1.0000  0.2018  0.9000
           locus 7 locus 8 locus 9 locus 10
European    0.6386  0.0205  0.8055   0.5043
African     0.9511  0.0600  0.7582   0.6207
Chinese     0.7782  0.0726  0.7482   0.7334
American    0.7924  0.0000  0.8086   0.8636
Australian  0.9837  0.0396  0.9097   0.2976
> Dnei<-Rgendist(X)

....

Genetic Distance Matrix program, version 3.695

....

Distances calculated for species
1
2 .
3 ..
4 ...
5 ....

Distances written to file "outfile"

Done.

> Dnei
           European  African   Chinese   American
African    0.078002
Chinese    0.080749  0.234698
American   0.066805  0.104975  0.053879
Australian 0.103014  0.227281  0.063275  0.134756

> # Cavalli-Sforza (1967) distances
> Dcavalli<-Rgendist(X,method="Cavalli-Sforza")

....

Genetic Distance Matrix program, version 3.695

....

Distances calculated for species
1
2 .
3 ..
4 ...
5 ....

Distances written to file "outfile"

Done.

> Dcavalli
           European  African  Chinese American
African    0.181749
Chinese    0.181987 0.480537
American   0.129497 0.231519 0.147522
Australian 0.260814 0.480491 0.123618 0.283144

An input matrix, as shown above, is only one of two ways that the data can be sent to Rgendist. The user can also supply a list of matrices in which each matrix contains the frequencies of each allele at a single locus (and thus the length of the list is equal to the number of loci in the analysis). See the function documentation for more information.
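
To make the default (Nei) distance and the list-of-matrices input a little more concrete, here is a minimal base R sketch. It assumes, as in the GENDIST test data, that every locus has two alleles and that X stores only the frequency of the first one. The helper names to_locus_list and nei_distance are hypothetical and are not part of Rphylip, and the exact list format that Rgendist expects should be checked against its documentation.

```r
## frequencies of the first allele at 10 biallelic loci, copied from X above
## (only two populations shown to keep the example short)
X <- rbind(
  European = c(0.2868, 0.5684, 0.4422, 0.4286, 0.3828,
               0.7285, 0.6386, 0.0205, 0.8055, 0.5043),
  African  = c(0.1356, 0.4840, 0.0602, 0.0397, 0.5977,
               0.9675, 0.9511, 0.0600, 0.7582, 0.6207)
)
colnames(X) <- paste("locus", 1:10)

## hypothetical helper: one matrix per locus (populations x alleles),
## assuming the omitted second allele has frequency 1 - p
to_locus_list <- function(X) {
  lapply(setNames(seq_len(ncol(X)), colnames(X)), function(j)
    cbind(allele1 = X[, j], allele2 = 1 - X[, j]))
}

## hypothetical helper: Nei's genetic distance between populations a & b,
## D = -log( sum(p_a * p_b) / sqrt(sum(p_a^2) * sum(p_b^2)) ),
## with the sums taken over all alleles at all loci
nei_distance <- function(loci, a, b) {
  pa <- unlist(lapply(loci, function(m) m[a, ]))
  pb <- unlist(lapply(loci, function(m) m[b, ]))
  -log(sum(pa * pb) / sqrt(sum(pa^2) * sum(pb^2)))
}

loci <- to_locus_list(X)
d <- nei_distance(loci, "European", "African")
print(round(d, 4))
```

Under these assumptions the result should land very close to the 0.078 that Rgendist reports for the European & African pair in Dnei.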

All the latest work on Rphylip (including package builds) can be obtained from GitHub.

That's all for now.

↧

New faster version of ltt (& ltt95) for ultrametric trees

February 25, 2014, 9:22 pm

Some phytools users reported problems with the phytools function ltt95 (for plotting a 95% high probability LTT from a posterior sample of trees). The likely culprit is the internal use of ltt, which is very slow. It is slow because it does not assume that the tree is ultrametric (& is probably unnecessarily slow even so, but that's a problem for another day). If we first check whether the tree is ultrametric, we can call branching.times internally, which is fast, and everything else is sped along considerably.

I have now done that. The updated code is here, & it is also part of a new phytools build (here) which can be downloaded & installed from source. When the trees in the posterior sample are ultrametric, the resulting speed-up is substantial (about 10× for a tree with 100 taxa).
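
As a rough illustration of why the ultrametric case is cheap, here is a minimal sketch that builds lineage-through-time counts directly from branching.times in ape. The function name ltt_ultrametric is hypothetical, and this is not the phytools implementation; it only shows the idea that node ages alone are enough when all tips are contemporaneous.

```r
library(ape)

## hypothetical helper (not the phytools code): lineages through time
## for an ultrametric tree, computed from node ages only
ltt_ultrametric <- function(tree) {
  stopifnot(is.ultrametric(tree))
  n <- Ntip(tree)
  m <- tree$Nnode
  ## age of each internal node, in node-number order (n+1, ..., n+m)
  age <- branching.times(tree)
  ## each node adds (number of daughter edges - 1) lineages
  gain <- tabulate(tree$edge[, 1], nbins = n + m)[(n + 1):(n + m)] - 1
  ord <- order(age, decreasing = TRUE)   # oldest (root) to youngest
  list(times = unname(max(age) - age[ord]),   # time since the root
       lineages = 1 + cumsum(gain[ord]))
}

set.seed(1)
tree <- rcoal(20)
obj <- ltt_ultrametric(tree)
plot(obj$times, obj$lineages, type = "s",
     xlab = "time since root", ylab = "lineages")
```

The final lineage count equals the number of tips, which makes for an easy sanity check on any tree passed to the helper.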

↧

Rphylip mostly done....

February 26, 2014, 8:00 am

I've been continuing to plug away at Rphylip, our new R interface for Joe Felsenstein's phylogeny methods package PHYLIP. More information about this effort can be found here: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The latest additions are Rkitsch, an interface for KITSCH (a least-squares & ME inference program with a clock constraint); Rrestdist, an interface for RESTDIST (a program for distance calculation from restriction site or fragment data); Rrestml, an interface for RESTML (a restriction site ML tree inference program); and Rclique, an interface for CLIQUE (phylogeny inference from binary characters by the compatibility method). I also added a new object class, "rest.data", for restriction data, as well as some methods to convert to the class & print.

Rphylip is on GitHub, and the latest build can be downloaded & installed from source.

Here's a quick demo of Rrestml using the demo dataset from the RESTML documentation in PHYLIP:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.20’
> data(restriction.data)
> restriction.data
13 restriction site scores for 5 species stored in a object of class "rest.data".

All sequences of same length: 13

Number of restriction enzymes used to generate the data: 2

Labels: Alpha Beta Gamma Delta Epsilon

> mltree<-Rrestml(restriction.data,quiet=TRUE)

Restriction site Maximum Likelihood method, version 3.695

Recognition sequences all 6 bases long

Sites absent from all species are assumed to have been omitted

     +2
  +--1
  |  |  +4
  |  +--2
  |     +5
  |
  3----3
  |
  +1

remember: this is an unrooted tree!

Ln Likelihood =   -40.31850

 Between   And    Length     Approx. Confidence Limits
 -------   ---    ------     ------- ---------- ------
    3       1     0.01396    (     zero,    0.04907)
    1       2     0.00064    (     zero,   infinity)
    1       2     0.05872    (     zero,    0.12666) **
    2       4     0.01447    (     zero,    0.04458) **
    2       5     0.00100    (     zero,   infinity)
    3       3     0.10801    (  0.01151,    0.21877) **
    3       1     0.01046    (     zero,    0.04404)

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

> plot(mltree,no.margin=TRUE,type="unrooted",edge.width=2)

OK. That's all for now. We're still working on documentation & examples, but hopefully we will have a version on CRAN before too long.

↧

New version of rateshift; new version of phytools submitted to CRAN

March 3, 2014, 8:27 am

Some comments on an earlier version of the function rateshift, which identifies one or more shifts in the Brownian rate of evolution on the tree, suggested that it had some difficulty converging to the ML solution. This is not too surprising. I have now posted a new version of rateshift that has more robust optimization. Here's a quick demo:

> require(phytools)
Loading required package: phytools
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.93’
> ## simulate tree & data
> tree<-pbtree(n=100,scale=1)
> tree<-make.era.map(tree,c(0,.5))
> x<-sim.rates(tree,c(10,1),internal=TRUE)
names absent from sig2: assuming same order as $mapped.edge
> ## here's a visual of our simulation
> phenogram(tree,x,ftype="off")

> ## peel off ancestral states
> x<-x[tree$tip.label]
> ## fit one-rate model:
> fit1<-rateshift(tree,x,nrates=1)
Optimization progress:
|..........|
Done.

> ## fit two-rate model:
> fit2<-rateshift(tree,x,nrates=2)
Optimization progress:
|..........|
Done.

> fit1
ML 1-rate model:
      s^2(1)  se(1) k    logL
value 1.5419 0.2179 2 -89.379
This is a one-rate model.

R thinks it has found the ML solution.

> fit2
ML 2-rate model:
      s^2(1)  se(1) s^2(2)  se(2) k    logL
value 7.3925 4.0858 1.2412 0.1903 4 -85.911

Shift point(s) between regimes (height above root):
         1|2 se(1|2)
value 0.5274    0.01

R thinks it has found the ML solution.
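
One simple way to compare the two fits is by AIC, using the logL and k values printed above. The sketch below just does that arithmetic in base R with the numbers copied from this particular run; it is not part of rateshift. The likelihood-ratio statistic is shown too, but note that the shift point is not identifiable under the one-rate model, so the usual chi-square reference distribution should be treated as approximate at best.

```r
## log-likelihoods & parameter counts copied from the printed fits above
logL <- c(one.rate = -89.379, two.rate = -85.911)
k <- c(one.rate = 2, two.rate = 4)

## AIC = 2k - 2 logL; smaller is better
aic <- 2 * k - 2 * logL
print(aic)

## likelihood-ratio statistic for the nested comparison
lr <- 2 * (logL[["two.rate"]] - logL[["one.rate"]])
print(lr)
```

For this run the AIC values work out to 182.758 (one rate) & 179.822 (two rates), so the two-rate model is preferred, which is reassuring given that the data were simulated with a rate shift.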

In addition, I was recently informed that the package extrafonts has been removed from CRAN. phytools depends on extrafonts for the plotting functions xkcdTree and fancyTree. To address this dependency issue I have now removed xkcdTree from phytools (source code is still available from the phytools page) and modified fancyTree so that it no longer uses extrafonts. This new version has now been submitted to CRAN. It is already available on phytools.org.

↧

Phylogenetic comparative methods mini-course at Universidad de los Andes, Bogotá

March 4, 2014, 7:39 pm

Luke Harmon, Andrew Crawford, and I will be co-teaching a phylogeny methods in R workshop at the Universidad de los Andes, Bogotá, Colombia this summer from the 8th to the 11th of July. This course is funded by the NSF and co-sponsored by U. los Andes and the University of Massachusetts Boston (my home institution).

More information in English & Spanish below:

Intensive short course on phylogenetic comparative methods in R (descripción en español abajo)

We are pleased to announce a new graduate-level intensive short course on the use of R for phylogenetic comparative analysis. The course will be four days in length and will take place at the Universidad de los Andes, Bogotá, Colombia from the 8th to the 11th of July, 2014. This course is partially funded by the National Science Foundation, with additional support from the University of Massachusetts Boston and the Universidad de los Andes. There are a number of full stipends available to cover the cost of travel, room and board for qualified students and post-docs. Applicants are welcome from any country; however we expect that most admitted students will come from Colombia and the Andean region. Accepted students from further afield may be offered only partial funding for their travel expenses. Topics covered will include: an introduction to the R programming language, tree manipulation, independent contrasts and phylogenetic generalized least squares, ancestral state reconstruction, models of character evolution, diversification analysis, and community phylogenetic analysis. Course instructors will include Dr. Liam Revell (University of Massachusetts Boston), Dr. Luke Harmon (University of Idaho), and Dr. Andrew J. Crawford (Universidad de los Andes).

Instruction in the course will be primarily in English; however some of the instructors and TAs of the course are competent or fluent in Spanish and English. Discussion, exercises, and activities will be conducted in both languages.

To apply for the course, please submit your CV along with a short (maximum 1 page) description of your research interests, background, and reasons for taking the course. Admission is competitive, and preference will go towards students with background in phylogenetics and a compelling motivation for taking the course. Applications should be submitted by email to bogota.phylogenetics.course@gmail.com by May 1st, 2014. Applications may be written in English or Spanish; however all students must have a basic working knowledge of scientific English. Questions can be directed to liam.revell@umb.edu.

Curso posgrado de métodos comparativos filogenéticos en R

Nos complace anunciar un nuevo curso corto e intensivo a nivel de posgrado sobre el uso de R en investigaciones científicas que usan métodos comparativos filogenéticos. El curso tendrá una duración de cuatro días y se llevará a cabo en la Universidad de los Andes (Bogotá, Colombia) entre el 8 y el 11 de julio de 2014. Este curso está parcialmente financiado por la National Science Foundation de los Estados Unidos, con el apoyo adicional de la Universidad de Massachusetts Boston y la Universidad de los Andes. Hay varios estipendios completos disponibles para cubrir los costos de tiquetes y alojamiento para estudiantes e investigadores postdoctorales calificados. Estudiantes de cualquier país serán aceptados; sin embargo anticipamos que la mayoría de los estudiantes aceptados serán de Colombia y otros países andinos. Estudiantes provenientes de países más alejados tendrán la posibilidad de recibir solo apoyo parcial para costear los gastos del viaje. Los temas que serán discutidos en el curso incluyen: una introducción al idioma de programación de R, manipulación de los árboles filogenéticos, mínimos cuadrados generalizados en un contexto filogenético, reconstrucciones de los estados ancestrales, modelos de evolución, análisis de la diversificación en el contexto de una filogenia, y análisis filogenéticos de comunidades ecológicas. Los instructores del curso serán: Dr. Liam Revell (University of Massachusetts Boston), Dr. Luke Harmon (University of Idaho), y Dr. Andrew J. Crawford (Universidad de los Andes).

El curso será dictado principalmente en inglés; sin embargo algunos de los instructores y TA del curso hablan fluido el español. Las discusiones, los ejercicios, y las actividades del curso se harán en español e inglés.

Para aplicar al curso, deben enviar una copia de su CV con una corta (1 página) descripción de sus intereses científicos, experiencia, y razones por las cuales quieren tomar el curso. El proceso de admisión será competido, y se preferirán estudiantes con conocimientos en filogenética y una motivación persuasiva para hacer el curso. Las aplicaciones deben ser enviadas por email a bogota.phylogenetics.course@gmail.com antes del 1 mayo, 2014. Las aplicaciones pueden ser escritas en inglés o español; sin embargo todos los estudiantes deben tener un nivel básico de inglés científico. Preguntas pueden ser dirigidas a liam.revell@umb.edu.

↧

Drawing a line on a plotted tree to identify members of a clade

March 5, 2014, 2:56 pm

This is in response to a user request to the effect of: "I would like to place a vertical bar next to all the tips descended from a node that I specify. How do I do this?" Here's my answer:

First, here's our original tree. (Simulated, but generated to look like a realistic empirical tree).

plotTree(tree)
nodelabels()

## ok, let's say our node of interest is node 43
tree<-reorder(tree)
## get the tip numbers & thus their vertical
## positions
tips<-getDescendants(tree,43)
tips<-tips[tips<=length(tree$tip.label)]
## we're going to use these later!
## they assume cex=1, but could be adjusted otherwise
sw<-max(strwidth(tree$tip.label[tips]))
sh<-max(strheight(tree$tip.label))
## max height of the tips in our clade
## + their labels
h<-max(sapply(tips,function(x,tree)
    nodeHeights(tree)[which(tree$edge[,2]==x),2],
    tree=tree))+1.1*sw
## vertical line
lines(c(h,h),range(tips)+c(-sh,sh))
## upper & lower horizontal lines to demarcate the
## clade; their specific length is arbitrary
lines(c(h-0.5*sw,h),
    c(range(tips)[1]-sh,range(tips)[1]-sh))
lines(c(h-0.5*sw,h),
    c(range(tips)[2]+sh,range(tips)[2]+sh))
## label for the clade
text(h+0.5*strwidth("W"),mean(range(tips)),
    "clade of interest name",srt=90,pos=1)

That's pretty much it. Of course, the user should adapt this code to their specific tree and visualization goals.
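
In the code above the node number (43) was read off the plot using nodelabels(). If you would rather identify the node from a pair of tip names, one option is getMRCA in ape. The sketch below uses a simulated tree and an arbitrary pair of tips, since the tree from the example is not reproduced here.

```r
library(ape)

set.seed(10)
tree <- rtree(12)   # tips are labeled "t1" through "t12"

## two tips whose clade we'd like to mark (arbitrary, for illustration)
target <- c("t1", "t5")

## node number of their most recent common ancestor
node <- getMRCA(tree, tip = target)

## labels of all the tips descended from that node
clade_tips <- extract.clade(tree, node)$tip.label

print(node)
print(clade_tips)
```

The value of node can then be substituted for 43 in the call to getDescendants.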

↧

Extracting a "phylo" object from the last plotted tree

March 5, 2014, 7:35 pm

I stumbled on this trick while working on something else. When plot.phylo in ape, or plotTree or plotSimmap in phytools, is used to plot a tree, the object last_plot.phylo is created in the environment .PlotPhyloEnv. This object is used by nodelabels, tiplabels, and other functions that add elements to the plotted tree. It contains mostly information about the plotted tree - coordinates of vertices, etc. Today I realized that (almost) the entire "phylo" object can be reconstructed from it.

Here's how that works:

> tree<-pbtree(n=26,tip.label=LETTERS)
> plotTree(tree)

> ## delete the tree from memory
> rm(tree)
> ## last plotted tree
> lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
> ## reconstruct the tree in memory
> tree<-list(edge=lastPP$edge,
    tip.label=1:lastPP$Ntip,
    Nnode=lastPP$Nnode)
> ## now get the edge lengths
> H<-matrix(lastPP$xx[tree$edge],nrow(tree$edge),2)
> tree$edge.length<-H[,2]-H[,1]
> class(tree)<-"phylo"
> plotTree(tree)

The only thing we've lost is the tip labels.
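
To check that the edge lengths really do survive the round trip, here is a small sketch that keeps the original tree around and compares it with the reconstruction. It uses only ape (plot.phylo rather than plotTree) and assumes the default rightward phylogram, so that the stored x coordinates are heights above the root.

```r
library(ape)

set.seed(2)
tree <- rtree(26, tip.label = LETTERS)
plot(tree)   # default rightward phylogram

## last plotted tree
lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)

## rebuild the tree; the tip labels are not stored, so use numbers
rebuilt <- list(edge = lastPP$edge,
                tip.label = as.character(seq_len(lastPP$Ntip)),
                Nnode = lastPP$Nnode)
## x coordinates are heights above the root in this plot style
H <- matrix(lastPP$xx[rebuilt$edge], nrow(rebuilt$edge), 2)
rebuilt$edge.length <- H[, 2] - H[, 1]
class(rebuilt) <- "phylo"

## compare against the original edge lengths
print(all.equal(sort(rebuilt$edge.length), sort(tree$edge.length)))
```

The edge lengths are sorted before comparison so that the check does not depend on the order in which the edges are stored.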

That's it on this.

↧