Channel: Phylogenetic Tools for Comparative Biology

New major version of phytools 0.2-50

I just submitted a new major phytools release to CRAN. If accepted, it should be posted to the phytools CRAN page within a couple of days & then gradually percolate through all the CRAN mirrors.

Here's a sample of some of the updates relative to the last major release of phytools:

1. A new version of plotSimmap that uses colors from palette() if none are provided.

2. Two new functions to compute & use the Strahler numbers of trees & nodes (1, 2).

3. A new version of estDiversity, with a new method that uses stochastic character mapping (1, 2).

4. A couple of fixes & updates to make.simmap (e.g., 1, 2).

5. A new, totally rewritten version of reroot, a function to re-root a tree along any internal or terminal edge.

6. A new function ltt95 to compute the 100(1-α)% confidence interval around a lineage-through-time plot from a set of trees.

7. A bunch of cool updates to pbtree including: simulating in discrete or continuous-time; simulating with taxa or time stops or both (by two different methods: 1, 2); and simulating deaths as well as births.

8. An updated version of anc.ML, that should work much better for large trees.

9. A new method for conducting marginal ancestral state reconstruction of discrete characters when tip values are uncertain.

It may be a few days before this update is available on CRAN, but interested users can download & install from source from the phytools page. Please let me know of any bugs or issues!


New, much improved version of read.simmap

An issue with the phytools function read.simmap that was identified a long time ago is that, for some heretofore unclear reason, computation time rises more than linearly with the number of trees in the input file.

Well, this finally got annoying enough for me to look into it. What I ultimately discovered surprised me. Basically, it seems to come down to using a for loop around the code that parses the text string into a "phylo" object, rather than a function in the apply family (in this case, lapply). The reason I programmed it this way originally was historical rather than practical in nature - and I wouldn't do it this way today; however it had not occurred to me that this might be the fatal flaw that underlay the weird performance results for the function. Before identifying this fix, I made a number of different updates that improved the code significantly - however none of these fixed the non-linear performance issue until I re-wrote with lapply standing in where once there had been for.
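To make the change concrete, here is a minimal base-R sketch of the two patterns. It is not the read.simmap code: parse_one is a hypothetical stand-in for the parser that turns one text string into a "phylo" object, and the toy input is invented for illustration. It only shows that the two forms return the same result, so one can stand in for the other; it is not meant to reproduce the slowdown.

```r
# hypothetical stand-in for the real parser: split one
# "label:value" string into a named numeric vector
parse_one<-function(s){
  parts<-strsplit(s,":",fixed=TRUE)[[1]]
  setNames(as.numeric(parts[2]),parts[1])
}
# toy input, one text string per "tree"
txt<-paste0("t",1:1000,":",1:1000)

# pattern 1: a for loop that grows the result one element at a time
res.for<-list()
for(i in seq_along(txt)) res.for[[i]]<-parse_one(txt[i])

# pattern 2: lapply builds the whole list in a single call
res.lapply<-lapply(txt,parse_one)

# check that the two results are the same
identical(res.for,res.lapply)
```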

I decided to do a bunch of other things to overhaul the function, and the new version is here. I've also posted a new phytools build (phytools 0.2-52) with this update.

The performance improvement is something else. Here's a quick demo with a 1000-tree dataset in the file 1000trees.nex. First - the latest CRAN build of phytools:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.50’
> system.time(AA<-read.simmap("1000trees.nex",format="nexus", version=1.0))
  user  system elapsed
 928.19    1.03  941.15

(Yes, that did take 15 minutes!!) Now, let's try the new version (phytools 0.2-52):

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.52’
> system.time(BB<-read.simmap("1000trees.nex",format="nexus", version=1.0))
  user  system elapsed
  33.38    0.24  34.52

Ridiculous!

> layout(matrix(1:2,1,2))
> plotSimmap(AA[[1]],pts=F,lwd=3,ftype="off")
no colors provided. using the following legend:
      0      1
"black"  "red"
> plotSimmap(BB[[1]],pts=F,lwd=3,ftype="off")
no colors provided. using the following legend:
      0      1
"black"  "red"

You get the idea.

This is a new, major overhaul - so please report any issues so that I can fix them ASAP.

More performance testing on make.simmap

I wanted to repeat & elaborate on some performance testing that I conducted earlier in the month. This is partly motivated by the fact that I introduced and then fixed a bug in the function since conducting this test; and partly this is driven by some efforts to figure out why the stand-alone program SIMMAP can give misleading results when the default priors are used (probably because the default priors should not be used - more on this later).

My simulation test was as follows:

1. I simulated a stochastic phylogeny (using pbtree) and character history for a binary trait with states a and b (using sim.history) conditioned on a fixed, symmetric transition matrix Q. For these analyses, I simulated a tree with 100 taxa of total length 1.0 with backward & forward rate of 1.0.

2. I counted the true number of transitions from a to b & b to a; the true total number of changes; and the true fraction of time spent in state a (for any tree, the time spent in b will just be 1.0 minus this).

3. I generated 200 stochastic character maps using make.simmap. For more information about stochastic mapping or make.simmap search my blog & look at appropriate references.

4. From the stochastic maps, I computed the mean number of transitions of each type; the mean total number of changes; and the mean time spent in state a. I evaluated whether the 95% CI for each of these variables included the true values from 2.

5. I repeated 1. through 4. 200 times.

First, we can ask how often the 95% CI includes the generating values. Ideally, this should be 0.95 of the time:

> colMeans(on95)
   a,b    b,a      N time.a
 0.905  0.895  0.790  0.950

This is not too bad.

Now let's attach some visuals to parameter estimation. Figure 1 shows scatterplots of the relationship between the true and estimated (averaged across 200 stochastic maps) values of each of the four metrics described above:

Figure 1:

Figure 2 is a clearer picture of bias. It gives the distribution of Y - Ŷ, where Ŷ is just the mean from the stochastic maps for a given tree & dataset. The vertical dashed line gives the expectation (zero) if our method is unbiased; whereas the vertical solid line is the mean of our sample.

Figure 2:

That's it. I'm not posting the code - because there's a whole bunch of it; however if you want it, please just let me know.

New test "full" version of make.simmap

I just posted a new version of make.simmap that samples stochastic histories not only conditioned on the most likely value of Q, but that can also use a fixed value of Q, or can sample Q from its posterior probability distribution conditioned on the substitution model. Pretty cool!

The motivation underlying this is that prior versions of make.simmap sampled stochastic histories by fixing Q. This is empirical Bayesian because we are fixing one level of the model - in this case, the transition matrix - at its most likely value given the data, rather than sampling from the posterior distribution. This should be unbiased - but it will cause the variance on estimated parameters to be too low. We can see this in the simulation I posted recently - although the effect is small when the tree & tip data contain lots of information about the evolutionary transition rates. By fixing Q we circumvent the (perhaps) challenging tasks of specifying prior densities.
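One way to see why fixing Q should make the variance too low is the law of total variance. In the sketch below, X stands for any summary of the sampled histories (say, the number of changes) and D for the tip data; both symbols are notation introduced here for illustration, not taken from make.simmap.

```latex
\operatorname{Var}(X \mid D) =
  \underbrace{\mathbb{E}_{Q \mid D}\!\left[\operatorname{Var}(X \mid Q, D)\right]}_{\text{variation among histories for a given } Q}
  +
  \underbrace{\operatorname{Var}_{Q \mid D}\!\left(\mathbb{E}\left[X \mid Q, D\right]\right)}_{\text{variation due to uncertainty in } Q}
```

Conditioning on a single value of Q keeps (approximately) only the first term. The second term is small when the tree & tip data are informative about Q, which is consistent with the small effect seen in the simulation.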

Well, this empirical Bayes method remains available as an option in make.simmap (in fact, as the default - because I think it makes sense for most users) - but now users can also sample Q, rather than merely assuming a fixed value. The way this works is as follows:

1. For each tree, we obtain nsim samples from the posterior distribution by nsim × samplefreq generations of MCMC (after burnin). We also keep the conditional likelihoods from the pruning algorithm for each sampled value of Q in this step.

2. For each value of Q & set of conditional likelihoods, we simulate a set of node states and stochastic map up the tree.
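The sampling scheme in step 1 can be sketched in base R with a toy Metropolis sampler. This is not the make.simmap implementation: the data are independent branches rather than a tree, the likelihood is the two-state symmetric change probability (1-exp(-2qt))/2, and the function names, proposal and starting value are all assumptions chosen for illustration. Only the bookkeeping (burn-in, then nsim × samplefreq generations, keeping every samplefreq-th value) follows the description above.

```r
set.seed(1)
# toy data (an assumption, not a tree): 50 independent branches,
# each recording whether the end state differs from the start state
bl<-rexp(50,rate=2)                        # branch lengths
p.diff<-function(q,bl) (1-exp(-2*q*bl))/2  # P(states differ)
changed<-rbinom(length(bl),1,p.diff(1,bl)) # generating rate is 1

# log-likelihood of the rate q under a symmetric two-state model
logL<-function(q) sum(dbinom(changed,1,p.diff(q,bl),log=TRUE))
# log gamma prior density with shape alpha & rate beta
logPrior<-function(q,alpha=1,beta=1){
  dgamma(q,shape=alpha,rate=beta,log=TRUE)
}

mcmc.q<-function(nsim=100,samplefreq=100,burnin=1000,q0=0.5,sd=0.5){
  q<-q0
  lp<-logL(q)+logPrior(q)
  samples<-numeric(nsim)
  for(gen in seq_len(burnin+nsim*samplefreq)){
    # symmetric proposal, reflected at zero to keep q positive
    q.new<-abs(q+rnorm(1,sd=sd))
    lp.new<-logL(q.new)+logPrior(q.new)
    # accept with probability min(1, posterior ratio)
    if(log(runif(1))<lp.new-lp){
      q<-q.new
      lp<-lp.new
    }
    # after burn-in, keep every samplefreq-th value of q
    if(gen>burnin&&(gen-burnin)%%samplefreq==0){
      samples[(gen-burnin)/samplefreq]<-q
    }
  }
  samples
}
q.post<-mcmc.q()
mean(q.post)
```

In the real function each retained value of Q would also carry its conditional likelihoods, which are then used in step 2.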

make.simmap uses a γ prior probability distribution for the rates with α & β that can be specified by the user. By default, α = β = 1.0, but this will probably not be a good prior density for your dataset.

make.simmap also offers the option of allowing the data to inform the prior distribution. This is accomplished using make.simmap(...,prior=list(use.empirical=TRUE)). In this case, only β from the specified γ distribution will be used - and α will be set at α = β × ML(Q).
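A quick base-R sketch of what the empirical prior implies, assuming β is the rate parameter of the γ distribution (so that the prior mean is α/β). The value of q.ml is a hypothetical ML estimate chosen for illustration.

```r
# gamma prior on a rate, with shape alpha & rate beta
beta<-3
q.ml<-0.97               # hypothetical ML estimate of the rate
alpha<-beta*q.ml         # empirical prior: alpha = beta x ML(Q)
prior.mean<-alpha/beta   # equals q.ml by construction
prior.var<-alpha/beta^2  # gets smaller as beta increases
c(mean=prior.mean,var=prior.var)
# central 95% interval of the prior
prior.int<-qgamma(c(0.025,0.975),shape=alpha,rate=beta)
prior.int
```

Under this assumption the empirical option centers the prior on the ML estimate, and β controls how tightly the prior is concentrated around it.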

Let's experiment with the function:

> packageVersion("phytools")
[1] ‘0.2.53’
> require(phytools)
Loading required package: phytools
> # simulate data & tree
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-letters[1:2]
> tree<-pbtree(n=100,scale=1)
> tree<-sim.history(tree,Q)
> x<-tree$states
> # describe.simmap on true history
> AA<-describe.simmap(tree,message=FALSE)
> # perform stochastic mappings using the empirical
> # priors method
> mtrees<-make.simmap(tree,tree$states,nsim=100, model="ER",Q="mcmc",prior=list(beta=3,use.empirical=TRUE))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
         a         b
a -1.009082  1.009082
b  1.009082 -1.009082
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
> # describe.simmap on stochastic maps
> XX<-describe.simmap(mtrees,message=FALSE)

Here's a summary of the results:

Now let's compare to the empirical Bayes method in which we fix Q at its most likely value:

> mtrees<-make.simmap(tree,x,nsim=100,model="ER", Q="empirical")
make.simmap is sampling character histories conditioned on the transition matrix
Q =
           a          b
a -0.9704866  0.9704866
b  0.9704866 -0.9704866
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
 a   b
0.5 0.5
Done.
> # describe.simmap on stochastic maps
> YY<-describe.simmap(mtrees,message=FALSE)

Some results:

As we might guess - there is more variability when we integrate over the full posterior probability distribution, and this seems to be borne out by our example. We can also compare the posterior probabilities at internal nodes across the two methods. Here's what that looks like:

Again, we're seeing more or less what we'd expect - specifically, more uncertainty in the posterior probabilities from the full model in nodes that have posterior probabilities near zero or one in the empirical Bayesian method.

Cool.

This is just a test version - who knows how many bugs will be found before I'm done; however source code is available here, or (if you'd prefer) it is in a new beta build of phytools (phytools 0.2-53) on my website.

Method comparison in make.simmap

A couple of days ago I introduced a new version of make.simmap that uses Bayesian MCMC to sample the transition matrix, Q, from its posterior probability distribution rather than fixing it at the MLE (as in previous versions of the function). The idea is that by fixing Q we are ignoring variation in the reconstructed evolutionary process that is due to uncertainty in the transition matrix.

I thought that, as a first pass, it might be neat to compare these two approaches: one in which we fix Q at its most likely value; and the other in which Q is sampled from its posterior probability distribution. My expectation is relatively straightforward - the variance of estimated variables should go up (hopefully towards their true variance), and the probability that the generating parameter values fall on their corresponding 100(1-α)% CIs (which, in general, included fewer than 100(1-α)% of generating values in the empirical Bayes method) should go to (1-α).

My analysis was as follows:

1. First, I simulated 200 pure-birth trees containing 100 taxa using the phytools function pbtree. On each tree I simulated the stochastic history of a binary discrete character with two states (a & b) given a symmetric transition matrix with backward & forward rate Q[a,b] = Q[b,a] = 1.0.

2. For each tree, I computed the number of transitions of each type and of any type; and the fraction of time spent in state a vs. b.

3. Next, for each tree I sampled 200 stochastic character maps conditioning on the MLE transition matrix (Q="empirical").

4. For each set of stochastic maps, I computed the number and type of transitions as well as the time spent in state a and averaged them across maps. I also obtained 95% credible intervals on each variable from the sample of stochastic maps, in which the 95% CI is the interval defined by the 2.5th to 97.5th percentiles of the sample. I calculated the total fraction of datasets in which the true values fell on the 95% CI from stochastic mapping.

5. Finally, I repeated 1. through 4. but this time stochastic maps were obtained by first sampling 200 values of Q from its Bayesian posterior distribution using MCMC (Q="mcmc"). In this case, I used 1,000 generations of burn-in followed by 20,000 generations of MCMC, sampling every 100 generations. I used a γ prior probability distribution with make.simmap(...,prior=list(use.empirical=TRUE,beta=2)), which means that the parameters of the prior distribution were β = 2 and α = MLE(Q) × β. The figure below shows the γ prior probability density for α = β = 2.
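The percentile interval & coverage check in step 4 can be sketched in base R as follows. The data here are stand-ins (Poisson draws scattered around a "true" count), not output from make.simmap, and the object layout (one column of sampled values per dataset) is an assumption.

```r
set.seed(1)
nrep<-200  # number of simulated datasets
nmap<-200  # number of stochastic maps per dataset
# stand-in data: a "true" count per dataset and nmap sampled
# counts scattered around it (one column per dataset)
truth<-rpois(nrep,lambda=20)
maps<-sapply(truth,function(n) rpois(nmap,lambda=n))
# 95% interval from the 2.5th & 97.5th percentiles of each column
ci<-apply(maps,2,quantile,probs=c(0.025,0.975))
# TRUE where the generating value falls on the interval
on95<-truth>=ci[1,]&truth<=ci[2,]
mean(on95)  # fraction of datasets in which the truth is covered
```

With several variables, on95 would be a matrix with one column per variable, and colMeans(on95) would give the coverage of each.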

Here are some of the results. First let's look at the empirical Bayes method in which Q is set to its most likely value:

> ci<-colMeans(on95); ci
   a,b    b,a      N time.a
 0.895  0.925  0.800  0.970
> pbinom(ci*200,200,0.95)
       a,b        b,a          N     time.a
0.00115991 0.07813442 0.00000000 0.93765750

This shows that, although the method is not too bad, the 95% CI coverage is significantly below the nominal level - at least for the number of changes from a to b as well as the total number of changes on the tree. Now let's see a visual summary of the results across simulations:

Figure 1
Figure 2

Figure 1 shows a set of scatterplots with the true & estimated values of each of the four variables described earlier. Figure 2 is just a different visualization of the same data - here, though, we have the frequency distribution, across the 200 simulated datasets, of the difference between the estimated and generating values.

OK, now let's compare to the full* Bayesian version in which MCMC is used to sample Q from its posterior probability distribution.

(*Note that in a true fully Bayesian method I should not have used my empirical data to parameterize the prior distribution, which I have done here by using prior=list(...,use.empirical=TRUE), but let's ignore that for the time being.)

First, the 95% CI, as before:

> ci<-colMeans(on95); ci
   a,b    b,a      N time.a
 0.970  0.925  0.955  0.940
> pbinom(ci*200,200,0.95)
       a,b        b,a          N     time.a
0.93765750 0.07813442 0.67297554 0.30024430

Neat. This time our 100(1-α)% CIs include the true values of our 4 variables about 100(1-α)% of the time; and in no case is this significantly lower than we'd expect by chance.
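The pbinom calls in the two tables can be read as a one-sided binomial test: if the true coverage were 0.95, how probable is it that this few (or fewer) of the 200 intervals would include the generating value? Here is a short sketch that repeats the calculation from the reported coverage values, converted to counts out of 200. The object names are chosen for illustration.

```r
nrep<-200
# number of intervals that included the truth (coverage x nrep)
covered.eb<-c("a,b"=179,"b,a"=185,"N"=160,"time.a"=194)
covered.mcmc<-c("a,b"=194,"b,a"=185,"N"=191,"time.a"=188)
# lower-tail binomial probability under nominal 95% coverage
p.eb<-pbinom(covered.eb,size=nrep,prob=0.95)
p.mcmc<-pbinom(covered.mcmc,size=nrep,prob=0.95)
# variables with coverage significantly below 0.95 (0.05 level)
names(p.eb)[p.eb<0.05]
names(p.mcmc)[p.mcmc<0.05]
```

Note that this test is one-sided, so it flags intervals that are too narrow but not intervals that are too wide.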

Here are the same two visualizations as for the empirical Bayes method, above:

Figure 3

Figure 4

Overall, this shows us that integrating over uncertainty in Q - at least in this simple case - does have the desired effect of expanding our 100(1-α)% credible interval to include the generating values of our variables in 100(1-α)% of simulations. Cool.

Unfortunately, make.simmap(...,Q="mcmc") is very slow. I have some ideas about a few simple ways to speed it up and I will work on these things soon.

Stochastic mapping with a bad informative prior results in parsimony reconstruction

The title to this post pretty much says it all. As alluded to in the discussion thread of an earlier post, if you use an informative prior probability distribution in which the mean is too low - then the result can (especially in the limiting case as the mean goes to zero) become the parsimony reconstruction. That is, as our prior probability distribution forces our posterior distribution of Q to be closer & closer to zero, all reconstructions converge to the one or multiple most parsimonious reconstructions.

Here's a quick demonstration of what I mean. Just to make the point clear, I have used a very exaggerated bad prior probability density - the γ with α = 2 & β = 200 - which basically says we have a high degree of confidence that the rate of evolution is very low (with mean α/β = 0.01).
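To get a feel for how strong this prior is, the following base-R sketch (treating β as the rate parameter, consistent with the mean α/β) computes a few summaries. The comparison value of 2 is the rate used to simulate the data in the demonstration that follows.

```r
alpha<-2
beta<-200
prior.mean<-alpha/beta  # prior mean of the rate
prior.mean
# 99.9th percentile of the prior: nearly all of the prior mass
# sits below this value
q999<-qgamma(0.999,shape=alpha,rate=beta)
q999
# prior probability that the rate is at least 2
p.high<-pgamma(2,shape=alpha,rate=beta,lower.tail=FALSE)
p.high
```

A prior this concentrated near zero gives the data very little room to pull the posterior towards the generating rate.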

> library(phytools)
Attaching package: ‘phytools’

> # first create a mapped history
> Q<-matrix(c(-2,2,2,-2),2,2)
> colnames(Q)<-rownames(Q)<-LETTERS[1:2]
> tree<-sim.history(pbtree(n=100,scale=1),Q,anc="A")
> describe.simmap(tree)
1 tree with a mapped discrete character with states:
A, B

tree has 54 changes between states

changes are of the following types:
   A  B
A  0 18
B 36  0

mean total time spent in each state is:
             A         B    total
raw  12.897944 15.049629 27.94757
prop  0.461505  0.538495  1.00000

> x<-tree$states
> # now let's perform stochastic mapping with a
> # reasonable prior
> emp.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(beta=2,use.empirical=TRUE))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A        B
A -1.943666  1.943666
B  1.943666 -1.943666
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> # now a prior that is way too low!
> low.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(alpha=2,beta=200))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A          B
A -0.1103515  0.1103515
B  0.1103515 -0.1103515
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> # plot all three for comparison to the true tree
> layout(matrix(c(1,2,3),1,3))
> plotSimmap(tree,pts=F,lwd=2,ftype="off")
no colors provided. using the following legend:
      A      B
"black"  "red"
> plotSimmap(emp.prior[[1]],pts=F,lwd=2,ftype="off")
...
> plotSimmap(low.prior[[1]],pts=F,lwd=2,ftype="off")
...

What's dangerous about the reconstructions shown above is that (without the known true history, as in this case) it may be the rightmost reconstruction - the one obtained with the faulty prior - that looks most reasonable!

OK - let's do a little more analysis to further the point:

> describe.simmap(emp.prior)
100 trees with a mapped discrete character with states:
A, B

trees have 60.09 changes between states on average

changes are of the following types:
      A,B  B,A
x->y 29.52 30.57

mean total time spent in each state is:
              A          B    total
raw  15.7298487 12.2177248 27.94757
prop  0.5628341  0.4371659  1.00000

> describe.simmap(low.prior)
100 trees with a mapped discrete character with states:
A, B

trees have 22.97 changes between states on average

changes are of the following types:
      A,B  B,A
x->y 14.44 8.53

mean total time spent in each state is:
              A          B    total
raw  17.7439790 10.2035945 27.94757
prop  0.6349023  0.3650977  1.00000

> # get parsimony score
> library(phangorn)
> X<-phyDat(as.matrix(x),type="USER",levels=LETTERS[1:2])
> parsimony(tree,X)
[1] 22

Of course, we can push it even further & virtually guarantee that our stochastic maps have the MP number of changes, e.g.:

> really.low.prior<-make.simmap(tree,x,nsim=20,Q="mcmc", prior=list(alpha=1,beta=1e4))
Running MCMC burn-in. Please wait....
Running 2000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
            A            B
A -0.002198051  0.002198051
B  0.002198051 -0.002198051
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
> describe.simmap(really.low.prior)
20 trees with a mapped discrete character with states:
A, B

trees have 22.05 changes between states on average

changes are of the following types:
      A,B B,A
x->y 13.65 8.4

mean total time spent in each state is:
              A          B    total
raw  17.4916200 10.4559534 27.94757
prop  0.6258726  0.3741274  1.00000

Well, you get the idea.

2x faster version of make.simmap for Q="mcmc"

In a previous post I promised that I could speed up make.simmap(..., Q="mcmc"), i.e., the full Bayesian MCMC make.simmap method. (Matt Pennell said he didn't care, which is very generous of him.)

There was one really low-hanging fruit, which is that during Bayesian MCMC sampling of the transition matrix, Q, make.simmap was unnecessarily computing the likelihood twice for each iteration of the MCMC. This is because make.simmap uses modified code from ape's ace function internally to perform Felsenstein's pruning algorithm in computing the likelihood of Q as well as the conditional scaled likelihoods at each node. Normally, ace returns the likelihood & the scaled conditional likelihoods of the subtrees at each internal node by first maximizing the likelihood using the optimizer nlminb, and then computing the conditional likelihoods given MLE(Q). I modified the code to take a fixed value of Q; however, it still returned both the overall log-likelihood & the conditional likelihoods via two rounds of the pruning algorithm. For the vast majority of generations in our MCMC we don't care about the conditional scaled likelihoods (because we're not sampling that generation), so we should only compute the overall log-likelihood, which we need to compute the posterior odds ratio and make a decision about whether to retain the updated parameter values or not. Whenever we want to sample a particular value for Q (i.e., every samplefreq generations), we can compute the conditional likelihoods for all the nodes - since we'll need these later.
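The bookkeeping described here can be sketched with two stand-in functions and a call counter. Nothing below is the make.simmap code: logL.only and cond.liks are hypothetical placeholders for "one pass of the pruning algorithm returning only the log-likelihood" and "a pass that also returns the conditional likelihoods", the likelihood surface is a toy, and the prior is left out for brevity.

```r
calls<-c(logL=0,cond=0)
# stand-in: the log-likelihood only
logL.only<-function(q){
  calls["logL"]<<-calls["logL"]+1
  -(q-1)^2         # toy log-likelihood surface
}
# stand-in: the conditional likelihoods at the nodes
cond.liks<-function(q){
  calls["cond"]<<-calls["cond"]+1
  matrix(0.5,3,2)  # placeholder for per-node likelihoods
}
set.seed(1)
ngen<-2000
samplefreq<-100
q<-1
ll<-logL.only(q)
kept<-list()
for(gen in seq_len(ngen)){
  q.new<-abs(q+rnorm(1,sd=0.2))
  ll.new<-logL.only(q.new)  # needed in every generation
  if(log(runif(1))<ll.new-ll){
    q<-q.new
    ll<-ll.new
  }
  # conditional likelihoods are needed only when we sample
  if(gen%%samplefreq==0){
    kept[[length(kept)+1]]<-list(Q=q,L=cond.liks(q))
  }
}
calls
```

The counter shows the more expensive calculation running once per retained sample rather than once per generation, which is where a speed-up of this kind would come from.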

This improvement has a pretty dramatic effect on computation time - pretty much halving it as you might expect. Here's a demo using my not-too-fast i5 desktop:

> # simulate tree & data
> Q<-matrix(c(-1,1,1,-1),2,2)
> colnames(Q)<-rownames(Q)<-LETTERS[1:2]
> tree<-sim.history(pbtree(n=100,scale=1),Q)
> x<-tree$states
> # number of changes, etc., on the true tree
> describe.simmap(tree)
1 tree with a mapped discrete character with states:
 A, B

tree has 21 changes between states

changes are of the following types:
  A  B
A  0 11
B 10  0

mean total time spent in each state is:
              A          B    total
raw  11.5075650 10.5967548 22.10432
prop  0.5206025  0.4793975  1.00000

> # OK, now let's run make.simmap & time it
> system.time(mtrees.old<-make.simmap(tree,x,Q="mcmc", nsim=200,prior=list(beta=2,use.empirical=TRUE)))
Running MCMC burn-in. Please wait....
Running 20000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A        B
A -1.205146  1.205146
B  1.205146 -1.205146
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
  user  system elapsed
398.44    0.23  401.99
> # pretty slow!!
> describe.simmap(mtrees.old)
200 trees with a mapped discrete character with states:
 A, B

trees have 29.93 changes between states on average

changes are of the following types:
        A,B    B,A
x->y 15.055 14.875

mean total time spent in each state is:
              A          B    total
raw  11.3629822 10.7413376 22.10432
prop  0.5140616  0.4859384  1.00000

OK - now let's compare to the updated version:

> source("make.simmap.R")
> system.time(mtrees.new<-make.simmap(tree,x,Q="mcmc", nsim=200,prior=list(beta=2,use.empirical=TRUE)))
Running MCMC burn-in. Please wait....
Running 20000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A        B
A -1.227405  1.227405
B  1.227405 -1.227405
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
  user  system elapsed
214.87    0.03  216.36
> # still slow, but better
> describe.simmap(mtrees.new)
200 trees with a mapped discrete character with states:
 A, B

trees have 29.975 changes between states on average

changes are of the following types:
        A,B  B,A
x->y 14.585 15.39

mean total time spent in each state is:
              A          B    total
raw  11.1900994 10.9142203 22.10432
prop  0.5062404  0.4937596  1.00000

The code for this version is here, but I have also posted a new phytools build (phytools 0.2-54). Check it out & please report any issues or bugs.

New article describing new graphical methods: densityMap, contMap, and fancyTree(...,type="phenogram95")

My new article describing three new graphical methods for plotting comparative data on trees, implemented in the phytools functions densityMap (1, 2, 3, 4), contMap (1, 2, 3), and fancyTree(..., type="phenogram95") just came out "Accepted" (i.e., manuscript pages, not type-set or proofed) in Methods in Ecology & Evolution. The article is entitled Two new graphical methods for mapping trait evolution on phylogenies, which might seem like a bit of a misnomer (or perhaps a tribute to this article) given my description, above, but I think it makes sense.

The figure at right is from the data (click for a larger version with tip labels). Thanks to Dave Collar for helpfully providing the published data & tree.


Showing posterior probabilities from describe.simmap at the nodes of a plotted tree

A user comment asks the following:

"I wonder how can I alter the size of the "pie" in describe.simmap.... Same question for altering the color...."

Well, there is no way at present to change the color or pie-size in describe.simmap(...,plot=TRUE); however, it is easy to recreate the posterior probability plot of describe.simmap using the function output - while customizing its attributes.

Here's a quick demo:

> # check package version
> packageVersion("phytools")
[1] ‘0.2.54’
>
> # first simulate some data to use in the demo
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> rownames(Q)<-colnames(Q)<-letters[1:3]
> tree<-sim.history(pbtree(n=50,scale=1),Q)
> x<-tree$states
>
> # now the demo
> mtrees<-make.simmap(tree,x,model="ER",nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          a        b        c
a -2.041867  1.020934  1.020934
b  1.020934 -2.041867  1.020934
c  1.020934  1.020934 -2.041867
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
        a        b        c
0.3333333 0.3333333 0.3333333
Done.
> XX<-describe.simmap(mtrees,plot=FALSE)
100 trees with a mapped discrete character with states:
a, b, c

trees have 23.93 changes between states on average

changes are of the following types:
      a,b  a,c  b,a  b,c  c,a  c,b
x->y 2.11 2.91 4.62 5.77 4.94 3.58

mean total time spent in each state is:
            a        b        c    total
raw  2.2816521 4.1930936 4.2593925 10.73414
prop 0.2125603 0.3906316 0.3968081  1.00000

> hh<-max(nodeHeights(tree))*0.02 # for label offset
> plot(tree,no.margin=TRUE,label.offset=hh,edge.width=2)
> nodelabels(pie=XX$ace,piecol=c("blue","red","green"), cex=0.5)
> tiplabels(pie=to.matrix(x,colnames(XX$ace)), piecol=c("blue","red","green"),cex=0.5)

Obviously, to adjust the pie-sizes & colors we just change the values of cex & piecol.

That's it.

Bug fix in fastAnc

A couple of days ago a phytools user reported a bug in geomorph (a package that depends on phytools) that seemed to be due to a problem with fastAnc, which is used internally.

Well, I finally got around to looking into it. It turns out that there was a bug in fastAnc, but I'd overlooked it for good reason - it only occurs under the somewhat idiosyncratic circumstances in which is.binary.tree(tree)=FALSE and the user has specified the option vars=FALSE. This is because fastAnc works by re-rooting the tree at all internal nodes of a binary tree and computing the PIC ancestral state & variance. If the input tree is not binary, then it uses multi2di, but then has to back-translate to the original tree, which it does using the phytools function matchNodes.

The problem arose because:

if(!is.binary.tree(tree)){
  ancNames<-matchNodes(tree,btree)
  anc<-anc[as.character(ancNames[,2])]
  names(anc)<-ancNames[,1]
  if(vars) v[as.character(ancNames[,2])]
  names(v)<-ancNames[,1]
}

should have been:

if(!is.binary.tree(tree)){
  ancNames<-matchNodes(tree,btree)
  anc<-anc[as.character(ancNames[,2])]
  names(anc)<-ancNames[,1]
  if(vars||CI){
    # reorder & rename v only if the variances were computed
    v<-v[as.character(ancNames[,2])]
    names(v)<-ancNames[,1]
  }
}

in the code that is executed to back-translate nodes between trees. Basically, if is.binary.tree(tree)=FALSE and vars=FALSE (but under no other circumstances), fastAnc will try to assign names to a vector of zero length. Oops.
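The failure is easy to reproduce in base R without phytools. In this sketch the variable names mirror the code above but the values are made up: v is empty because no variances were computed, and node.names is a hypothetical set of node numbers.

```r
vars<-FALSE
v<-numeric(0)               # nothing was computed when vars=FALSE
node.names<-c("4","5","6")  # hypothetical node numbers

# without braces only the first statement is conditional, so the
# names assignment always runs & fails on the empty vector
res<-tryCatch({
  if(vars) v<-v[node.names]
  names(v)<-node.names
  "ok"
},error=function(e) "error")

# with braces both statements are skipped when vars=FALSE
res.fixed<-tryCatch({
  if(vars){
    v<-v[node.names]
    names(v)<-node.names
  }
  "ok"
},error=function(e) "error")
c(unbraced=res,braced=res.fixed)
```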

The fixed source code for this very simple function is here, but I also posted a new phytools build (phytools 0.2-55).

Huge speed-up for rerootingMethod & estDiversity

A couple of months ago I posted an extremely simple function - basically a wrapper for the 'ape' function ace - to do marginal ancestral state reconstruction using the re-rooting method of Yang.

Well - this function is quite slow. This is partly because the tree has to be re-rooted at every internal node; but mostly this is slow for a totally unnecessary reason, and that is that by wrapping around ace (or, rather, a very lightly modified version of ace used internally by phytools), at each re-rooting the function also re-estimates the transition matrix, Q. Obviously - since only symmetric transition matrices are permitted by this method - this is totally unnecessary.
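The idea, estimate Q once & then reuse it at every re-rooting, can be illustrated with a toy example in base R. A symmetric model is time-reversible, so the fitted Q should not depend on where the tree is rooted. Nothing here is the rerootingMethod code: fit.rate is a stand-in for the optimization of Q (here, the ML rate of some exponential waiting times), and node.calc is a placeholder for the per-rooting calculation.

```r
set.seed(1)
# toy data (an assumption): waiting times whose rate we estimate
w<-rexp(100,rate=2)
n.fit<-0
# stand-in for estimating Q by numerical optimization
fit.rate<-function(){
  n.fit<<-n.fit+1
  optimize(function(q) sum(dexp(w,rate=q,log=TRUE)),
    interval=c(1e-6,100),maximum=TRUE)$maximum
}
# placeholder for the calculation done at each re-rooting
node.calc<-function(node,q) exp(-q*node/10)
nodes<-1:99

# old pattern: re-estimate the rate at every re-rooting
n.fit<-0
old<-sapply(nodes,function(nn) node.calc(nn,fit.rate()))
fits.old<-n.fit

# new pattern: estimate once, then reuse the estimate
n.fit<-0
q.hat<-fit.rate()
new<-sapply(nodes,function(nn) node.calc(nn,q.hat))
fits.new<-n.fit

c(old=fits.old,new=fits.new)  # number of optimizations
all.equal(old,new)            # compare the two sets of results
```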

This is now fixed in the latest minor phytools build (phytools 0.2-56) and the result is an enormous speed-up in computation time. So, for instance:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.55’
>
> # simulate
> tree<-pbtree(n=100,scale=1)
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> rownames(Q)<-colnames(Q)<-letters[1:3]
> x<-sim.history(tree,Q)$states
>
> # ok, now estimate using the old version
> system.time(XX<-rerootingMethod(tree,x))
  user  system elapsed
  28.34    0.00  28.42
> # unload phytools and install the new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.2-56.tar.gz",type="source", repos=NULL)
* installing *source* package 'phytools' ...
** R
...
* DONE (phytools)
> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> # ok, now repeat the analysis using the new version
> system.time(YY<-rerootingMethod(tree,x))
  user  system elapsed
  3.36    0.01    3.38
> plot(XX$marginal.anc,YY$marginal.anc,xlab="marginal ASRs old version",ylab="marginal ASRs new version")

Cool.

The same speed-up can also be applied to estDiversity - which estimates historical lineage diversity at all the nodes of the tree based on the approach of Mahler et al. (2010) (e.g., 1, 2). So, for instance:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.55’
>
> system.time(d.old<-estDiversity(tree,x))
Please wait. . . . Warning - this may take a while!
Completed 10 nodes
Completed 20 nodes
Completed 30 nodes
Completed 40 nodes
  user  system elapsed
 228.64    0.14  236.39
>
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.2-56.tar.gz",type="source", repos=NULL)
...
* DONE (phytools)
> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> system.time(d.new<-estDiversity(tree,x))
Please wait. . . . Warning - this may take a while!
Completed 10 nodes
Completed 20 nodes
Completed 30 nodes
Completed 40 nodes
  user  system elapsed
  26.02    0.04  26.41
> plot(d.old,d.new,xlab="estimated historical diversity (old)",ylab="estimated historical diversity (new)")

Wow. That's an enormous difference. Cool.

Faster version of getStates, but why is it faster...?

I was tinkering with the function describe.simmap to try and speed it up, when I discovered something odd about the very simple function getStates, which does nothing more than get the states at the nodes or tips of a mapped tree in memory. When lapply (or sapply) is used to iterate it over the trees in an object of class "multiPhylo", run-time seems to rise non-linearly with the number of trees.

> ## simulate tree & data
> tree<-pbtree(n=200,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> trees<-sim.history(tree,Q,nsim=200)
> # now let's get the states for all trees at all nodes
> # using sapply
> system.time(XX1<-getStates(trees[[1]]))
  user  system elapsed
  0.02    0.00    0.02
> system.time(XX10<-sapply(trees[1:10],getStates))
  user  system elapsed
  0.05    0.00    0.04
> system.time(XX50<-sapply(trees[1:50],getStates))
  user  system elapsed
  0.28    0.00    0.29
> system.time(XX100<-sapply(trees[1:100],getStates))
  user  system elapsed
  1.16    0.00    1.16
> system.time(XX200<-sapply(trees,getStates))
  user  system elapsed
    6.6    0.0    6.6

Hmmm. What's going on here?
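
One rough way to put a number on this is to ask how run time scales between successive sample sizes. The sketch below just re-uses the user times printed above (for 10, 50, 100, and 200 trees) and computes the local slope on a log-log scale; a slope near 1 would mean linear scaling. The variable names are only for illustration, and single timings like these are noisy, so the result is indicative at best.

```r
# user times (in seconds) copied from the session above
n.trees<-c(10,50,100,200)
user<-c(0.05,0.28,1.16,6.6)
# local scaling exponent between successive sample sizes:
# ~1 is linear, ~2 is quadratic
slope<-diff(log(user))/diff(log(n.trees))
names(slope)<-paste(n.trees[-length(n.trees)],n.trees[-1],sep="->")
print(round(slope,2))
```

All three slopes come out above 1, and the last two are above 2, which is consistent with run time growing roughly quadratically (or worse) beyond about 50 trees.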

What I discovered (somehow - I'm not sure why I tried this) is that if I first split my list of 200 trees into, say, 20 lists of 10 trees; and then I ran sapply(...,getStates) on each of these lists; then recombined the results using cbind, this is much faster. So, for instance:

> g<-function(trees){
 # split into chunks of (up to) 10 trees
 ff<-as.factor(ceiling(1:length(trees)/10))
 aa<-lapply(split(trees,ff),function(x) sapply(x,getStates))
 # recombine the chunks column-wise (also works for a single chunk)
 y<-aa[[1]]
 if(length(aa)>1) for(i in 2:length(aa)) y<-cbind(y,aa[[i]])
 y
}
> # now run it
> system.time(YY200<-g(trees))
  user  system elapsed
  0.97    0.00    0.97
> # check all equal
> dim(YY200)
[1] 199 200
> dim(XX200)
[1] 199 200
> all(XX200==YY200)
[1] TRUE

Seriously - what's going on here? (I have some ideas - but I'm not sure. Feedback welcome.)

Here's a new version of getStates with this hack implemented internally:

# function to get node states from simmap style trees
# written by Liam J. Revell 2013
getStates<-function(tree,type=c("nodes","tips")){
  type<-type[1]
  if(inherits(tree,"multiPhylo")){
    # process in chunks of 10 trees, then recombine column-wise
    ff<-as.factor(ceiling(1:length(tree)/10))
    aa<-lapply(split(tree,ff),function(x)
     sapply(x,getStates,type=type))
    y<-aa[[1]]
    if(length(aa)>1) for(i in 2:length(aa)) y<-cbind(y,aa[[i]])
  } else if(inherits(tree,"phylo")){
    if(type=="nodes"){
      y<-setNames(sapply(tree$maps,function(x)
       names(x)[1]),tree$edge[,1])
      y<-y[as.character(length(tree$tip)+1:tree$Nnode)]
    } else if(type=="tips"){
      y<-setNames(sapply(tree$maps,function(x)
       names(x)[length(x)]),tree$edge[,2])
      y<-setNames(y[as.character(1:length(tree$tip))],
       tree$tip)
    }
  } else stop("tree should be an object of class 'phylo' or 'multiPhylo'")
  return(y)
}

Much faster versions of countSimmap, describe.simmap, and getStates

Yesterday, Klaus Schliep pointed out that a trick to speed up lapply's handling of objects of class "multiPhylo" (i.e., just a simple list of objects of class "phylo") was just to first remove the class attribute "multiPhylo". Why this would have any effect at all is somewhat of a mystery to me. is.list(trees) evaluates to TRUE regardless of whether or not the class attribute has been removed, so it doesn't seem that lapply would have to coerce our object to a list in either case. Nonetheless - not only does this work, it works tremendously! I have now included this simple trick in countSimmap, describe.simmap, and getStates, all of which use lapply or sapply if the argument tree is an object of class "multiPhylo".
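
To see what removing the class attribute does (and does not) change, here is a minimal base R sketch. The class "myList" and its `[[` method are hypothetical stand-ins, not anything from ape or phytools; the method simply counts how often it is called. Whether a costly class-specific `[[` method is what actually slows down "multiPhylo" objects is only a guess that this sketch does not establish.

```r
# a plain list carrying a (hypothetical) S3 class attribute
obj<-structure(as.list(1:5),class="myList")
# hypothetical `[[` method that counts its own calls
calls<-new.env()
calls$n<-0
`[[.myList`<-function(x,i){
  calls$n<-calls$n+1
  unclass(x)[[i]]
}
# both versions are lists as far as is.list is concerned
is.list(obj)
is.list(unclass(obj))
# iterate with and without the class attribute
a<-lapply(obj,function(z) z^2)
n.with.class<-calls$n
b<-lapply(unclass(obj),function(z) z^2)
n.after.unclass<-calls$n-n.with.class
# same result either way
identical(a,b)
# number of times the class-specific method was used
c(with.class=n.with.class,after.unclass=n.after.unclass)
```

If the count for the classed object is non-zero on your system, then lapply is extracting elements through the class's own `[[` method, which is one place where extra cost could hide.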

Here's a demo of just how much of an improvement in speed results from this trick:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> # simulate data
> tree<-pbtree(n=100,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> tree<-sim.history(tree,Q)
> x<-tree$states
>
> # stochastic mapping
> mtrees<-make.simmap(tree,x,nsim=1000)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          a          b
a -0.8686634  0.8686634
b  0.8686634 -0.8686634
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  a  b
0.5 0.5
Done.
>
> # ok, now let's time describe.simmap for
> # various subsets of our mapped trees
> system.time(X100<-describe.simmap(mtrees[1:100], message=FALSE))
  user  system elapsed
  2.69    0.02    2.70
> system.time(X200<-describe.simmap(mtrees[1:200], message=FALSE))
  user  system elapsed
  12.71    0.02  12.75
> system.time(X400<-describe.simmap(mtrees[1:400], message=FALSE))
  user  system elapsed
  74.24    0.64  75.57

Whoa. I'm not even going to try the full set of 1,000 trees.

OK, now let's compare to describe.simmap with nothing more than the trick suggested by Klaus (i.e., unclass-ing the "multiPhylo" object for every use of lapply or sapply):

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.58’
> system.time(Y100<-describe.simmap(mtrees[1:100], message=FALSE))
  user  system elapsed
  0.45    0.00    0.45
> system.time(Y200<-describe.simmap(mtrees[1:200], message=FALSE))
  user  system elapsed
  0.83    0.00    0.82
> system.time(Y400<-describe.simmap(mtrees[1:400], message=FALSE))
  user  system elapsed
  1.78    0.00    1.78

Holy cow! What a huge improvement. We can even now run it on the full set of 1,000 mapped trees:

> par(cex=0.8) # make our tip labels a little smaller
> system.time(YY<-describe.simmap(mtrees,plot=TRUE))
1000 trees with a mapped discrete character with states:
a, b

trees have 29.633 changes between states on average

changes are of the following types:
        a,b    b,a
x->y 10.692 18.941

mean total time spent in each state is:
              a          b    total
raw  12.9314025 18.3673136 31.29872
prop  0.4131608  0.5868392  1.00000

  user  system elapsed
  4.65    0.06    4.71

Wow.

Bug fix & update to phylomorphospace

Yesterday a phytools user identified a bug in the node-coloring of phylomorphospace. I thought that this was likely introduced when I recently did a major rewrite of the function (described here) and this seems to be correct. I have fixed this, and also added a new feature to the function that (for trees with a mapped discrete character) will automatically color nodes using the color of the mapped discrete character. The code of the updated function is here; and I have also posted a new phytools build (phytools 0.2-59).

Here's a demo of the fixed node coloring:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.59’
> # first let's simulate a tree & data
> tree<-pbtree(n=30,scale=1)
> XX<-fastBM(tree,nsim=2)
> plotTree(tree,node.numbers=TRUE)
> # now let's say we want to plot nodes
> # descended from "49" red:
> cols<-rep("black",length(tree$tip.label)+tree$Nnode)
> names(cols)<-1:length(cols)
> cols[getDescendants(tree,49)]<-"red"
> # and everything from "40" blue:
> cols[getDescendants(tree,40)]<-"blue"
> # finally, these can even be nested
> cols[getDescendants(tree,44)]<-"yellow"
> # and plot
> phylomorphospace(tree,XX,xlab="X1",ylab="X2", control=list(col.node=cols))

Cool. (This is basically the demo that I gave in an earlier post.)

Now, when I was doing this I realized that it might be cool to be able to color the nodes - as well as the edges - according to a mapped discrete character. To do this, we need to be able to compute the colors of all internal nodes on the tree from their states. Here is the code that I used to do that:

zz<-c(getStates(tree,type ="tips"), getStates(tree))
names(zz)[1:length(tree$tip.label)]<-
  sapply(names(zz)[1:length(tree$tip.label)],
  function(x,y) which(y==x),y=tree$tip.label)
con$col.node<-setNames(colors[zz],names(zz))

The first line just computes the states at all tip & internal nodes using getStates; the second statement (lines two through four) is just a complicated way of translating tip labels in names(zz) into node numbers, which is what is needed by phylomorphospace; finally, the last line translates the node states to colors.
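
The same translation can be tried on toy data without a tree. In this sketch tip.label, the state vector zz, and the colors are all made up for illustration; match is used in place of the sapply/which construction, which should give the same node numbers so long as the tip labels are unique.

```r
# made-up tip labels; tips are numbered 1:N in this order
tip.label<-c("t3","t1","t2")
# made-up states: tips named by label, nodes named by number
zz<-c(t1="a",t2="b",t3="a","4"="a","5"="b")
# made-up color for each state
colors<-setNames(c("blue","red"),c("a","b"))
# translate tip labels in names(zz) to node numbers
N<-length(tip.label)
names(zz)[1:N]<-match(names(zz)[1:N],tip.label)
# translate states to colors, keeping the node numbers as names
col.node<-setNames(colors[zz],names(zz))
col.node
```

The result is a vector of colors named by node number, which is the form that the col.node element of control takes in the earlier demo.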

Here is a demo:

> # transition matrix
> Q<-matrix(c(-2,2,2,-2),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> # simulate stochastic history
> tree<-sim.history(tree,Q)
> phylomorphospace(tree,XX,xlab="X1",ylab="X2", colors=setNames(c("blue","red"),letters[1:2]), node.by.map=TRUE)

That's it.

New version of densityMap

The full title of this post should read New much faster version of densityMap that returns the plotted map invisibly combined with generic plotting method for special mapping object class, or something like that - but that seemed like a tongue twister. densityMap is a function to visualize the posterior sample of maps from a stochastic mapping analysis and is described in an article that I have in press at Methods in Ecology & Evolution.

I just posted a new version of densityMap (code here) that simultaneously addresses a couple of significant issues with this (otherwise pretty cool, in my opinion) function.

Firstly, prior versions of the function are way too slow. The function is still slow - just no longer way too slow. This speed-up was achieved entirely using the trick of removing the class attribute of our "multiPhylo" object which - for some reason that is not entirely clear to me - dramatically speeds up handling of the object both by apply family functions, and even in for loops.

Secondly, what makes densityMap especially annoying to work with - given that it is slow - is that to alter any of the plotting options, you need to re-compute the aggregate mappings. In the new version of densityMap (and new minor phytools build, phytools 0.2-60), densityMap plots the mapped tree as before, but also returns a special object of class "densityMap" invisibly. This can then be plotted using a call of the generic plot (to which I have added the phytools method plot.densityMap).

Here's a demo:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.60’
> # simulate tree & data
> tree<-pbtree(n=70,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-c(0,1)
> tree<-sim.history(tree,Q)
> x<-tree$states
> # generate stochastic maps
> mtrees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          0          1
0 -0.8857354  0.8857354
1  0.8857354 -0.8857354
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  0  1
0.5 0.5
Done.
> # generate density map
> # this would have been much slower before
> system.time(map<-densityMap(mtrees))
sorry - this might take a while; please be patient
  user  system elapsed
  7.85    0.12    7.97

Now, in earlier versions of densityMap if we wanted to adjust the way this plot looked (say - so that the labels don't overlap!) - we would have to recompute the aggregate mapping (which, remember, took us 8s here - but much longer in earlier versions or for bigger trees). However, here we have created the object map, which we can then pass to plot.densityMap (via the generic plot) with our revised plotting options:

> plot(map,lwd=5,fsize=c(0.7,1),legend=0.4)
for example.

Cool.


Plotting densityMap using grayscale

Travis Ingram commented that it would be nice to be able to plot density maps using the function densityMap in grayscale. He gave a line of code that could be modified internally to do this; however (as of today!) this is not necessary as we can instead modify our object of class "densityMap" (now returned invisibly by the function) and replot using plot.densityMap. This is what that would look like:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.60’
> # this is just to simulate some data
> tree<-pbtree(n=80,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-c(0,1)
> tree<-sim.history(tree,Q)
> x<-tree$states
> mtrees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
        0        1
0 -1.25418  1.25418
1  1.25418 -1.25418
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  0  1
0.5 0.5
Done.
> # we don't care about this plot
> maps<-densityMap(mtrees,res=500)
sorry - this might take a while; please be patient
> # now let's change our colormap using Travis's code
> maps$cols[]<-grey(seq(1,0,length.out=length(maps$cols)))
> plot(maps,fsize=c(0.6,1),outline=TRUE,lwd=5)

Cool.

Thanks to Travis for the great suggestion & code.
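
A small detail worth noting is the empty pair of brackets in maps$cols[]<-...: they overwrite the values of the color vector while keeping its names. I am assuming here that those names are what tie each color to a position on the map. A toy version, with a made-up named vector standing in for maps$cols:

```r
# made-up stand-in for the named color vector in a "densityMap" object
cols<-setNames(rainbow(5),0:4)
# overwrite the values only; names are kept because of the []
cols[]<-grey(seq(1,0,length.out=length(cols)))
cols
# without the [] the whole vector is replaced & the names are lost
cols2<-setNames(rainbow(5),0:4)
cols2<-grey(seq(1,0,length.out=length(cols2)))
names(cols2)
```

Because the sequence runs from 1 to 0, the first element comes out white and the last black; swapping the endpoints of seq would reverse the scale.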

paintSubTree doesn't actually "paint" the tree (i.e., with colors)

I've been emailing back & forth with a phytools user to try & explain what is done with paintSubTree, plotSimmap, and phylomorphospace with a mapped regime - but I thought some visuals would help & that some other readers (occasional or otherwise) of this blog might find the clarification useful, so I decided to write this quick post.

The first point is that paintSubTree does not paint the tree with colors. The term paint refers to regime painting in Butler & King (2004); i.e., the process of assigning different branches or subtrees to different a priori specified categories or regimes. The states that we choose to assign to these paintings have nothing at all to do with the colors used to visualize the regimes in (for instance) plotSimmap or phylomorphospace, which (if unassigned by the user) are drawn in sequence from palette().

This is what I mean:

> tree<-pbtree(n=30)
> plotTree(tree,node.numbers=TRUE)

Now let's say we want to paint the subtrees arising from nodes 57, 50, 33, and 42 with different regimes. It doesn't matter why we want to do this - maybe we just want to easily visualize the trees with different subtrees in different colors; nor does it matter that the subtree arising out of node 42 is nested within the subtree arising out of node 33. We can do:

> tree<-paintSubTree(tree,node=57,state="1",anc="0")
> tree<-paintSubTree(tree,node=50,state="2")
> tree<-paintSubTree(tree,node=33,state="3")
> tree<-paintSubTree(tree,node=42,state="4")

Now let's plot the tree with default colors:

> plotSimmap(tree,pts=FALSE)
no colors provided. using the following legend:
      0        1        2        3        4
"black"    "red" "green3"  "blue"  "cyan"

If we want to use different colors, we can just do:

> cols<-c("black","red","blue","green","purple")
> names(cols)<-0:4
> plotSimmap(tree,cols,pts=FALSE)

If our sole purpose in painting on regimes was to visualize the tree in this way, we could have used the desired colors as regimes. I.e.,

> tree<-paintSubTree(tree,node=31,state="black")
> tree<-paintSubTree(tree,node=57,state="red")
> tree<-paintSubTree(tree,node=50,state="blue")
> tree<-paintSubTree(tree,node=33,state="green")
> tree<-paintSubTree(tree,node=42,state="purple")
> cols<-colnames(tree$mapped.edge)
> names(cols)<-cols
> plotSimmap(tree,cols,pts=FALSE)

to exactly the same effect.

Visualizing different regimes in a phylomorphospace works according to the same principle. Here's a quick demo of that using the same tree:

> X<-fastBM(tree,nsim=2)
> phylomorphospace(tree,X,colors=cols,node.by.map=TRUE, xlab="x",ylab="y")

That's really all there is to it.

New version of contMap

I just posted a new version of contMap (described here and in my recent MEE article). contMap uses a color scale to map the observed and reconstructed values of a continuous character onto the branches of a phylogeny.

This update just brings contMap into alignment with densityMap in that it (invisibly) returns an object of class "contMap" so that adjusting the plotting parameters can be accomplished without recomputing all the ancestral states.

As I demonstrated on Travis Ingram's recommendation, this - in addition to speeding up re-plotting if we need to change the parameters of our plot - allows us to easily change the palette of colors used to map our trait on the tree. For instance, we can easily change to grayscale. Here I show how we can flip low to high in our color map.

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.61’
> tree<-pbtree(n=30)
> x<-fastBM(tree)
> maps<-contMap(tree,x)

Alright, now let's flip the colors:

> maps$cols[]<-maps$cols[length(maps$cols):1]
> plot(maps)

Pretty cool.

Code is here; new phytools build (0.2-61), here.

New version of rerootingMethod & estimating ancestral states with an ordered model

I just posted a new version of rerootingMethod. This function computes the empirical Bayes posterior probabilities (marginal ancestral state reconstructions) for a discrete character using the re-rooting method of Yang (e.g., Yang 2006). The main functional change in this version is that now the function can take a model index matrix (i.e., a matrix with the same dimensions as Q with integers to specify the different elements to be estimated), instead of just the string "ER" or "SYM". This should allow users to specify any arbitrary symmetric model - not just the full symmetric or equal-rates model.

Our index matrix contains an integer for each rate to be estimated - but values set to zero indicate that the corresponding transition rate should be zero. This caused a little bit of a problem for rerootingMethod because it had the code:

Q<-matrix(YY$rates[YY$index.matrix],ncol(XX),ncol(XX),
 dimnames=list(colnames(XX),colnames(XX)))

This code first creates a vector of rates by using the elements of the index matrix as indices into YY$rates; and then coerces that vector into a matrix with dimensions specified by our number of states for the character. The problem is that R silently drops any index that is zero, so index matrix elements that are zero will cause the length of the vector to be wrong, and it cannot be properly coerced into a matrix with the right dimensions (we're kind of lucky that this is the case - or we might have missed the error). To solve this, I just prepended a zero to the vector of rates & shifted all the indices up by one:

Q<-matrix(c(0,YY$rates)[YY$index.matrix+1],ncol(XX),
 ncol(XX),dimnames=list(colnames(XX),colnames(XX)))
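
The effect of zeros in the index matrix is easy to see with made-up numbers. In this sketch rates and index.matrix are hypothetical stand-ins for YY$rates and YY$index.matrix: indexing with 0 silently drops elements, whereas shifting every index by one and prepending a 0 to the rates returns a value for every cell.

```r
# hypothetical stand-ins: one estimated rate, 3-state ordered model
rates<-0.8
index.matrix<-matrix(c(0,1,0,
                       1,0,1,
                       0,1,0),3,3,byrow=TRUE)
# zeros are dropped when used as indices: only 4 values come back
length(rates[index.matrix])
# shift indices by one so that 0 points at a leading zero rate
Q<-matrix(c(0,rates)[index.matrix+1],3,3,
  dimnames=list(letters[1:3],letters[1:3]))
Q
```

Note that the diagonal is simply left at zero in this toy example.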

One of the most common symmetric models other than the equal rates model that we might be interested in is probably the ordered model. This is a model in which we, say, allow transitions A ⇄ B ⇄ C ⇄ D, but no other types of transitions are allowed. How can we fit this model? Here's a quick demo in which I simulate under the model & then fit it & estimate ancestral states:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.63’
> tree<-pbtree(n=100,scale=2)
> Q<-matrix(c(-1,1,0,0,1,-2,1,0,0,1,-2,1,0,0,1,-1),4,4)
> colnames(Q)<-rownames(Q)<-LETTERS[1:4]
> Q
   A  B  C  D
A -1  1  0  0
B  1 -2  1  0
C  0  1 -2  1
D  0  0  1 -1
> x<-sim.history(tree,Q)$states
> mm<-Q; diag(mm)<-0
> mm
  A B C D
A 0 1 0 0
B 1 0 1 0
C 0 1 0 1
D 0 0 1 0
> aa<-rerootingMethod(tree,x,model=mm)
> plot(tree,no.margin=TRUE,edge.width=2, show.tip.label=FALSE)
> nodelabels(pie=aa$marginal.anc,piecol=palette()[1:4], cex=0.5)
> tiplabels(pie=to.matrix(x,colnames(aa$marginal.anc)), piecol=palette()[1:4],cex=0.3)
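
In the demo the index matrix mm was obtained by zeroing the diagonal of the generating Q. With real data there is no Q to start from, so the matrix has to be built directly. One way to do that for k ordered states is sketched below; ordered.index is a hypothetical helper, not a phytools function, and it assigns the same index (1) to every allowed transition, just as mm does.

```r
# hypothetical helper: index matrix for an ordered model
# in which only transitions between adjacent states are allowed
ordered.index<-function(states){
  k<-length(states)
  m<-matrix(0,k,k,dimnames=list(states,states))
  # cells one step off the diagonal get the (single) rate index 1
  m[abs(row(m)-col(m))==1]<-1
  m
}
mm<-ordered.index(LETTERS[1:4])
mm
```

The states need to be supplied in their assumed order, since adjacency is defined by position in the vector.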

Code for the new version of rerootingMethod is here; and the latest phytools build can be downloaded here.

Function to merge mapped states

I just posted a new function (mergeMappedStates) to merge mapped states on a phylogeny with a discrete character map. This is pretty easy. We just do two passes through all the edges of the tree. In the first pass we rename any mapped state in the old states with the new merged state:

rr<-function(map,oo,nn){
  for(i in 1:length(map)) if(names(map)[i]%in%oo)
    names(map)[i]<-nn
  map
}
maps<-lapply(maps,rr,oo=old.states,nn=new.state)

In the second pass, we join any adjacent map elements in the same (new) merged state:

mm<-function(map){
  if(length(map)>1){
    new.map<-vector()
    j<-1
    new.map[j]<-map[1]
    names(new.map)[j]<-names(map)[1]
    for(i in 2:length(map)){
      if(names(map)[i]==names(map)[i-1]){
        new.map[j]<-map[i]+new.map[j]
        names(new.map)[j]<-names(map)[i]
      } else {
        j<-j+1
        new.map[j]<-map[i]
        names(new.map)[j]<-names(map)[i]
      }
    }
    map<-new.map
  }
  map
}
maps<-lapply(maps,mm)
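
For comparison, here is a compact sketch of the same two passes for the map of a single edge, written with rle and tapply rather than explicit loops. mergeMap is a hypothetical name and this is not the code used in mergeMappedStates; the toy map is made up.

```r
# hypothetical helper: rename old states, then join adjacent
# segments that end up in the same state
mergeMap<-function(map,old.states,new.state){
  # pass 1: rename
  names(map)[names(map)%in%old.states]<-new.state
  # pass 2: sum the lengths within each run of identical names
  runs<-rle(names(map))
  id<-rep(seq_along(runs$lengths),runs$lengths)
  setNames(as.vector(tapply(map,id,sum)),runs$values)
}
# made-up map for one edge: time spent in each state, in order
map<-c(a=0.1,c=0.2,b=0.3,a=0.4)
res<-mergeMap(map,c("a","c"),"ac")
res
```

Here the first two segments (a, then c) are joined into a single "ac" segment, while the final "ac" segment stays separate because a segment in state b lies between them.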

Here's a quick demo of how it works:

> tree<-pbtree(n=100,scale=1)
> Q
   a  b  c
a -2  1  1
b  1 -2  1
c  1  1 -2
> tree<-sim.history(tree,Q)
> plotSimmap(tree,lwd=3,ftype="off",pts=FALSE)
no colors provided. using the following legend:
      a        b        c
 "black"    "red" "green3"
> merged<-mergeMappedStates(tree,c("a","c"),"ac")
> plotSimmap(merged,lwd=3,ftype="off",pts=FALSE)
no colors provided. using the following legend:
    ac      b
"black"  "red"

Cool - seems to work.
