Today a phytools user emailed with the following question:
"I am still having difficulty running an analysis on the evolution of a continuous character with a partial data set (I have character states for about 50% of the taxa). I would like to display the results using contMap, but since fastAnc is built into the code, it won't map the character because the # of taxa in the data set doesn't match the # of taxa in the tree. I attempted to replace fastAnc with the modified version of anc.ML from one of your blog posts (link), but I couldn't get that to work either. I received the same error message as with fastAnc:
Error in ace(x, a, method = "pic") : length of phenotypic and of phylogenetic data do not match
Is there any way to use contMap with missing data? Any help would be greatly appreciated. Thank you for your time."
They are totally on the right track, but things get a little trickier than they might have anticipated because whereas fastAnc (when used in it's default mode) just returns a vector of estimated character values at nodes, anc.ML (when some taxa are present in the tree but missing from the input data) returns a list with the vector missing.x containing the MLE phenotypic trait values of taxa with missing data. It also happens to be the case that I never got around to adding this updated version of anc.ML to the phytools package.
Well, I have now made both updates so it is possible to (1) specify an inference method (method="fastAnc" or method="anc.ML"); and (2) phytools function anc.ML now permits missing data, in which case the states for the tip taxa in the tree missing from the data vector will be estimated using ML. The code for these updates are here: anc.ML, contMap, but since both use internal functions not in the phytools namespace it is probably wise to download the latest phytools build and install from source.
Here's a quick demo:
[1] ‘0.4.7’
> ## simulate tree & data
> tree<-pbtree(n=26,tip.label=LETTERS)
> x<-fastBM(tree)
> ## compute limits on x to use in both plots
> lims<-c(floor(min(x)*10)/10,ceiling(max(x)*10)/10)
> ## simulate missing tip values
> y<-sample(x,20)
> ## which taxa are missing
> setdiff(tree$tip.label,names(y))
[1] "H""I""O""T""W""Z"
> ## contMap from full dataset
> contMap(tree,x,lims=lims,legend=1.5)
> contMap(tree,y,lims=lims,method="anc.ML",legend=1.5)
OK - that's it.