Max Telford has a really nice perspective in today’s Science on the contribution that molecular data have made toward resolving the animal tree of life. It’s definitely worth a read. The perspective focuses on deep relationships among major groups of organisms (e.g. phyla) and discusses some of the surprises that arose from early molecular phylogenetic work (Field et al.’s landmark analysis of 18S sequences in particular).
I particularly appreciate two parts of Telford’s discussion:
- The mention (albeit very brief, given that this is a paper for a popular audience) of the contributions and real importance of probabilistic evolutionary models to the field. Occasionally, one bumps into an attitude that more data inevitably equals better inference in molecular phylogenetics, but this is only true if the model applied to that “more data” is a reasonable one.
- The discussion of the absolute explosion of taxonomic sampling in molecular datasets over the last two decades. In my mind, this raises a set of fundamental questions about the field as it stands today: What fraction of the tree of life do we ‘know’ (of animals or other clades), and how quickly are we learning? A few years ago, Brad Shaffer and I made an effort to scrape together an estimate within the vertebrates (chosen as a ‘best case’, since they are probably the most heavily studied major clade from a phylogenetic standpoint). I’ve also spent some time playing with potential ways to visualize this accumulation of phylogenetic data. For example:
This video is a small experiment in visualizing the accumulation of genetic data across the tree of life. Here, I’m using turtles as an example. The tree structure is drawn from the NCBI taxonomy and the dots at the tips are sequences (red indicating mitochondrial, white indicating nuclear). The figures moving around represent authors adding each sequence to the tree. It was made by extracting the metadata from all turtle sequences in GenBank (publication date, molecule type, author, etc.) and then using these data with a repurposed version of Gource (a popular OpenGL-based visualizer for software version control systems) and ffmpeg to produce the animation.
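For anyone curious about the conversion step, here is a rough sketch of how sequence metadata could be turned into Gource's pipe-delimited custom log format (`timestamp|username|type|file|colour`). This is not the actual script behind the video. The record layout (`date`, `author`, `genome`, `lineage`, `accession`) and the accession values are made-up placeholders, and fetching and parsing the GenBank records is left out entirely.

```python
from datetime import datetime, timezone

# Colours follow the scheme described above: red = mitochondrial, white = nuclear.
RED = "FF0000"
WHITE = "FFFFFF"


def to_gource_line(record):
    """Format one sequence record as a Gource custom-log line.

    `record` is a hypothetical dict with the keys date (YYYY-MM-DD), author,
    genome, lineage (list of taxon names, root first) and accession.
    """
    # Gource wants a Unix timestamp; treat the publication date as midnight UTC.
    day = datetime.strptime(record["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    timestamp = int(day.timestamp())

    # The pipe is the field separator, so strip it from free-text fields.
    author = record["author"].replace("|", " ")

    # The taxonomy plays the role of the directory tree, and each sequence
    # is a "file" added ("A") at a tip of that tree.
    path = "/".join(list(record["lineage"]) + [record["accession"]])

    colour = RED if record["genome"] == "mitochondrial" else WHITE
    return f"{timestamp}|{author}|A|{path}|{colour}"


def build_log(records):
    """Return log lines sorted by time, since Gource expects chronological order."""
    lines = [to_gource_line(r) for r in records]
    return sorted(lines, key=lambda line: int(line.split("|", 1)[0]))


if __name__ == "__main__":
    # Placeholder records for illustration only (not real GenBank entries).
    example = [
        {"date": "2005-06-01", "author": "Author B", "genome": "nuclear",
         "lineage": ["Testudines", "Pleurodira"], "accession": "ACC0002"},
        {"date": "2000-01-01", "author": "Author A", "genome": "mitochondrial",
         "lineage": ["Testudines", "Cryptodira"], "accession": "ACC0001"},
    ]
    print("\n".join(build_log(example)))
```

Saved to a text file, output like this can be read by Gource with its `--log-format custom` option, and the rendered frames can then be encoded to video with ffmpeg.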