Some applications of statistical phylogenetics

Abstract

The increasing availability of molecular data means that phylogenetic studies now often use datasets that combine a large number of loci for many different species. This leads to a trade-off. On the one hand, more complex models are preferred because they account for heterogeneity in evolutionary processes. On the other hand, simple models are favoured because they can answer the biological questions of interest, are easy to interpret and can be computed in reasonable time. This thesis focuses on four cases of phylogenetic analysis which arise from this conflict.

- It is shown that edge weight estimates can be non-identifiable if the data are simulated under a mixture model. Even if the underlying process is known, estimation and interpretation may be difficult due to the high variance of the parameters of interest.
- Partition models are commonly used to account for heterogeneity in data sets. Novel methods are presented here which allow genes under similar evolutionary constraints to be grouped. A data set containing 14 chloroplast genes from 19 anciently diverged species is used to find groups of co-evolving genes. The prospects and limitations of such methods are discussed.
- Penalised likelihood estimation is a useful tool for improving the performance of models and allowing for variable selection. A novel approach is presented that uses pairwise dissimilarities to visualise the data as a network. It is further shown how penalised likelihood can be used to decrease the variance of parameter estimates for mixture and partition models, allowing a more reliable analysis. Estimates for the variance and the expected number of parameters of penalised likelihood estimates are derived.
- Tree shape statistics are used to describe speciation events in macroevolution. A new tree shape statistic is introduced and the biases of different clustering methods on tree shape statistics are discussed.
