Preprints, Working Papers, ...

Phyloformer: towards fast and accurate phylogeny estimation with self-attention networks

Luca Nesterenko 1 Bastien Boussau 1, 2 Laurent Jacob 3, 1, 4, 5 
2 Le Cocon
PEGASE - Département PEGASE [LBBE]
4 ERABLE - Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale
LBBE - Laboratoire de Biométrie et Biologie Evolutive - UMR 5558, Inria Lyon
5 Baobab
PEGASE - Département PEGASE [LBBE]
Abstract : An important problem in molecular evolution is phylogenetic reconstruction: given a set of sequences descending from a common ancestor, reconstruct the binary tree describing their evolution from that ancestor. State-of-the-art methods for the task, namely maximum likelihood and Bayesian inference, have a high computational cost, which limits their usability on large datasets. Recently, researchers have begun investigating deep learning approaches to the problem, but so far these attempts have been limited to the reconstruction of quartet tree topologies, addressing phylogenetic reconstruction as a classification problem. We present here a radically different approach with a transformer-based network architecture that, given a multiple sequence alignment, predicts all the pairwise evolutionary distances between the sequences, which in turn allow us to accurately reconstruct the tree topology with standard distance-based algorithms. The architecture and its high degree of parameter sharing allow us to apply the same network to alignments of arbitrary size, both in the number of sequences and in their length. We evaluate our network, Phyloformer, on two types of simulations and find that its accuracy matches that of a maximum likelihood method on datasets that resemble training data, while being significantly faster.
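
To illustrate the last step of the pipeline described in the abstract (going from pairwise distances to a tree topology), here is a minimal sketch of neighbor joining, one standard distance-based algorithm. The abstract does not say which distance-based algorithm the authors use, so this choice is an assumption. The function name `neighbor_joining`, the `internal_*` labels, and the four-taxon distance matrix are hypothetical; the matrix is hand-made and merely stands in for distances that a network might predict.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances with neighbor joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other active nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        new = f"internal_{next_id}"  # hypothetical label for the new node
        next_id += 1
        # Branch lengths from the new internal node to the joined pair.
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        edges.append((new, a, len_a))
        edges.append((new, b, len_b))
        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Connect the last two remaining nodes.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hand-made additive distances for the tree ((A:1,B:2):1,(C:3,D:4)).
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 5, frozenset(("A", "D")): 6,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 7, frozenset(("C", "D")): 7,
}
edges = neighbor_joining(names, dist)
for node, neighbor, length in edges:
    print(node, neighbor, length)
```

Because the example distances are exactly additive, neighbor joining recovers the generating topology and branch lengths; with estimated distances, the recovered tree is only as good as the distance estimates.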

https://hal-cnrs.archives-ouvertes.fr/hal-03756990
Contributor : Laurent Jacob
Submitted on : Monday, August 22, 2022 - 2:05:43 PM
Last modification on : Saturday, September 24, 2022 - 2:36:04 PM

Citation

Luca Nesterenko, Bastien Boussau, Laurent Jacob. Phyloformer: towards fast and accurate phylogeny estimation with self-attention networks. 2022. ⟨hal-03756990⟩
