TY - JOUR

T1 - Gradients do grow on trees

T2 - A linear-time O(N)-dimensional gradient for statistical phylogenetics

AU - Ji, Xiang

AU - Zhang, Zhenyu

AU - Holbrook, Andrew

AU - Nishimura, Akihiko

AU - Baele, Guy

AU - Rambaut, Andrew

AU - Lemey, Philippe

AU - Suchard, Marc A.

N1 - Funding Information:
We thank Jeffrey Thorne for thoughtful comments. The research leading to these results has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 725422-ReservoirDOCS). The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z.M.A.S. and X.J. are partially supported by NSF Grant DMS 1264153 and NIH Grants R01 AI107034 and U19 AI135995. A.H. acknowledgeds support by NIHNIAID grant K25AI153816. G.B. acknowledges support from the Interne Fondsen KU Leuven/Internal Funds KU Leuven under grant agreement C14/18/094. P.L. acknowledges support by the Research Foundation-Flanders ("Fonds voor Wetenschappelijk Onderzoek-Vlaanderen," G066215N, G0D5117N, and G0B9317N).
Publisher Copyright:
© 2020 The Author(s).

PY - 2020/10/1

Y1 - 2020/10/1

N2 - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

AB - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

KW - Bayesian inference

KW - Linear-time gradient algorithm

KW - Maximum likelihood

KW - Random-effects molecular clock model

UR - http://www.scopus.com/inward/record.url?scp=85092444332&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85092444332&partnerID=8YFLogxK

U2 - 10.1093/molbev/msaa130

DO - 10.1093/molbev/msaa130

M3 - Article

C2 - 32458974

AN - SCOPUS:85092444332

SN - 0737-4038

VL - 37

SP - 3047

EP - 3060

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

IS - 10

ER -