TY - JOUR
T1 - Gradients do grow on trees
T2 - A linear-time O(N)-dimensional gradient for statistical phylogenetics
AU - Ji, Xiang
AU - Zhang, Zhenyu
AU - Holbrook, Andrew
AU - Nishimura, Akihiko
AU - Baele, Guy
AU - Rambaut, Andrew
AU - Lemey, Philippe
AU - Suchard, Marc A.
N1 - Funding Information:
We thank Jeffrey Thorne for thoughtful comments. The research leading to these results has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 725422-ReservoirDOCS). The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z.M.A.S. and X.J. are partially supported by NSF Grant DMS 1264153 and NIH Grants R01 AI107034 and U19 AI135995. A.H. acknowledgeds support by NIHNIAID grant K25AI153816. G.B. acknowledges support from the Interne Fondsen KU Leuven/Internal Funds KU Leuven under grant agreement C14/18/094. P.L. acknowledges support by the Research Foundation-Flanders ("Fonds voor Wetenschappelijk Onderzoek-Vlaanderen," G066215N, G0D5117N, and G0B9317N).
Publisher Copyright:
© 2020 The Author(s).
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
AB - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
KW - Bayesian inference
KW - Linear-time gradient algorithm
KW - Maximum likelihood
KW - Random-effects molecular clock model
UR - http://www.scopus.com/inward/record.url?scp=85092444332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092444332&partnerID=8YFLogxK
U2 - 10.1093/molbev/msaa130
DO - 10.1093/molbev/msaa130
M3 - Article
C2 - 32458974
AN - SCOPUS:85092444332
SN - 0737-4038
VL - 37
SP - 3047
EP - 3060
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 10
ER -