TY - CONF
T1 - Analyzing and exploiting NARX recurrent neural networks for long-term dependencies
AU - DiPietro, Robert
AU - Rupprecht, Christian
AU - Navab, Nassir
AU - Hager, Gregory D.
N1 - Funding Information:
This work was supported by the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement 291763, and by the National Institutes of Health, grant R01-DE025265.
Publisher Copyright:
© 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.
AB - Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.
UR - http://www.scopus.com/inward/record.url?scp=85083950460&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083950460&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85083950460
T2 - 6th International Conference on Learning Representations, ICLR 2018
Y2 - 30 April 2018 through 3 May 2018
ER -