Efficient gradient computation for optimization of hyperparameters

Jingyan Xu, Frederic Noo

Research output: Contribution to journal › Article › peer-review

Abstract

We are interested in learning the hyperparameters of a convex objective function in a supervised setting. The complex relationship between the input data to the convex problem and the desirable hyperparameters can be modeled by a neural network; the hyperparameters and the data then drive the convex minimization problem, whose solution is compared to the training labels. In our previous work (Xu and Noo 2021 Phys. Med. Biol. 66 19NT01), we evaluated a prototype of this learning strategy in an optimization-based sinogram smoothing plus FBP reconstruction framework. A question arising in this setting is how to efficiently compute (backpropagate) the gradient from the solution of the optimization problem to the hyperparameters so as to enable end-to-end training. In this work, we first develop general formulas for gradient backpropagation through a subset of convex problems, namely the proximal mapping. To illustrate the value of the general formulas and to demonstrate their use, we consider the specific instance of 1D quadratic smoothing (denoising), whose solution admits a dynamic programming (DP) algorithm. The general formulas lead to another DP algorithm for exact computation of the gradient with respect to the hyperparameters. Our numerical studies demonstrate a 55%-65% savings in computation time when a custom gradient is provided instead of relying on automatic differentiation in deep learning libraries. While our discussion focuses on 1D quadratic smoothing, our initial results (not presented) indicate that the general formulas and the computational strategy apply equally well to TV or Huber smoothing problems on simple graphs whose solutions can be computed exactly via DP.
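For illustration only, the following minimal Python/NumPy sketch shows the kind of computation the abstract describes, under the assumption that the proximal mapping is 1D quadratic smoothing with a single regularization weight lam: the smoothed signal solves (I + lam*D^T D) x = y, and implicit differentiation of this optimality condition yields the gradient of a training loss with respect to lam. The function names (quadratic_smoothing, backprop_to_hyperparameter) and the dense linear solve are illustrative stand-ins, not the paper's O(n) DP algorithm.

import numpy as np

def quadratic_smoothing(y, lam):
    # Solve min_x 0.5*||x - y||^2 + 0.5*lam*||D x||^2, with D the first-difference
    # operator. Closed form: x = (I + lam*D^T D)^{-1} y. A dense solve is used here
    # for clarity; the tridiagonal structure admits an O(n) forward-backward recursion.
    n = y.size
    D = np.diff(np.eye(n), axis=0)      # (n-1) x n first-difference matrix
    A = np.eye(n) + lam * (D.T @ D)     # symmetric tridiagonal system matrix
    x = np.linalg.solve(A, y)
    return x, A, D

def backprop_to_hyperparameter(g, x, A, D):
    # Given the upstream gradient g = dL/dx of a training loss L, return dL/dlam by
    # implicit differentiation of (I + lam*D^T D) x = y:
    #   dx/dlam = -A^{-1} (D^T D) x,   dL/dlam = g^T dx/dlam.
    u = np.linalg.solve(A, g)           # A is symmetric, so A^{-T} = A^{-1}
    return -u @ (D.T @ (D @ x))

# Toy check against central finite differences (illustrative training loss).
rng = np.random.default_rng(0)
y = rng.standard_normal(64)
target = np.sin(np.linspace(0, 3, 64))
lam = 0.7

x, A, D = quadratic_smoothing(y, lam)
g = x - target                          # gradient of L(x) = 0.5*||x - target||^2
grad_lam = backprop_to_hyperparameter(g, x, A, D)

eps = 1e-6
x_p, _, _ = quadratic_smoothing(y, lam + eps)
x_m, _, _ = quadratic_smoothing(y, lam - eps)
fd = (0.5 * np.sum((x_p - target) ** 2) - 0.5 * np.sum((x_m - target) ** 2)) / (2 * eps)
print(grad_lam, fd)                     # the two values should agree closely

In an end-to-end setting, lam would itself be the output of a neural network driven by the input data, and dL/dlam would be backpropagated further into the network weights; the sketch only covers the step from the optimization solution back to the hyperparameter.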

Original language: English (US)
Article number: 03NT01
Journal: Physics in medicine and biology
Volume: 67
Issue number: 3
DOIs
State: Published - Feb 7 2022

Keywords

  • automatic differentiation
  • dynamic programming
  • gradient backpropagation
  • hyperparameter learning
  • implicit differentiation
  • proximal mapping

ASJC Scopus subject areas

  • Radiological and Ultrasound Technology
  • Radiology, Nuclear Medicine and Imaging
