TY - JOUR
T1 - Neural Networks for Geospatial Data
AU - Zhan, Wentao
AU - Datta, Abhirup
N1 - Publisher Copyright: © 2024 American Statistical Association.
PY - 2024
Y1 - 2024
N2 - Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a Gaussian process covariance model, encoding the spatial dependence. While nonlinear machine learning algorithms like neural networks are increasingly being used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance. We propose NN-GLS, embedding neural networks directly within the traditional Gaussian process (GP) geostatistical model to accommodate nonlinear mean functions while retaining all other advantages of GPs, such as explicit modeling of the spatial covariance and prediction at new locations via kriging. In NN-GLS, estimation of the neural network parameters for the nonlinear mean of the Gaussian process explicitly accounts for the spatial covariance through the use of the generalized least squares (GLS) loss, thus extending the linear case. We show that NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates the use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes. We provide methodology to obtain uncertainty bounds for estimation and predictions from NN-GLS. Theoretically, we show that NN-GLS is consistent for irregularly observed, spatially correlated data processes. We also provide a finite-sample concentration rate, which quantifies the need to accurately model the spatial covariance in neural networks for dependent data. To our knowledge, these are the first large-sample results for any neural network algorithm for irregular spatial data. We demonstrate the methodology through numerous simulations and an application to air pollution modeling. We develop a software implementation of NN-GLS in the Python package geospaNN.
Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
KW - Consistency
KW - Gaussian process
KW - Geostatistics
KW - Graph neural networks
KW - Kriging
KW - Machine learning
KW - Neural networks
UR - http://www.scopus.com/inward/record.url?scp=85196753623&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196753623&partnerID=8YFLogxK
U2 - 10.1080/01621459.2024.2356293
DO - 10.1080/01621459.2024.2356293
M3 - Article
AN - SCOPUS:85196753623
SN - 0162-1459
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
ER -