Scalable Local Regression for Spatial Analytics
Dr Christian Kaiser, National University of Ireland Maynooth
Mr Fergal Walsh
Future infrastructures and the emerging Internet of Things will make sensing technologies ubiquitous across many modalities of social and natural systems. One can easily imagine a scenario in which a geo-referenced attribute of the environment is streamed back from the handheld devices of potentially millions of users. Scalable spatial statistics models are needed to deal with such data streams.
Geographically weighted regression (GWR) is a popular approach to modelling spatially varying relationships. It is a form of local linear regression in which locality is assessed by the closeness of the regression point to the calibration data samples in geographical space. GWR provides local parameter estimates for each location using a diagonal spatial weights matrix computed with a geographical distance-decay kernel. With a well-developed statistical inference framework available for a variety of data models, and with its exploratory power for analysing spatial relationships, the technique has gained considerable popularity in quantitative geographical data analysis.
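As an illustration of the local estimate described above, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian distance-decay kernel, a single predictor plus an intercept, and planar coordinates; the names `gaussian_kernel` and `gwr_local_fit` are hypothetical. Because the weights matrix is diagonal, the weighted least squares problem at one regression point reduces to five weighted sums and a 2x2 system.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Distance-decay weight: 1 at distance 0, decreasing smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_local_fit(point, samples, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x around `point`.

    samples: iterable of (u, v, x, y), where (u, v) are planar coordinates.
    Returns (b0, b1), the local intercept and slope at `point`.
    """
    s_w = s_x = s_y = s_xx = s_xy = 0.0
    for u, v, x, y in samples:
        # Each sample contributes one diagonal entry of the weights matrix.
        d = math.hypot(u - point[0], v - point[1])
        w = gaussian_kernel(d, bandwidth)
        s_w += w
        s_x += w * x
        s_y += w * y
        s_xx += w * x * x
        s_xy += w * x * y
    # Solve the 2x2 weighted normal equations in closed form.
    det = s_w * s_xx - s_x * s_x
    if abs(det) < 1e-12:
        raise ValueError("local design is singular; try a wider bandwidth")
    b1 = (s_w * s_xy - s_x * s_y) / det
    b0 = (s_y - b1 * s_x) / s_w
    return b0, b1
```

Calling `gwr_local_fit` once per regression point yields a surface of local coefficients; the bandwidth controls how quickly the influence of distant samples fades.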
Using GWR to process high-volume data streams requires three steps: (1) the temporal dimension has to be incorporated, (2) processing time and memory for every incoming sample have to remain fixed, irrespective of the order of samples in the stream, and (3) the implementation has to scale well with the size of the dataset. The first two problems are approached by introducing a time-decay weighting, with the possibility of discarding old or irrelevant samples from the stream. The third problem arises when time and memory usage become intractable for a single machine. The map-reduce paradigm provides a convenient framework for a robust implementation of distributed computation, and there are several ways to implement GWR with map-reduce. A first possibility is to perform the local regression estimate at each regression point in parallel. For bigger sample sizes, it is possible to split the samples into subsets in a map step, perform distributed parameter estimation, and combine the results in a reduce step.
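The split-and-combine idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the paper: it assumes a Gaussian spatial kernel multiplied by an exponential time decay with a hypothetical `half_life` parameter, a hard `max_age` cut-off for discarding old samples, and a single predictor. All function names are hypothetical. The sketch relies on the fact that the weighted normal equations are sums over samples, so each subset can be reduced to a few partial sums in the map step and the partial sums simply added in the reduce step.

```python
import math
from functools import reduce

def space_time_weight(distance, age, bandwidth, half_life):
    """Spatial Gaussian decay multiplied by an exponential time decay."""
    spatial = math.exp(-0.5 * (distance / bandwidth) ** 2)
    temporal = 0.5 ** (age / half_life)  # weight halves every `half_life`
    return spatial * temporal

def map_chunk(chunk, point, now, bandwidth, half_life, max_age):
    """Map step: reduce one subset of (u, v, t, x, y) samples to five sums."""
    s_w = s_x = s_y = s_xx = s_xy = 0.0
    for u, v, t, x, y in chunk:
        age = now - t
        if age < 0 or age > max_age:
            continue  # ignore samples newer than `now` and discard old ones
        d = math.hypot(u - point[0], v - point[1])
        w = space_time_weight(d, age, bandwidth, half_life)
        s_w += w
        s_x += w * x
        s_y += w * y
        s_xx += w * x * x
        s_xy += w * x * y
    return (s_w, s_x, s_y, s_xx, s_xy)

def combine(a, b):
    """Reduce step: partial sums from two subsets add element-wise."""
    return tuple(p + q for p, q in zip(a, b))

def solve(sums):
    """Solve the 2x2 weighted normal equations for (intercept, slope)."""
    s_w, s_x, s_y, s_xx, s_xy = sums
    det = s_w * s_xx - s_x * s_x
    if abs(det) < 1e-12:
        raise ValueError("local design is singular; try a wider bandwidth")
    b1 = (s_w * s_xy - s_x * s_y) / det
    b0 = (s_y - b1 * s_x) / s_w
    return b0, b1

def local_fit_mapreduce(chunks, point, now, bandwidth, half_life, max_age):
    """Run the map step on every subset, then fold the partial sums together."""
    partials = (map_chunk(c, point, now, bandwidth, half_life, max_age)
                for c in chunks)
    return solve(reduce(combine, partials, (0.0,) * 5))
```

Here the map calls run sequentially in one process; in a distributed setting each call to `map_chunk` would run on a separate worker, and only the five partial sums per regression point would need to be exchanged. The other strategy mentioned above, one independent fit per regression point, parallelises over `point` instead of over subsets.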
This paper explains the methodology and the implementation details of GWR in a map-reduce framework, with a focus on streaming data. The possibilities and limitations of the method are explored in a realistic application involving a house price dataset of 85,000 samples from central Europe, covering a period of several months at a time step of one day. The implementation is made available as open-source software.
Christian is a postdoctoral research fellow in geocomputation and co-developer of the open-source project i2maps. He is an expert in spatial analysis algorithms and dynamic visualisation.