Distributed Processing of Large Satellite Images Using MapReduce

Session Type: 
Mr Ermias Tesfamariam, Institute for Geoinformatics

Advances in sensor technology and the ever-increasing repositories of collected data are revolutionizing the way remotely sensed data are collected, stored and processed. This exponential growth of data archives, together with increasing user demand for real-time and near-real-time remote sensing data products, has put pressure on remote sensing service providers to deliver the required services. The remote sensing community has recognized the challenge of processing large and complex satellite datasets to derive customized products. To address this high demand for computational resources, several efforts have been made in the past few years to incorporate high-performance computing models into remote sensing data collection, management and analysis. This study adds impetus to these efforts by introducing a recent advancement in distributed computing technologies, the MapReduce programming paradigm, to the area of remote sensing.

The MapReduce model, developed by Google Inc., encapsulates the effort of distributed computing in a highly simplified single library. This simple but powerful programming model provides a distributed environment without requiring deep knowledge of parallel programming. This research presents MapReduce-based processing of large satellite images, using edge detection methods as a use case scenario. Derived from conceptual massive remote sensing image processing applications, a prototype of edge detection methods was implemented on the MapReduce framework using its open-source implementation, the Apache Hadoop environment. The experiences gained from implementing the Sobel, Laplacian, and Canny edge detection methods in the MapReduce model are presented. This thesis also presents an evaluation of the effect of parallelization using MapReduce on the quality of the output, along with execution time performance tests conducted using various performance metrics. The MapReduce algorithms were executed in a test environment on a heterogeneous cluster running the Apache Hadoop open-source software. The successful implementation of the MapReduce algorithms in a distributed environment demonstrates that MapReduce has great potential for scaling the processing of large remotely sensed images and for tackling more complex geospatial problems.
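
To make the map and reduce roles concrete, the following is a minimal single-process sketch of how Sobel edge detection could be phrased in MapReduce terms. It is not the Hadoop prototype described above: the function names (split_rows, map_strip, reduce_strips, run_job), the choice of horizontal strips with a one-row overlap, and the |gx| + |gy| magnitude approximation are all assumptions made for illustration. The overlap is what lets each strip be processed independently without changing the output at strip boundaries.

```python
from functools import reduce

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(img):
    """Return |gx| + |gy| for interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = img[y + j - 1][x + i - 1]
                    gx += KX[j][i] * pixel
                    gy += KY[j][i] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out


def split_rows(img, strip_height, halo=1):
    """Cut the image into row strips, each padded with `halo` neighbour rows.

    Yields (start, stop, lo, rows): the strip owns rows start..stop-1 of the
    full image, and `rows` holds rows lo..hi-1 including the overlap.
    """
    h = len(img)
    for start in range(0, h, strip_height):
        stop = min(start + strip_height, h)
        lo, hi = max(start - halo, 0), min(stop + halo, h)
        yield (start, stop, lo, img[lo:hi])


def map_strip(record):
    """Map step: edge-detect one strip, then drop the overlap rows."""
    start, stop, lo, rows = record
    edges = sobel(rows)
    return (start, edges[start - lo:stop - lo])


def reduce_strips(mapped):
    """Reduce step: order strips by their first row and stitch them together."""
    ordered = sorted(mapped, key=lambda pair: pair[0])
    return reduce(lambda acc, pair: acc + pair[1], ordered, [])


def run_job(img, strip_height):
    """Run split, map and reduce in sequence within one process."""
    return reduce_strips(map(map_strip, split_rows(img, strip_height)))


if __name__ == "__main__":
    # Synthetic 7x6 image with a vertical step edge between columns 2 and 3.
    image = [[0, 0, 0, 9, 9, 9] for _ in range(7)]
    print(run_job(image, strip_height=2) == sobel(image))
```

In this sketch the strip-wise result is identical to processing the whole image at once, which is the property an output-quality evaluation of a parallelized edge detector would check. On a real cluster the map calls would run on different nodes; here they run sequentially through the built-in map.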

Speaker Bio: 

MSc student in the Erasmus Mundus Master program in Geospatial Technologies, offered by a joint consortium of the University of Muenster, Universitat Jaume I, and Universidade Nova de Lisboa.