Josep Sampé
Universitat Rovira i Virgili
Pedro Garcia Lopez
Universitat Rovira i Virgili

Serverless Big Data Analytics with Lithops

Presentation: PDF

This demo introduces Lithops, a multi-cloud computing framework for big data analytics and embarrassingly parallel jobs. Lithops is part of the Project, and it allows developers to execute regular Python code at massive scale in the cloud using serverless computing resources. Because serverless platforms need no pre-provisioned infrastructure, Lithops makes it easy to spawn hundreds or thousands of cloud functions and have a large job running within a few seconds of launch, without waiting for a cluster to be up and running in the cloud. In this demonstration of Lithops, we visualize the transparent process of computation offloading, from the development of an application on a local machine to the actual parallel execution of functions in the cloud. While existing frameworks have high setup costs, Lithops emphasizes ease of use and seamless integration with serverless platforms from different cloud providers.
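The offloading model described above can be sketched in a few lines of Python. This is a minimal illustration rather than the demo's actual code: `count_words` and `parallel_map` are hypothetical helper names, and the `"lithops"` branch assumes that Lithops is installed and configured for some backend. The default branch runs the same function on local threads so the sketch can be tried without any cloud account.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Regular Python function; each call is one independent task."""
    return len(text.split())


def parallel_map(func, items, backend="local"):
    """Apply func to every item in parallel and return results in order.

    backend="lithops" offloads each call to a serverless function
    (assumption: Lithops is installed and configured).
    backend="local" is a stand-in that uses threads on this machine.
    """
    if backend == "lithops":
        import lithops  # imported lazily so the local path has no dependency

        fexec = lithops.FunctionExecutor()
        fexec.map(func, items)      # one function invocation per item
        return fexec.get_result()   # blocks until all invocations finish
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    docs = ["serverless big data analytics", "with lithops", ""]
    print(parallel_map(count_words, docs))
```

The point of the sketch is that the function body stays ordinary Python; only the executor that runs it changes between the local and the serverless path.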

We focus on showing how Lithops can run big data analytics jobs on large volumes of data stored in a cloud object store, using its built-in data discovery and partitioning module. This module analyzes the datasets stored in an object store and, through an on-the-fly data partitioner, launches the appropriate number of functions. In particular, we will make use of the Sentinel-2 geospatial dataset. The Sentinel-2 mission is a land-monitoring constellation of two satellites that provide high-resolution optical imagery of the Earth's land surface every 5 days. Finally, we will demonstrate how Lithops can support the chunking of different data formats, describing its advantages and constraints with live examples.
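To make the idea of on-the-fly partitioning concrete, the following sketch splits a set of objects into fixed-size byte ranges, one range per function invocation. It is an illustrative assumption, not the Lithops partitioner itself: `partition_objects`, the object keys, and the sizes are all hypothetical, and in practice the sizes would come from listing the bucket.

```python
from typing import Dict, List, Tuple


def partition_objects(
    sizes: Dict[str, int], chunk_size: int
) -> List[Tuple[str, int, int]]:
    """Split each object into inclusive byte ranges of at most chunk_size.

    sizes maps an object key to its size in bytes (hypothetical input).
    Returns (key, first_byte, last_byte) tuples, one per function to launch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    partitions = []
    for key, size in sizes.items():
        for start in range(0, size, chunk_size):
            # The last chunk of an object may be shorter than chunk_size.
            end = min(start + chunk_size, size) - 1
            partitions.append((key, start, end))
    return partitions


if __name__ == "__main__":
    listing = {"tiles/a.csv": 10, "tiles/b.csv": 4}  # made-up keys and sizes
    for part in partition_objects(listing, chunk_size=4):
        print(part)
```

The number of partitions returned is the number of functions to launch. A plain byte split like this is only a starting point: for record-oriented formats each worker would also need to realign its range to record boundaries, a step omitted here, which is one reason chunking support differs between data formats.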