Event-driven task automation of Java and Spark jobs with Mesos

Quickie

(Big) Data

Room 5

Thursday from 12:20 to 12:40

We live in a digital era in which mobile communications play a major role in our daily lives. At Teralytics, analyzing billions of telecom events from around the globe allows us to derive unprecedented insights into human mobility within and between cities. We use Mesos and Spark to acquire, standardize, process, and analyze this data reliably and efficiently.

Because we receive data from multiple sources as batch files on an irregular schedule, we needed a fault-tolerant system that processes data as soon as it arrives. We therefore designed a custom Mesos framework that performs event-driven tasks in a distributed environment. The framework uses Mesos to schedule Java and Spark tasks with variable input on arbitrary nodes. This automated system replaces jobs that were previously run by hand and saves Teralytics both time and cluster resources, since it can be configured to schedule jobs only at night, when data scientists do not need the cluster.
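To make the scheduling idea more concrete, here is a minimal, framework-agnostic sketch of the event-driven pattern described above: each arriving batch file immediately becomes a pending task, tasks are only launched inside a night-time window, and failed tasks are retried a bounded number of times. It does not use the Mesos API. The `launch` callable is a hypothetical stand-in for handing a task to a Mesos scheduler, and all names (`BatchArrived`, `EventDrivenScheduler`, `in_night_window`), the 22:00 to 06:00 window, and the retry limit are illustrative assumptions, not Teralytics' actual framework.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Deque, List


@dataclass
class BatchArrived:
    """Hypothetical event: a batch file from one data source has landed."""
    source: str
    path: str
    kind: str  # assumed job type, e.g. "java" or "spark"


@dataclass
class Task:
    kind: str
    input_path: str
    attempts: int = 0


def in_night_window(now: datetime, start: time = time(22, 0), end: time = time(6, 0)) -> bool:
    """Return True if `now` lies in [start, end), handling windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


class EventDrivenScheduler:
    def __init__(self, launch: Callable[[Task], bool], max_attempts: int = 3):
        # `launch` is a placeholder for submitting work to the cluster; it returns success.
        self.launch = launch
        self.max_attempts = max_attempts
        self.pending: Deque[Task] = deque()
        self.done: List[Task] = []
        self.failed: List[Task] = []

    def on_event(self, event: BatchArrived) -> None:
        # Every arriving batch is queued at once; there is no fixed cron schedule.
        self.pending.append(Task(event.kind, event.path))

    def tick(self, now: datetime) -> None:
        # Outside the night window the cluster belongs to the data scientists.
        if not in_night_window(now):
            return
        # Process each task that was pending at the start of this tick exactly once.
        for _ in range(len(self.pending)):
            task = self.pending.popleft()
            task.attempts += 1
            if self.launch(task):
                self.done.append(task)
            elif task.attempts < self.max_attempts:
                self.pending.append(task)  # retry on a later tick
            else:
                self.failed.append(task)  # give up after max_attempts


if __name__ == "__main__":
    def fake_launch(task: Task) -> bool:
        # Simulated cluster: any input named "bad.csv" always fails.
        return not task.input_path.endswith("bad.csv")

    sched = EventDrivenScheduler(fake_launch, max_attempts=2)
    sched.on_event(BatchArrived("operator-a", "/in/a.csv", "spark"))
    sched.on_event(BatchArrived("operator-b", "/in/bad.csv", "java"))
    sched.tick(datetime(2017, 5, 4, 14, 0))  # daytime: nothing is launched
    sched.tick(datetime(2017, 5, 4, 23, 0))  # a.csv succeeds, bad.csv is requeued
    sched.tick(datetime(2017, 5, 4, 23, 5))  # bad.csv fails a second time
    print(len(sched.done), len(sched.failed))
```

Keeping the time window and the retry policy in the scheduler, rather than in each job, means Java and Spark jobs stay unaware of when and how often they are run; in a real Mesos framework the `launch` step would instead wait for a suitable resource offer.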

Florian Froese

Florian Froese is a Software Engineer in the Platform team at Teralytics. He joined Teralytics in 2013 after completing his Master's degree in Computer Science at ETH Zurich. He has developed several core products and APIs for Teralytics using Mesos, Scala, and Spark.