Beam and Flink on Kubernetes, BIP-1, and Relational-Beam

April 8th, 2020

Date: April 8th, 2020
Time: 6:00pm - 8:30pm
Location: Lyft HQ @ 185 Berry Street, San Francisco
Food and drinks sponsored by Lyft and Google

Additional Information

Talk #1: Managing Flink and Beam applications on Kubernetes

At Lyft, we use Apache Flink and Apache Beam to power a variety of real-time streaming use cases, including deriving pickup ETA accuracy, dynamic pricing, and generating features for fraud-detection machine learning models.

In the past, we relied on bare EC2 instances to provision and run Flink clusters. To achieve greater reliability, we embarked on a year-long journey to rebuild our streaming platform on top of Kubernetes. This talk will cover the motivation behind re-architecting our platform, how we built an open-source Kubernetes operator to manage Flink and Beam applications, and how we designed the user experience for the newly built platform. The talk will also highlight the challenges of running stateful applications on Kubernetes and the lessons we learnt along the way.

Speaker bio: Lakshmi Gururaja Rao is a software engineer on the streaming platform team at Lyft. The team builds and supports the core infrastructure that lets product teams across Lyft easily and reliably spin up Flink and Beam pipelines to perform aggregations on real-time data. Most recently, she worked on re-architecting the platform into a Kubernetes-based deployment and building the Flink Kubernetes operator.

Talk #2: Relational Processing in Apache Beam: Past, Present, Future

When it started, Apache Beam had no concept of relational (SQL-style) processing. Over time, elements of relational processing have been added to the Java SDK: first SQL support, then schemas, then relational transforms. Together they now form a powerful ecosystem of relational processing tools available to Beam Java users. There is now interest in taking this concept further by making it part of the Beam model and bringing it to more SDKs. In this talk I will review how relational processing has developed in Apache Beam, show you what users can do with these features today, and discuss the exciting future made possible by integrating relational processing into the Beam model.

Speaker bio: Brian Hulette is a software engineer at Google and an Apache Beam committer focusing on making Beam schemas portable. Before joining Google, he worked on a wide array of projects, ranging from distributed software-defined radio systems to high-performance data visualization tools built with Apache Arrow in JavaScript. He occasionally writes short things on Twitter @BrianHulette and longer things on http://theneuralbit.com.

Talk #3: Beam Schema Options

To provide high customer satisfaction, Workday uses its operational data (both structured and unstructured) to optimize its services. At Workday, a centralized big data platform serves descriptive, diagnostic, and predictive analytics for all personas. This talk will introduce the challenges the platform needs to address. We will then present the tools used for various analysis tasks. Finally, we will present an in-house data pipeline platform used for automation.

Speaker bio: Alex Van Boxel is an Apache Beam committer. In his day job, he is a Big Data and Cloud Architect at Veepee.com.