Strange Loop 2014: Turning the Database Inside Out with Apache Samza

Peabody Opera House

September 18, 2014: 2:00 PM – 2:40 PM


From the promotional materials:

Turning the Database Inside Out with Apache Samza

Databases are global, shared, mutable state. That’s the way it has been since the 1960s, and no amount of NoSQL has changed that. However, most self-respecting developers have got rid of mutable global variables in their code long ago. So why do we tolerate databases as they are?

A more promising model, used in some systems, is to think of a database as an always-growing collection of immutable facts. You can query it at some point in time — but that’s still old, imperative style thinking. A more fruitful approach is to take the streams of facts as they come in, and functionally process them in real-time.

This talk introduces Apache Samza, a distributed stream processing framework developed at LinkedIn. At first it looks like yet another tool for computing real-time analytics, but it’s more than that. Really it’s a surreptitious attempt to take the database architecture we know, and turn it inside out.

At its core is a distributed, durable commit log, implemented by Apache Kafka. Layered on top are simple but powerful tools for joining streams and managing large amounts of data reliably.

What do we have to gain from turning the database inside out? Simpler code, better scalability, better robustness, lower latency, and more flexibility for doing interesting things with data. After this talk, you’ll see the architecture of your own applications in a completely new light.

Martin Kleppmann is a committer on Apache Samza (a distributed stream processing framework), software engineer at LinkedIn, and author at O’Reilly (currently writing a book on designing data-intensive applications). He invented the infamous “LinkedIn Intro” email proxy. Previously he co-founded and sold two startups, Rapportive and Go Test It. He is based in Cambridge, UK.

My personal notes:

If you look at my original Strange Loop 2014 post on session planning for the event, this session had no competitors in its time slot. In retrospect, although the session focused on databases and how applications interface with them, the discussion covered both database architecture and application architecture, which touches on much of what I do as a consultant on client projects.
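To make the abstract's central idea concrete for myself, here is a minimal sketch of a database treated as an always-growing collection of immutable facts, with the current state derived by processing those facts in order. This is not Samza or Kafka code: the `Fact` type, its fields, and the `apply_fact` and `materialize` helpers are hypothetical names I made up for illustration, and the "log" is just an in-memory list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Fact:
    """An immutable event. The field names are hypothetical."""
    user: str
    delta: int


def apply_fact(view: Dict[str, int], fact: Fact) -> Dict[str, int]:
    """Return a new view with one fact folded in; the old view is not modified."""
    updated = dict(view)
    updated[fact.user] = updated.get(fact.user, 0) + fact.delta
    return updated


def materialize(log: Iterable[Fact]) -> Dict[str, int]:
    """Derive the current state by replaying the log from the beginning."""
    view: Dict[str, int] = {}
    for fact in log:
        view = apply_fact(view, fact)
    return view


# The log only ever grows; facts are appended, never updated in place.
log: List[Fact] = [Fact("alice", 5), Fact("bob", 3), Fact("alice", -2)]
view = materialize(log)
```

Under these assumptions the view is disposable: it can be rebuilt at any time by replaying the log, and replaying a prefix of the log gives the state as of an earlier point in time.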
