Dean Wampler, Ph.D., is a member of the Office of the CTO and the Architect for Big Data Products and Services at Typesafe. He uses Scala and Functional Programming to build Big Data systems using Spark, Mesos, Hadoop, the Typesafe Reactive Platform, and other tools. Dean is the author or co-author of three O’Reilly books on Scala, Functional Programming, and Hive. He contributes to several open source projects (including Spark) and he co-organizes and speaks at many technology conferences and Chicago-based user groups.
YOW! Lambda Jam 2014 Brisbane
Reactive Design & Language Paradigms
Can reactive designs be implemented in any programming language? Or are some languages and programming paradigms better for building reactive systems? How do traditional design approaches, like Object-Oriented Design (OOD) and Domain-Driven Design (DDD), apply to reactive applications? The Reactive Manifesto strikes a balance between specifying the essential features for reactive systems and allowing implementation variations appropriate for each language and execution environment. We’ll compare and contrast different techniques, like Reactive Streams, callbacks, Actors, Futures, and Functional Reactive Programming (FRP), and we’ll see examples of how they are realized in various languages and toolkits. We’ll understand their relative strengths and weaknesses, their similarities and differences, from which we’ll draw lessons for building reactive applications more effectively.
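As a small taste of one of the techniques the talk compares, here is a minimal sketch (not taken from the talk itself) of composing Scala Futures; the object and method names are illustrative only, and error handling is elided for brevity.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureDemo {
  // A hypothetical asynchronous computation, returned as a Future.
  def asyncDouble(n: Int): Future[Int] = Future { n * 2 }

  // Futures compose with combinators like map, so downstream steps
  // run when the result arrives, without blocking a thread.
  def pipeline(n: Int): Future[String] = asyncDouble(n).map(m => s"result: $m")

  def main(args: Array[String]): Unit = {
    // Await here only to demonstrate the value; reactive code would
    // instead register a callback or compose further.
    println(Await.result(pipeline(21), 2.seconds))
  }
}
```

The same composition style appears, with different trade-offs, in callbacks, Actors, and FRP libraries.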
Big Data with Spark
YOW! DEPTHFIRST WORKSHOP
WHAT WILL YOU LEARN
This hands-on workshop will introduce you to writing Spark applications to solve real-world Big Data problems. We’ll also learn how to use Shark, GraphX, and MLI, and discuss a few other tools in the ecosystem.
WHO SHOULD ATTEND
Software developers, architects, data analysts, database analysts, technical leaders, and anybody with an interest in big data and/or functional programming.
Spark is a Scala-based distributed computation environment for “Big Data” that is emerging as a replacement for Hadoop MapReduce, because Spark offers significantly better performance, greater flexibility for implementing algorithms, and the power of functional programming combinators, all while interoperating with other Hadoop tools, such as HDFS (Hadoop Distributed File System). Spark applications can be written in Scala, Java, Python, and soon, R. A number of specialized libraries in the Spark ecosystem are built on this foundation, including a SQL query tool for flat-file data called Shark, a graph processing system called GraphX, and a machine learning library called MLI, among others.
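To illustrate the functional combinators mentioned above, here is a hedged sketch of the classic word count using plain Scala collections; Spark's RDD API offers the same `flatMap`/`map` style (plus distributed operations such as `reduceByKey`) over data sets too large for one machine. The `WordCount` object is illustrative, not part of Spark.

```scala
object WordCount {
  // Count word occurrences with the map/group/reduce combinators that
  // Spark generalizes to distributed data.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\W+"""))            // "map" phase: lines -> words
      .filter(_.nonEmpty)
      .map(_.toLowerCase)
      .groupBy(identity)                      // "shuffle" phase: group by key
      .map { case (w, ws) => (w, ws.size) }   // "reduce" phase: count per key

  def main(args: Array[String]): Unit =
    println(count(Seq("To be or not to be")))
}
```

In Spark, replacing `Seq[String]` with an `RDD[String]` leaves the shape of this code essentially unchanged.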
But the MapReduce computing model is hard to use. It’s very coarse-grained and relatively inflexible. Translating many otherwise intuitive algorithms to MapReduce requires specialized expertise. The industry is already starting to look elsewhere…
However, the very name MapReduce tells us its roots: the core concepts of mapping and reducing familiar from Functional Programming (FP). We’ll discuss how to return MapReduce, and Copious Data in general, to their ideal place, rooted in FP. We’ll discuss the core operations (“combinators”) of FP that meet our requirements, finding the right granularity for modularity, myths of mutability and performance, and trends that are already moving us in the right direction. We’ll see why the dominance of Java in Hadoop is harming progress. You might think that concurrency is the “killer app” for FP, and maybe you’re right. I’ll argue that Copious Data is just as important for driving FP into the mainstream. Actually, FP has a long tradition in data systems, but we’ve been calling it SQL…
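The SQL-as-FP observation can be sketched concretely: a `GROUP BY`/`COUNT` query maps directly onto functional combinators. The following minimal Scala example (the `SqlAsFp` object and sample data are purely illustrative) shows the correspondence.

```scala
object SqlAsFp {
  // The functional equivalent of:
  //   SELECT dept, COUNT(*) FROM employees GROUP BY dept
  def countByDept(employees: Seq[(String, String)]): Map[String, Int] =
    employees
      .groupBy { case (_, dept) => dept }            // GROUP BY dept
      .map { case (dept, rows) => (dept, rows.size) } // COUNT(*)

  def main(args: Array[String]): Unit = {
    val employees = Seq(("alice", "eng"), ("bob", "eng"), ("carol", "ops"))
    println(countByDept(employees))
  }
}
```

Declarative SQL clauses and FP combinators both describe *what* to compute over a collection, leaving *how* to the engine.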
The world of Copious Data (permit me to avoid the overexposed term Big Data) is currently dominated by Apache Hadoop, a clean-room version of the MapReduce computing model and a distributed, (mostly) reliable file system invented at Google.