In this third and final part of our “Apache Spark 3 playground” guide we will be adding a Zeppelin notebook into the mix.

Apache Zeppelin — from Zeppelin assets

Apache Zeppelin, according to the Zeppelin home page, is:

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

This gives us a sandbox environment where we can experiment and learn without having to manage things like building, packaging, or deploying our code.

You can save these environments and share them with other developers, engineers, and analysts.

This guide will explain:


In the first part of our Apache Spark 3 playground guide, we covered setting up a local Spark environment that let you start experimenting and even run through the basic “Getting Started” tutorial.

If you haven’t done Part 1 yet, I suggest you start there. I’ll assume you’ve already gone through it, are familiar with the stages leading up to this guide, and are ready to dive in.

By Charles Deluvio on Unsplash

First, let’s talk a bit about what S3 is and why we’re interested in mocking it in the first place.

What’s the ‘S’ about?

From the AWS S3 documentation:

Amazon Simple Storage Service (Amazon S3) is an object…


Apache Spark has become one of the industry-standard tools when it comes to slicing and dicing data.
Nowadays there are many other alternatives, but Spark is still widely used and, I believe, will be for a while.

See the docs!

This series of guides will take you through setting up a local playground to experiment with.
I’ll try to explain concepts I thought were interesting while you set up a basic working environment.
In these guides we will cover the following:

Omri Keefe

Data Guild Manager at Dynamic Yield
