In this third and final part of our “Apache Spark 3 playground” guide we will be adding a Zeppelin notebook into the mix.
Zeppelin is a web-based notebook that enables data-driven,
interactive data analytics and collaborative documents with SQL, Scala and more.
This gives us a sandbox environment where we can experiment and learn without needing to manage things like building, packaging, or deploying our code.
You can save these environments and share them with other developers, engineers, and analysts.
In the first part of our Apache Spark 3 playground guide, we covered setting up a local Spark environment that lets you start experimenting and even run through the basic “Getting Started” tutorial.
If you haven’t done Part 1 yet, I suggest you start there. I’ll assume you’ve already gone through it, are familiar with the stages leading up to this guide, and are ready to dive in.
First, let’s talk a bit about what S3 is and why we’re interested in mocking it in the first place.
Amazon Simple Storage Service (Amazon S3) is an object…
Apache Spark has become one of the industry-standard tools for slicing and dicing data.
Nowadays there are many other alternatives, but Spark is still widely used and, I believe, will be for a while.
This series of guides will take you through setting up a local playground to experiment with.
I’ll try to explain concepts I found interesting while you set up a basic working environment.
In these guides we will cover the following:
Data Guild Manager at Dynamic Yield