Skip to content

Getting Started with sparksnake

Installing from pip

The latest version of sparksnake library is already published in PyPI and available free of charge for anyone interested in improving the creation of their Spark applications using AWS services such as Glue and EMR. To start your journey, simply perform your installation using the following command:

# Installing sparksnake library
pip install sparksnake
About Python virtual environments

In general, it's a good practice to create a virtual environment before starting a Python project. This approach allows you to have an isolated environment with more refined control over dependencies.

Creating virtual environments

Creating Python virtual environments is easy. You can open your terminal or command prompt and run the following command in any directory of your preference.

python -m venv <venv_name>

Where <venv_name> should be replaced by the name of your brand new virtual environment. As an additional tip, you can have virtual environment names associated with project names (e.g. project_venv) for allowing you to remember the goal of each venv.

Accessing virtual environments

Creating a virtual env is just the first step of the process. After that, it's important to explicity access it to ensure that every action you perform (e.g. installing a new Python package) will be performed inside the virtual env.

If you use Windows as your OS, then use the command below to access the Python virtual environment:

# Accessing venvs on Windows
<venv_path>/Scripts/activate

In case you are using a Linux machine (or Git Bash in Windows), the command has minor changes and is given by:

# Accessing venvs on Linux
source <venv_path>/Scripts/activate

Where <venv_path> is the location reference of the newly created virtual environment. For example, if you created a virtual environment named test_venv in your user directory, then <venv_path> can be replaced by C:\Users\username\test_venv on Windows or simply ~/test_venv on Linux.

For more information, this excellent Real Python blog article may shed light on a number of questions involving the creation and use of Python virtual environments.

Start Building your Spark Application

Well, once you have sparksnake installed in your environment, you can use it wherever you need do create a Spark application script. The first task is to choose an operation mode.

What? What is an operation mode in sparksnake? Which modes are available?

No worries at this time, Spark developer! I prepared a really detailed demo section explaining all this for you.

Check the Library Structure page to start getting your hands dirty with sparksnake. I must advise you that this is a one-way street and you probably won't never want to develop Spark applications as you did until now.