Integrating a PySpark notebook with S3

In my post Using Spark to read from S3 I explained how I was able to connect Spark to AWS S3 on an Ubuntu machine. Last week I tried to connect to S3 again using Spark on my local machine, but I wasn't able to read data from our datalake. Our datalake is hosted in the eu-west-2 region, which apparently requires you to explicitly enable Signature Version 4 authentication. Instead of setting up the right environment on my machine and reconfiguring everything, I chose to update the Docker image from my notebook repo so I could test on my Mac before pushing it to my server. This way I can simply spin up the Docker container and get two identical environments, rather than configuring my local and remote machines separately.
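As a rough sketch of what the notebook configuration looks like, the snippet below builds a SparkSession with the S3A connector pointed at the eu-west-2 endpoint and Signature Version 4 enabled. It assumes the hadoop-aws package is on the classpath and that credentials come from the usual AWS environment variables; the bucket and path are placeholders, not our actual datalake.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-notebook")
    # eu-west-2 only supports Signature Version 4, so enable it explicitly
    # on both the driver and the executors.
    .config("spark.driver.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .config("spark.executor.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    # Point the S3A connector at the regional endpoint.
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-2.amazonaws.com")
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

# Read a sample dataset from the datalake (path is hypothetical).
df = spark.read.parquet("s3a://my-datalake/some/table/")
df.show(5)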

Run methods dynamically by name

In this project I want to verify the availability of the APIs that we use to ingest data into our data platform. In the example I will use Jira, Workable and HiBob, since they offer clean APIs without too much configuration. First I will create a test suite that verifies the availability; once this works, I will move it to a Lambda function that can be run on a fixed schedule with CloudWatch. A sketch of the dynamic dispatch idea follows below.
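Here is a minimal sketch of running the checks dynamically by name: each API gets its own check_* method, and the runner discovers and calls them via getattr, so adding a new API only means adding a new method. The class name and URLs are illustrative placeholders, not the real Jira, Workable or HiBob endpoints.

import requests


class AvailabilityChecks:
    def check_jira(self):
        return requests.get("https://jira.example.com/status", timeout=10).ok

    def check_workable(self):
        return requests.get("https://workable.example.com/status", timeout=10).ok

    def check_hibob(self):
        return requests.get("https://hibob.example.com/status", timeout=10).ok


def run_checks(checks):
    # Find every method whose name starts with "check_" and invoke it by name.
    for name in dir(checks):
        if name.startswith("check_"):
            method = getattr(checks, name)
            print(f"{name}: {'ok' if method() else 'failed'}")


if __name__ == "__main__":
    run_checks(AvailabilityChecks())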

Using Python with Jinja and PDFkit to generate a resume

This project contains a simple example of how to build a resume with Python using Jinja, HTML, Bootstrap and a data file. In the past I have always created my resume with LaTeX, but to make life a little easier I switched to a combination of Python and HTML. Maintaining a LaTeX document is cumbersome, and it is difficult to separate the data from the style. With Jinja it is straightforward to keep the resume data apart from the actual layout. And most importantly, I can stick to Python!
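The rendering pipeline boils down to a few lines. The sketch below assumes a resume.yaml data file, a templates/resume.html Jinja template and a local wkhtmltopdf install (pdfkit is only a thin wrapper around it); the file names are illustrative.

import pdfkit
import yaml
from jinja2 import Environment, FileSystemLoader

# Load the resume data, which lives separately from the layout.
with open("resume.yaml") as f:
    data = yaml.safe_load(f)

# Render the HTML template with the data.
env = Environment(loader=FileSystemLoader("templates"))
html = env.get_template("resume.html").render(**data)

# Convert the rendered HTML into a PDF.
pdfkit.from_string(html, "resume.pdf")

Because the data file drives everything, updating the resume is just an edit to the YAML and a re-run of the script, while the HTML/Bootstrap template controls all of the styling.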