JJ's World

Setting up Spark with minIO as object storage

Sun 30 June 2019

To set up one of my data projects, I need object storage to save my data. Using Spark, I want to be able to read and write Parquet, CSV and other file formats.

Read more →
DevOps Ansible data engineer VPS Ubuntu Spark minIO object storage
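Because minIO speaks the S3 API, pointing Spark at it mostly comes down to s3a settings. A minimal sketch of that configuration — the endpoint and credentials are placeholders, not my real setup:

```python
# Hypothetical endpoint and keys -- replace with your own MinIO deployment.
MINIO_CONF = {
    "spark.hadoop.fs.s3a.endpoint": "http://localhost:9000",
    "spark.hadoop.fs.s3a.access.key": "minio-access-key",
    "spark.hadoop.fs.s3a.secret.key": "minio-secret-key",
    # MinIO buckets need path-style access instead of virtual-hosted style.
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}

def build_session(conf=MINIO_CONF):
    """Build a SparkSession whose s3a:// paths resolve to MinIO."""
    # Imported lazily so the config dict can be inspected without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("minio-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session like this, `spark.read.parquet("s3a://my-bucket/data/")` reads straight from the MinIO bucket.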

Creating an Ansible playbook to provision my Ubuntu VPS

Wed 26 June 2019

My first experiment with Ansible to automate the provisioning of my server.

Read more →
DevOps Ansible data engineer VPS Ubuntu Spark Jupyter

Creating a Lambda function with Terraform to upload a Looker view

Mon 13 May 2019

A simple Terraform deployment of a Lambda function that exports a Looker view to S3.

Read more →
DevOps AWS data engineer Terraform Lambda IaC

Interacting with AWS Glue

Tue 02 April 2019

In this notebook I interact with AWS Glue using boto3.

Read more →
notebook AWS Python Jupyter Glue
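The kind of boto3 interaction the notebook covers can be sketched like this — the database name and region are made-up examples, and the import is lazy so the snippet can be read without boto3 installed:

```python
def list_glue_tables(database_name, region="eu-west-1"):
    """Return the table names in a Glue database (name is hypothetical)."""
    import boto3  # imported lazily; only needed when the function runs

    glue = boto3.client("glue", region_name=region)
    # get_tables is paginated, so use a paginator to collect every page.
    paginator = glue.get_paginator("get_tables")
    tables = []
    for page in paginator.paginate(DatabaseName=database_name):
        tables.extend(table["Name"] for table in page["TableList"])
    return tables
```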

Creating a date range with Python

Fri 08 March 2019

In this notebook I create a date range with a precision of days and one with a precision of months, using datetime with timedelta. This has helped me automate filtering tasks where I had to query data for each day in a certain period and write the results to timestamped files.

Read more →
notebook Python Jupyter date generation daterange
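The daily case is a direct timedelta loop; since timedelta has no month unit, the monthly variant below increments the month by hand. A small sketch (the example dates are arbitrary):

```python
from datetime import date, timedelta

def daily_range(start, end):
    """All dates from start up to and including end, one day apart."""
    return [start + timedelta(days=offset)
            for offset in range((end - start).days + 1)]

def monthly_range(start, months):
    """The first day of each month, beginning with start's month."""
    result = []
    year, month = start.year, start.month
    for _ in range(months):
        result.append(date(year, month, 1))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return result

# Timestamped filenames then follow from str(d) or d.strftime("%Y%m%d").
days = daily_range(date(2019, 3, 1), date(2019, 3, 5))
months = monthly_range(date(2018, 11, 15), 3)
```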

Using Azure Blob Storage and Parquet

Tue 26 February 2019

My attempt to interact with Parquet files on Azure Blob Storage. Reading and writing Pandas dataframes is straightforward, but with Spark 2.4.0 only reading works.

Read more →
notebook Python Jupyter Spark Azure Blob PySpark parquet data

Calculate differences with sparse date/value dataframes

Mon 07 January 2019

This notebook contains a small example that interpolates the values for a sparse dataframe and calculates the difference with a smaller dataframe.

Read more →
notebook Python Jupyter interpolation Pandas dataframes
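The core idea can be sketched with Pandas: reindex the sparse series to a dense date index, interpolate the gaps, and let index alignment do the subtraction. The dates and values below are invented for illustration:

```python
import pandas as pd

# Hypothetical sparse series: values known on only a few dates.
sparse = pd.Series(
    [10.0, 40.0],
    index=pd.to_datetime(["2019-01-01", "2019-01-04"]),
)

# Reindex to daily frequency, then fill the gaps by linear interpolation.
daily_index = pd.date_range("2019-01-01", "2019-01-04", freq="D")
dense = sparse.reindex(daily_index).interpolate(method="linear")

# A smaller series to compare against.
reference = pd.Series(
    [12.0, 18.0],
    index=pd.to_datetime(["2019-01-02", "2019-01-03"]),
)

# Subtraction aligns on the index; only overlapping dates yield a value.
difference = (dense - reference).dropna()
```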

Using Spark to read from S3

Fri 04 January 2019

A short example of how to interact with S3 from PySpark.

Read more →
notebook Python Jupyter Spark pyspark AWS S3

Developing AWS Glue scripts on Mac OSX

Wed 21 November 2018

In this short tutorial I show how I developed my first Glue scripts for the AWS platform.

Read more →
DevOps data architecture Glue AWS data engineer Spark Zeppelin notebook

AWS Lambda development - Python & SAM

Tue 23 October 2018

This tutorial explains how to write a Lambda function in Python, test it locally, deploy it to AWS and test it in the cloud using Amazon's SAM. The README.md inside the cookiecutter template folder is used as the basis for this tutorial.

Read more →
DevOps data architecture serverless AWS data engineer Lambda SAM
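The handler itself is just a function returning a status code and a JSON body, much like the hello-world that SAM's cookiecutter template generates. A minimal sketch — the `name` event field is an invented example, not part of the template:

```python
import json

def lambda_handler(event, context):
    """Minimal Lambda handler: read a field from the event, return JSON.

    The "name" key is a hypothetical input used for illustration.
    """
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```

Locally this can be invoked with `sam local invoke` and a JSON event file; in code it is just `lambda_handler({"name": "JJ"}, None)`.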
← Older
Newer →

I am a lead data engineer with over 15 years of experience working with data. I have a passion for the fields of machine learning, pattern recognition, big data, blockchain and ubiquitous computing.

While I mainly work in Python, I try to experiment with different languages and frameworks when I can. Lately I have been experimenting with AWS and Terraform since, apart from data skills, I want to stay on top of new developments in DevOps.

I use this page as a portfolio, showcase and cheat sheet, but mainly as a historical record. That is why you will mostly find shell commands, short scripts and notebooks, kept so I don't have to reinvent the wheel.


  • blockchain
  • Ubuntu
  • dataframe
  • Jupyter
  • MongoDB
  • DevOps
  • shell
  • API
  • notebook
  • VueJS
  • testing
  • Ethereum
  • Spark
  • postgres
  • javascript
  • Pandas
  • Docker
  • data engineer
  • AWS
  • Python
  • S3
  • Flask
  • PySpark

© JJ's World | Powered by Pelican | Hosted on Cloudflare Pages | 2008 - 2022