A simple showcase of how to use SFTP together with Spark. With Spark I can read the files directly into an RDD. Reading big files was failing for me when I used plain Python with
In this article I will create an abstract class and several concrete subclasses, to be used within an AWS Lambda function deployed with Terraform.
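The pattern might look something like the minimal sketch below. The names (`Exporter`, `S3Exporter`, `lambda_handler`) are hypothetical, not taken from the article; the point is that the Lambda entry point only depends on the abstract interface:

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    """Hypothetical base class: the contract every exporter must fulfil."""

    @abstractmethod
    def export(self, event: dict) -> str:
        """Export the payload and return a destination URI."""


class S3Exporter(Exporter):
    """Hypothetical concrete class: builds an S3 destination URI."""

    def export(self, event: dict) -> str:
        return f"s3://my-bucket/{event['key']}"


def lambda_handler(event, context, exporter: Exporter = S3Exporter()):
    # The handler only knows the abstract interface, so swapping in
    # another concrete exporter requires no changes here.
    return exporter.export(event)
```

Because the handler depends only on `Exporter`, a new destination is just another subclass plus a different default argument.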
# Find the columns where every value is null
empty_cols = [col for col in df.columns if df[col].isnull().all()]

# Drop these columns from the dataframe
df.drop(empty_cols, axis=1, inplace=True)
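To see the snippet in action, here is a small self-contained example on a toy dataframe (the column names are illustrative). Only the column that is entirely null gets dropped; partially null columns survive:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],  # entirely null -> dropped
    "c": [np.nan, 5, 6],            # partially null -> kept
})

# Find the columns where every value is null
empty_cols = [col for col in df.columns if df[col].isnull().all()]

# Drop these columns from the dataframe
df.drop(empty_cols, axis=1, inplace=True)
# df now contains only columns "a" and "c"
```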
To set up one of my data projects, I need (object) storage to save my data. Using Spark I want to be able to read and write Parquet, CSV, and other file formats.
My first experiment with Ansible to automate the provisioning of my server.
A simple Terraform deployment of a Lambda function that exports a Looker view to S3.
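Such a deployment usually boils down to a handful of resources. The fragment below is a minimal sketch with hypothetical names (`looker_export`, `lambda.zip`) and assumes the function code is already packaged as a zip; the IAM policy attachments are omitted for brevity:

```hcl
# Hypothetical resource names; IAM permissions for S3 access omitted
data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_role" {
  name               = "looker-export-role"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

resource "aws_lambda_function" "looker_export" {
  function_name = "looker-export"
  filename      = "lambda.zip"
  handler       = "handler.lambda_handler"
  runtime       = "python3.9"
  role          = aws_iam_role.lambda_role.arn
}
```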
In this notebook I interact with AWS Glue using
In this notebook I create a date range with a precision of days and a date range with a precision of months using timedelta. This has helped me automate filtering tasks where I had to query data each day over a certain period and write the results to timestamped files.
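A minimal sketch of both ranges is shown below; `daily_range` and `monthly_range` are illustrative names. Note that `timedelta` itself has no month unit, so the monthly range rolls the month over with ordinary integer arithmetic:

```python
from datetime import date, timedelta


def daily_range(start: date, end: date):
    """Yield every date from start to end inclusive, one day at a time."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def monthly_range(start: date, end: date):
    """Yield the first day of each month from start's month up to end."""
    current = date(start.year, start.month, 1)
    while current <= end:
        yield current
        # timedelta has no month unit, so roll the month over manually
        year_carry, month_zero = divmod(current.month, 12)
        current = date(current.year + year_carry, month_zero + 1, 1)


days = list(daily_range(date(2024, 1, 1), date(2024, 1, 5)))
months = list(monthly_range(date(2024, 1, 15), date(2024, 4, 1)))
```

Formatting each yielded date with `strftime` then gives the timestamped filenames for the daily query results.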
My attempt to interact with Parquet files on Azure Blob Storage. Reading and writing Pandas dataframes is straightforward, but only reading works with Spark 2.4.0.