Using PySpark with S3 (Updated)
Installing Spark on your local machine is always a challenge. For future reference I wrote down the steps to make Spark 3.3.1 work on my MacBook.
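To give a flavour of what the post covers, here is a minimal sketch of a local PySpark session that can read from S3 through the s3a connector. The hadoop-aws version, bucket name and path are assumptions for the example, not taken from the post.

from pyspark.sql import SparkSession

# Minimal sketch: a local Spark session that pulls in the hadoop-aws package so the
# s3a:// scheme works. The package version should match the Hadoop build of your Spark.
spark = (
    SparkSession.builder
    .appName("pyspark-s3-example")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.2")
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    )
    .getOrCreate()
)

# Read a (hypothetical) Parquet dataset straight from S3 and show a few rows
df = spark.read.parquet("s3a://my-example-bucket/some/path/")
df.show(5)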
This notebook describes an experiment to get to know Great Expectations better. In the approach below we will use the core API directly rather than the configuration-based setup for the expectations.
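As a rough illustration of that code-first style, using the classic pre-1.0 pandas wrapper (the DataFrame and column names are made up, and the exact return type differs between Great Expectations versions):

import pandas as pd
import great_expectations as ge

# Wrap a pandas DataFrame and call expectation methods directly,
# instead of defining the expectations in configuration files.
df = pd.DataFrame({"word": ["spark", "python", None], "length": [5, 6, 4]})
ge_df = ge.from_pandas(df)

# Each call returns a validation result with a success flag
result = ge_df.expect_column_values_to_not_be_null("word")
print(result["success"])  # False: the sample data contains a null

result = ge_df.expect_column_values_to_be_between("length", min_value=1, max_value=20)
print(result["success"])  # True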
The idea for this project is to finally use Pythonista on my phone. Every time I hit the Share Menu I see the option to Run a Pythonista script, but I never had the opportunity (or imagination) to actually use it. Since I was already building a Dictionary app as part of a VueJS/FastAPI/Postgres project to get familiar with FastAPI, I thought it would be interesting to use my phone to add words to the database. In addition to adding words to the database, the Pythonista script will also query a dictionary API to add the definition to the table.
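The Share-sheet side of it boils down to something like the sketch below; the endpoint URL and payload shape are placeholders for whatever the FastAPI backend actually exposes.

import appex
import requests

# Hypothetical address of the FastAPI backend; host and route will differ per setup
API_URL = "http://192.168.1.10:8000/words"

def main():
    # appex.get_text() returns the text that was handed to the script via the Share sheet
    word = appex.get_text()
    if not word:
        print("No text was shared with the script.")
        return
    # Let the backend look up the definition and store the word in Postgres
    response = requests.post(API_URL, json={"word": word.strip()})
    print(response.status_code, response.json())

if __name__ == "__main__":
    main()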
With all the new ways of deploying static websites without hosting your own servers, I thought it was a good time to move away from GitHub Pages. For my GitHub-hosted website I was using two repositories: one for the Pelican source data (Markdown and Jupyter notebooks) and one for the HTML output. With one of these new services I do not need to worry about the output folder, since the HTML is built on the server and is not stored in the repository. I did a quick test with the famous Netlify, which worked really well, but setting up a domain name was slightly more difficult. Because I already use Cloudflare for my DNS, it made sense to try out Cloudflare Pages, so I can easily link my domain name and avoid having two repositories for one website.
The following script will use the jjmain profile defined in ~/.aws/credentials and retrieve a session token after an MFA token is provided. With the session token the admin-role can be assumed and data from the AWS Cost Explorer is retrieved.
from datetime import datetime, timedelta
import boto3
ACCOUNT_NUMBER = 123456789
MAX_DURATION = 129600
NUM_DAYS = 30
USER = "jitse-jan"
# Prompt user for the MFA token
token = input("MFA token: ")
# Get the credentials to assume the role
session = boto3.session.Session(profile_name="jjmain")
sts_client = session.client("sts")
assumed_role_object = sts_client.get_session_token(
    DurationSeconds=MAX_DURATION,
    SerialNumber=f"arn:aws:iam::{ACCOUNT_NUMBER}:mfa/{USER}",
    TokenCode=token,
)
credentials = assumed_role_object["Credentials"]
# Setup the Cost Explorer client
ce_client = session.client(
    "ce",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
# Get the cost and usage for the provided time period
now = datetime.utcnow()
start = (now - timedelta(days=NUM_DAYS)).strftime("%Y-%m-%d")
end = now.strftime("%Y-%m-%d")
data = ce_client.get_cost_and_usage(
    TimePeriod={"Start": start, "End": end},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)
# Get the total unblended cost
total_cost = 0
for timeperiod in data["ResultsByTime"]:
    total_cost += float(timeperiod["Total"]["UnblendedCost"]["Amount"])
print(f"Total unblended cost in the past {NUM_DAYS} days is {total_cost:.2f} USD")
I have been working with Python for quite a few years, but I never took the time to take a deeper look into how to package my code. To be fair, I never really had the use case, and the following tutorial is purely a simple introduction to creating your own package and deploying it to your private server.
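The core of it is not much more than a minimal setup script; the package name, version and index URL below are placeholders, not the ones from the tutorial.

# setup.py -- minimal sketch of the packaging metadata
from setuptools import setup, find_packages

setup(
    name="my-example-package",  # placeholder name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[],
)

# Building and uploading to a private index would then look roughly like:
#   python setup.py sdist
#   twine upload --repository-url https://pypi.example.com/ dist/*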
The fourth tutorial in the Postgres-related articles is an improvement on the Flask API I made last time. I will be using FastAPI to create the same API, but with the batteries included.
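As a taste of how little boilerplate that takes, here is a stripped-down sketch with an in-memory store instead of Postgres; the Word model and routes are illustrative, not the exact ones from the tutorial.

from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Word(BaseModel):
    # Illustrative payload model for the dictionary example used in this series
    word: str
    definition: Optional[str] = None

# In-memory stand-in for the Postgres table, to keep the sketch self-contained
words: List[Word] = []

@app.get("/words", response_model=List[Word])
def list_words():
    return words

@app.post("/words", response_model=Word)
def create_word(payload: Word):
    words.append(payload)
    return payload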
As a third tutorial in my Postgres-related articles I am creating a basic REST API with Flask and SQLAlchemy to retrieve data from the Postgres database.
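Roughly, the pattern looks like the sketch below; the connection string and the Word model are placeholders rather than the exact ones from the tutorial.

from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Placeholder connection string; user, password and database name will differ
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:password@localhost:5432/exampledb"
db = SQLAlchemy(app)

class Word(db.Model):
    # Illustrative table mirroring the dictionary example from this series
    id = db.Column(db.Integer, primary_key=True)
    word = db.Column(db.String(80), nullable=False)
    definition = db.Column(db.Text)

@app.route("/words")
def get_words():
    # Serialize every row to a plain dict so Flask can return it as JSON
    return jsonify(
        [{"id": w.id, "word": w.word, "definition": w.definition} for w in Word.query.all()]
    )

if __name__ == "__main__":
    app.run(debug=True)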
To continue the experiment with PostgreSQL, I made a simple crawler that retrieves data with Scrapy and stores it in the PostgreSQL database.
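The glue between the two is essentially a Scrapy item pipeline that writes each scraped item to Postgres; the connection details and the quotes table below are assumptions for the sake of the example.

import psycopg2

class PostgresPipeline:
    """Sketch of a Scrapy item pipeline that inserts scraped items into Postgres."""

    def open_spider(self, spider):
        # Placeholder connection settings
        self.connection = psycopg2.connect(
            host="localhost", dbname="exampledb", user="user", password="password"
        )
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        self.connection.commit()
        self.connection.close()

    def process_item(self, item, spider):
        # Assumes a table like: CREATE TABLE quotes (text TEXT, author TEXT);
        self.cursor.execute(
            "INSERT INTO quotes (text, author) VALUES (%s, %s)",
            (item.get("text"), item.get("author")),
        )
        return item

The pipeline would then be enabled through the ITEM_PIPELINES setting of the Scrapy project.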
This tutorial describes how to set up PostgreSQL on Ubuntu and configure it so you can connect to it from Python.
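Once the server side is in place, the Python end of the connection typically looks something like this with psycopg2; the credentials and database name are placeholders for whatever was created during setup.

import psycopg2

# Placeholder credentials; use the user and database created during the setup
connection = psycopg2.connect(
    host="localhost",
    dbname="exampledb",
    user="user",
    password="password",
)

with connection.cursor() as cursor:
    # A trivial sanity check that the connection works
    cursor.execute("SELECT version();")
    print(cursor.fetchone())

connection.close()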