Notes on Kubernetes

After numerous restarts and resets of my VPS and executing over 20 different tutorials, I finally have my receipe to get my single node Kubernetes cluster with a proper load balancer working. Please note that the last version of my Kubernetes experiment can be found on my Git repo. Remember that this repository mainly contains a variety of YAML files that got the system working. What worked for me, might not work for anyone else.

Using executemany to increase PyODBC connection

I recently had to insert data from a Pandas dataframe into a Azure SQL database using pandas.to_sql(). This was performing very poorly and seemed to take ages, but since PyODBC introduced executemany it is easy to improve the performance: simply add an event listener that activates the executemany for the cursor. For 2300 records I did a small comparison 8.67s and 7.22s versus 5min 57s and 5min 26s, so roughly 50 times faster for this small example dataset.

import pandas as pd
import pyodbc
from sqlalchemy import create_engine, event
from sqlalchemy.pool import StaticPool

wh_conn = pyodbc.connect(f"DRIVER={config[name]['driver']};SERVER={config[name]['server']},{config[name]['port']};DATABASE={config[name]['database']};UID={config[name]['username']};PWD={config[name]['password']}")
engine = create_engine("mssql+pyodbc://", poolclass=StaticPool, creator=lambda: wh_conn)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

df.to_sql(name='Table',
          con=engine,
          schema='Schema',
          index=False,
          if_exists='replace')

Using ButterCMS in a Jupyter notebook with VueJS

The goal of this notebook is to get myself familiar with ButterCMS, a headless CMS that can be easily integrated with any platform in any language. In the past I've experienced with Joomla, Drupal and WordPress (and Blogger long time ago), where I have lost invested a lot of time in simply getting the CMS in place, extending the CMS with more functionality and migrating data between servers. Since I am investigating different ways of using microservices, becoming less platform dependent and overall learning new tools, I use this notebook to show a simple example of ButterCMS in Jupyter.

Creating Pandas dataframe from Azure Table Storage

import pandas as pd
from azure.cosmosdb.table.tableservice import TableService

CONNECTION_STRING = "DUMMYSTRING"
SOURCE_TABLE = "DUMMYTABLE"

def set_table_service():
    """ Set the Azure Table Storage service """
    return TableService(connection_string=CONNECTION_STRING)

def get_dataframe_from_table_storage_table(table_service, filter_query):
    """ Create a dataframe from table storage data """
    return pd.DataFrame(get_data_from_table_storage_table(table_service,
                                                          filter_query))

def get_data_from_table_storage_table(table_service, filter_query):
    """ Retrieve data from Table Storage """
    for record in table_service.query_entities(
        SOURCE_TABLE, filter=filter_query
    ):
        yield record

fq = "PartitionKey eq '12345'"
ts = set_table_service()
df = get_data_dataframe_from_table_storage_table(table_service=ts,
                                                 filter_query=fq)

Create Spark dataframe column with lag

Create a lagged column in a PySpark dataframe:

from pyspark.sql.functions import monotonically_increasing_id, lag
from pyspark.sql.window import Window

# Add ID to be used by the window function
df = df.withColumn('id', monotonically_increasing_id())
# Set the window
w = Window.orderBy("id")
# Create the lagged value
value_lag = lag('value').over(w)
# Add the lagged values to a new column
df = df.withColumn('prev_value', value_lag)