Calculate differences with sparse date/value dataframes

Goal

  • Find the difference between two dataframes with a date and a value column
  • Create a continuous time axis to eliminate gaps in the date column
  • Fill the interpolated with the value of the previous date
In [1]:
import pandas as pd

Create a dataframe with two entries:

In [2]:
df_old = pd.DataFrame(
    [
        {
            'date': '2019-01-01',
            'value': 0.844
        },{
            'date': '2019-01-02',
            'value': 0.842
        }
    ]
)

Create a second dataframe with additional values for three more days:

In [3]:
df_new = pd.DataFrame(
    [
        {
            'date': '2019-01-01',
            'value': 0.844
        },{
            'date': '2019-01-02',
            'value': 0.842
        },{
            'date': '2019-01-03',
            'value': 0.84
        },{
            'date': '2019-01-04',
            'value': 0.84
        },{
            'date': '2019-01-07',
            'value': 0.838
        }
    ]
)

Create a time axis between the minimum and maximum date found in the bigger dataframe:

In [4]:
time_index = pd.date_range(df_new['date'].min(), df_new['date'].max())

Set the index to the date column:

In [5]:
df_new.set_index('date', inplace=True)

Convert the index to a DatetimeIndex:

In [6]:
df_new.index = pd.DatetimeIndex(df_new.index)

Reindex the existing date index with the continuous time_index to interpolate missing dates:

In [7]:
df_new = df_new.reindex(time_index, method='ffill')

Convert the date index back to a column:

In [8]:
df_new = df_new.reset_index().rename(columns={'index': 'date'})

Convert the date column to the proper datatype for the smaller dataframe:

In [9]:
df_old['date'] = pd.to_datetime(df_old['date'])

Combine the two dataframes and remove all the duplicate couples to only keep the none duplicate rows:

In [10]:
df_diff = pd.concat([df_new, df_old]).drop_duplicates(keep=False)
In [11]:
df_diff
Out[11]:
date value
2 2019-01-03 0.840
3 2019-01-04 0.840
4 2019-01-05 0.840
5 2019-01-06 0.840
6 2019-01-07 0.838