adp.ingest.transform.column_aware_difference
- adp.ingest.transform.column_aware_difference(snapshot: DataFrame, df_base: DataFrame) → DataFrame
column_aware_difference returns the rows in snapshot that are not in df_base.
Merging into the silver layer is expensive, so this function returns only those rows of snapshot that are not already present in df_base. It also reorders the columns to match the column order of df_base, so that the subsequent merge statements execute successfully.
- Parameters:
snapshot (DataFrame) – The new DataFrame to merge into the silver layer.
df_base (DataFrame) – The current data in the silver layer.
- Returns:
The column-aware difference between snapshot and df_base: the rows of snapshot that are absent from df_base, with columns in the order of df_base.
- Return type:
DataFrame
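Example (illustrative only): the behaviour described above can be modelled in plain Python, without a DataFrame engine. This is a sketch of the documented semantics, not the library's implementation. The function name column_aware_difference_sketch, the list-of-dicts row representation, and the explicit base_columns argument are all assumptions made for the illustration. How the real function treats duplicate rows is not documented here; the sketch simply keeps them.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a DataFrame row: column name -> value.
Row = Dict[str, object]


def column_aware_difference_sketch(
    snapshot: List[Row],
    base: List[Row],
    base_columns: Sequence[str],
) -> List[Row]:
    """Return the rows of snapshot that are absent from base.

    Rows are compared by value in base column order, so a different
    column order in snapshot does not affect the comparison. The
    returned rows have their columns in base column order.
    """
    # Index the base rows as tuples ordered by the base columns.
    base_keys = {tuple(row[col] for col in base_columns) for row in base}

    result: List[Row] = []
    for row in snapshot:
        key = tuple(row[col] for col in base_columns)
        if key not in base_keys:
            # Rebuild the row so its columns follow the base column order.
            result.append({col: row[col] for col in base_columns})
    return result


# Current silver-layer data, columns ordered (id, name).
base = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
# New snapshot with the columns in a different order.
snapshot = [{"name": "b", "id": 2}, {"name": "c", "id": 3}]

new_rows = column_aware_difference_sketch(snapshot, base, ["id", "name"])
print(new_rows)  # [{'id': 3, 'name': 'c'}]
```

Only the row with id 3 is returned, and its columns follow the order of the base data. If the DataFrames are PySpark DataFrames (an assumption; the type is not specified on this page), a comparable result could be obtained with snapshot.select(*df_base.columns).subtract(df_base), which additionally removes duplicate rows.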