adp.ingest.transform.column_aware_difference

adp.ingest.transform.column_aware_difference(snapshot: DataFrame, df_base: DataFrame) → DataFrame

column_aware_difference returns the rows in snapshot that are not in df_base.

Merging into the silver layer is expensive. This function returns only those rows in snapshot that are not present in df_base, so that only those rows need to be merged. It also re-orders the columns of the result to match those in df_base, which ensures that the subsequent merge statements execute successfully.

Parameters:
  • snapshot (DataFrame) – The new DataFrame to merge into the silver layer

  • df_base (DataFrame) – The current data in the silver layer

Returns:

The column-aware difference between snapshot and df_base: the rows of snapshot that are not in df_base, with columns ordered as in df_base.

Return type:

DataFrame
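
Example:

The behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation: it uses pandas as a stand-in for whatever DataFrame type adp actually operates on, assumes both frames contain the same set of columns, and the name column_aware_difference_sketch and the sample data are hypothetical.

```python
import pandas as pd


def column_aware_difference_sketch(
    snapshot: pd.DataFrame, df_base: pd.DataFrame
) -> pd.DataFrame:
    """Illustrative stand-in, NOT the adp implementation.

    Assumes snapshot and df_base contain the same set of columns.
    """
    base_columns = list(df_base.columns)

    # Re-order the snapshot columns so they match df_base.
    aligned = snapshot[base_columns]

    # Left join on every column; the indicator column "_merge" marks
    # rows that exist only in the snapshot as "left_only".
    merged = aligned.merge(
        df_base.drop_duplicates(),
        how="left",
        on=base_columns,
        indicator=True,
    )

    # Keep only the rows that are absent from df_base.
    new_rows = merged.loc[merged["_merge"] == "left_only", base_columns]
    return new_rows.reset_index(drop=True)


# Hypothetical data: same columns, different order, one new row.
snapshot = pd.DataFrame({"name": ["a", "b", "c"], "id": [1, 2, 3]})
df_base = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

result = column_aware_difference_sketch(snapshot, df_base)
print(result)
```

In this sketch only the row with id 3 remains, and the result lists id before name because that is the column order of df_base.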