adp.ingest.datatypes.escalate_datatypes

adp.ingest.datatypes.escalate_datatypes(transform_func, table_name: str, snapshot: DataFrame, type_escalation_mode: TypeEscalationModeEnum, can_rewrite_history=False) → Tuple[DataFrame, DataFrame]

Apply the escalation patterns as defined by data_strict and data_loose.

Performs the following steps for datatype escalation:

  1. Get the columns that are present in both the new snapshot and the bronze/silver df. We call these the “common” columns.

  2. Calculate the escalation target datatype for each common column, using data_strict or data_loose depending on the type_escalation_mode.

  3. Define the cast of the new snapshot and/or the delta table to the escalation target. (The cast is not executed at this point, because Spark evaluates lazily.)

  4. Rewrite history (if applicable)

  5. Run the transform_func. This is an append for bronze and a merge for silver.
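Steps 1 and 2 can be illustrated on plain schema dictionaries, without Spark. This is a minimal sketch, not the library's implementation: the helper names and the widening order in ASSUMED_LADDER are assumptions made for illustration, and the real rules are whatever data_strict and data_loose define.

```python
from typing import Dict, List

# Hypothetical widening order, narrowest to widest. The actual rules live in
# data_strict / data_loose and are not reproduced here.
ASSUMED_LADDER = ["int", "bigint", "double", "string"]


def common_columns(snapshot_schema: Dict[str, str], table_schema: Dict[str, str]) -> List[str]:
    """Step 1: columns present in both schemas, in snapshot order."""
    return [name for name in snapshot_schema if name in table_schema]


def escalation_target(left: str, right: str) -> str:
    """Step 2: pick the wider of two types according to the assumed ladder."""
    return max(left, right, key=ASSUMED_LADDER.index)


def plan_escalation(snapshot_schema: Dict[str, str], table_schema: Dict[str, str]) -> Dict[str, str]:
    """Map every common column to its escalation target type."""
    return {
        name: escalation_target(snapshot_schema[name], table_schema[name])
        for name in common_columns(snapshot_schema, table_schema)
    }


# Illustrative schemas: column name -> type name.
snapshot_schema = {"id": "int", "amount": "double", "note": "string", "new_col": "int"}
table_schema = {"id": "bigint", "amount": "int", "note": "string"}

plan = plan_escalation(snapshot_schema, table_schema)
print(plan)
```

In this example new_col is left out of the plan because it exists only in the snapshot, id would be cast on the snapshot side, and amount would be cast on the table side, which is the case that requires rewriting history.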

Parameters:
  • delta_path (str) – The path to the bronze/silver table in the datalake, for example: abfss://sdp@nubulosdpdlsENV01.dfs.core.windows.net/unit_tests/datatypes/all_types.delta

  • snapshot (DataFrame) – The new dataset to be appended/merged with the current bronze/silver table

  • type_escalation_mode (TypeEscalationModeEnum) – Which escalation mode to use; this determines whether data_strict or data_loose is applied

  • can_rewrite_history (bool, optional) – Whether rewriting the bronze/silver table is allowed. Defaults to False, as rewriting interferes with the table_changes functionality.

Raises:

CannotModifyDataTypeForHistory – Raised when can_rewrite_history is False and history would have to be rewritten to make the two DataFrames compatible.
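The condition behind this exception can be sketched as a small guard over schema dictionaries. This is only an illustration under stated assumptions: the exception class below is a local stand-in so the snippet runs on its own, and guard_history_rewrite and columns_needing_rewrite are hypothetical helpers, not part of adp.ingest.datatypes.

```python
from typing import Dict, List


class CannotModifyDataTypeForHistory(Exception):
    """Local stand-in for the library's exception, defined so the sketch is runnable."""


def columns_needing_rewrite(table_schema: Dict[str, str], targets: Dict[str, str]) -> List[str]:
    """Columns whose stored type differs from the escalation target."""
    return [name for name, target in targets.items() if table_schema[name] != target]


def guard_history_rewrite(
    table_schema: Dict[str, str],
    targets: Dict[str, str],
    can_rewrite_history: bool = False,
) -> List[str]:
    """Return the columns to rewrite, or raise if a rewrite is needed but not allowed."""
    to_rewrite = columns_needing_rewrite(table_schema, targets)
    if to_rewrite and not can_rewrite_history:
        raise CannotModifyDataTypeForHistory(
            f"History rewrite required for columns: {to_rewrite}"
        )
    return to_rewrite


# Illustrative input: the stored "amount" column is narrower than its target.
table_schema = {"id": "bigint", "amount": "int"}
targets = {"id": "bigint", "amount": "double"}

try:
    guard_history_rewrite(table_schema, targets)  # default: rewriting not allowed
    raised = False
except CannotModifyDataTypeForHistory:
    raised = True

allowed = guard_history_rewrite(table_schema, targets, can_rewrite_history=True)
```

With the default setting the guard raises because amount would have to change type in the stored table; with can_rewrite_history=True it returns the list of columns to rewrite instead.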

Returns:

Converted snapshot dataframe, converted bronze/silver dataframe

Return type:

(DataFrame, DataFrame)