adp.delivery.gold.append_unknown_record
- adp.delivery.gold.append_unknown_record(df: DataFrame, primary_key_columns: list[str] | str, end_date_column: str | None = None) → DataFrame
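Append an unknown record to a DataFrame
Adds one placeholder row for unknown values to the DataFrame. This is useful when creating dimension tables, where the row gives fact records without a matching dimension member something to reference.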
Example
>>> from datetime import date, datetime
>>> from decimal import Decimal
>>> from pyspark.sql.types import (
...     DateType, DecimalType, DoubleType, FloatType, IntegerType,
...     StringType, StructField, StructType, TimestampType,
... )
>>> from adp.delivery.gold import append_unknown_record
>>> schema = StructType([
...     StructField('test_id', IntegerType(), nullable=False),
...     StructField('test_string_required', StringType(), nullable=False),
...     StructField('test_string_optional', StringType(), nullable=True),
...     StructField('test_string_excluded', StringType(), nullable=True),
...     StructField('test_integer', IntegerType(), nullable=True),
...     StructField('test_decimal', DecimalType(5, 2), nullable=True),
...     StructField('test_double', DoubleType()),
...     StructField('test_float', FloatType()),
...     StructField('test_date', DateType(), nullable=True),
...     StructField('test_timestamp', TimestampType(), nullable=True),
... ])
>>> data = [
...     (1, 'test_string_required_1', 'test_string_optional_1', 'test_string_excluded_1', 123, Decimal('999.99'), 9.99, 9.99, date(2000, 1, 1), datetime.now()),
...     (2, 'test_string_required_2', 'test_string_optional_2', 'test_string_excluded_2', 123, Decimal('999.99'), 9.99, 9.99, date(2000, 1, 1), datetime.now()),
...     (3, 'test_string_required_3', 'test_string_optional_3', 'test_string_excluded_3', 123, Decimal('999.99'), 9.99, 9.99, date(2000, 1, 1), datetime.now()),
... ]
>>> df_in = create_sparksession().createDataFrame(data, schema)
>>> df_in.show()
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
|test_id|test_string_required|test_string_optional|test_string_excluded|test_integer|test_decimal|test_double|test_float| test_date|      test_timestamp|
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
|      1|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
|      2|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
|      3|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
>>> df_out = append_unknown_record(df_in, 'test_id', 'test_date')
>>> df_out.show()
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
|test_id|test_string_required|test_string_optional|test_string_excluded|test_integer|test_decimal|test_double|test_float| test_date|      test_timestamp|
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
|      1|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
|      2|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
|      3|test_string_requi...|test_string_optio...|test_string_exclu...|         123|      999.99|       9.99|      9.99|2000-01-01|2022-07-19 15:08:...|
|     -1|            Onbekend|            Onbekend|            Onbekend|           1|        1.00|        1.0|       1.0|3000-12-31|3000-12-31 00:00:00.|
+-------+--------------------+--------------------+--------------------+------------+------------+-----------+----------+----------+--------------------+
- Parameters:
df (DataFrame) – The dataframe to add the unknown record to
primary_key_columns (str | List[str]) – Column name(s) of the primary key.
end_date_column (str, optional) – Name of the date column whose value should be replaced with a date far in the future. Defaults to None.
- Returns:
The DataFrame with the unknown record appended.
- Return type:
DataFrame
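To make the shape of the unknown record concrete, here is a minimal pure-Python sketch of how such a row could be assembled. It is not the library's implementation: the helper `build_unknown_row` and the `UNKNOWN_DEFAULTS` table are hypothetical, and the placeholder values are simply read off the example output (-1 for the key, 'Onbekend', Dutch for "unknown", for strings, 1 for numbers, 3000-12-31 for dates). The sketch ignores `end_date_column`.

```python
from datetime import date, datetime
from decimal import Decimal

# Placeholder value per Python type, read off the example output.
# These are assumptions for illustration, not the library's documented defaults.
UNKNOWN_DEFAULTS = {
    str: "Onbekend",  # Dutch for "unknown"
    int: 1,
    float: 1.0,
    Decimal: Decimal("1.00"),
    date: date(3000, 12, 31),
    datetime: datetime(3000, 12, 31),
}


def build_unknown_row(column_types, primary_key_columns):
    """Return a dict describing a hypothetical unknown record."""
    # Accept a single column name or a list of names, as the signature does.
    if isinstance(primary_key_columns, str):
        primary_key_columns = [primary_key_columns]
    row = {}
    for name, py_type in column_types.items():
        if name in primary_key_columns:
            row[name] = -1  # key value seen in the example output
        else:
            row[name] = UNKNOWN_DEFAULTS[py_type]
    return row


columns = {
    "test_id": int,
    "test_string_required": str,
    "test_integer": int,
    "test_date": date,
}
unknown = build_unknown_row(columns, "test_id")
```

Passing `["test_id"]` instead of `"test_id"` yields the same row, which mirrors how `primary_key_columns` accepts either a string or a list of strings.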