adp.ingest.udf

Custom UDFs to apply to the columns of a DataFrame.

The functions in this file can be referenced in the masking definition in the YAML file.

Functions

format_string_as_valid_python_boolean_value(string)

UDF to convert a string value to a valid Python boolean value

format_string_as_valid_python_boolean_value_udf(string)

UDF to convert a string value to a valid Python boolean value
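The accepted spellings are not listed here, so the following is only a minimal sketch of what such a conversion could look like. The name `to_python_bool` and both sets of accepted spellings are assumptions for illustration, not the module's actual behaviour.

```python
from typing import Optional

# Assumed truthy/falsy spellings; the real UDF may accept a different set.
_TRUE = {"true", "t", "yes", "y", "1"}
_FALSE = {"false", "f", "no", "n", "0"}


def to_python_bool(value: Optional[str]) -> Optional[bool]:
    """Map a string to True/False, or None when it is not recognised."""
    if value is None:
        return None
    normalised = value.strip().lower()
    if normalised in _TRUE:
        return True
    if normalised in _FALSE:
        return False
    # Assumption: unknown spellings become null rather than raising.
    return None
```

Returning None for unrecognised input keeps the column nullable instead of failing the whole job, which is one possible design choice among several.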

format_string_as_valid_python_numeric_value(string)

Format a string as a numeric value

format_string_as_valid_python_numeric_value_udf(string)

Format a string as a numeric value
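The exact formatting rules are not documented here. As a rough sketch, a numeric clean-up might strip whitespace and normalise a decimal comma. The function name `to_numeric_string` and the decimal-comma rule are assumptions made for this example.

```python
from typing import Optional


def to_numeric_string(value: Optional[str]) -> Optional[str]:
    """Return a string that float() can parse, or None if it cannot be repaired."""
    if value is None:
        return None
    cleaned = value.strip().replace(" ", "")
    # Assumption: a comma without any dot is a decimal separator ("3,14" -> "3.14").
    if "," in cleaned and "." not in cleaned:
        cleaned = cleaned.replace(",", ".")
    try:
        float(cleaned)
    except ValueError:
        return None
    return cleaned
```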

get_timestamp_based_on_file_name(...)

Create a timestamp using a file name

get_timestamp_based_on_file_name_udf(...)

Create a timestamp using a file name

get_timestamp_based_on_path(input_file_name, ...)

Retrieves a timestamp based on a path

get_timestamp_based_on_path_udf(...)

Retrieves a timestamp based on a path
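The expected naming pattern for files and paths is not stated in this listing. The sketch below assumes, purely for illustration, that the path contains a 14-digit `yyyyMMddHHmmss` stamp; the helper name `timestamp_from_path` and the example path are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical layout: the first run of 14 digits is read as yyyyMMddHHmmss.
_STAMP = re.compile(r"(\d{14})")


def timestamp_from_path(path: str) -> Optional[datetime]:
    """Extract a timestamp embedded in a file path, or None if absent or invalid."""
    match = _STAMP.search(path)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # The digits do not form a valid calendar date and time.
        return None
```

For example, `timestamp_from_path("/landing/sales_20240131235959.csv")` would yield the timestamp for 31 January 2024, 23:59:59 under this assumed pattern.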

privacy_binary(double)

Replaces a binary by bin(0)

privacy_binary_udf(double)

Replaces a binary by bin(0)

privacy_boolean(boolean)

Replace a boolean by False

privacy_boolean_udf(boolean)

Replace a boolean by False

privacy_date(date)

Scrambles a date column

privacy_date_udf(date)

Scrambles a date column

privacy_double(double)

Replaces a double by -1337.1337

privacy_double_udf(double)

Replaces a double by -1337.1337

privacy_hide_email(email)

UDF to hide an email address, e.g. thomas@gmail.com -> t*****@g****.com

privacy_hide_email_udf(email)

UDF to hide an email address, e.g. thomas@gmail.com -> t*****@g****.com
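The documented example suggests that the first character of the local part and of the domain name is kept, the remaining characters are replaced one-for-one by '*', and the top-level domain is left readable. A minimal sketch that reproduces that example follows; the names `hide_email` and `_keep_first` are illustrative, and the handling of values without an '@' is an assumption.

```python
def _keep_first(text: str) -> str:
    """Keep the first character and replace every other character with '*'."""
    return text[:1] + "*" * (len(text) - 1)


def hide_email(email: str) -> str:
    """Mask an email address while keeping its overall shape."""
    local, at, domain = email.partition("@")
    if not (local and at and domain):
        # Assumption: values that do not look like an address are masked whole.
        return _keep_first(email)
    name, dot, tld = domain.rpartition(".")
    if not dot:
        # No top-level domain present; mask the whole domain.
        return f"{_keep_first(local)}@{_keep_first(domain)}"
    return f"{_keep_first(local)}@{_keep_first(name)}.{tld}"


print(hide_email("thomas@gmail.com"))  # t*****@g****.com
```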

privacy_int(integer)

Replaces an integer by -1337

privacy_int_udf(integer)

Replaces an integer by -1337

privacy_long(long)

Replaces a LongType value by -1337

privacy_long_udf(long)

Replaces a LongType value by -1337
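The numeric privacy functions replace every value by a fixed sentinel: -1337 for integers and longs, -1337.1337 for doubles. A plain-Python sketch of that idea is shown below. The function and constant names are illustrative, and keeping nulls as nulls is an assumption that the listing does not confirm.

```python
from typing import Optional

MASKED_INT = -1337          # sentinel documented for privacy_int and privacy_long
MASKED_DOUBLE = -1337.1337  # sentinel documented for privacy_double


def mask_int(value: Optional[int]) -> Optional[int]:
    """Replace any integer by the sentinel; nulls stay null (assumption)."""
    return None if value is None else MASKED_INT


def mask_double(value: Optional[float]) -> Optional[float]:
    """Replace any double by the sentinel; nulls stay null (assumption)."""
    return None if value is None else MASKED_DOUBLE
```

Because the sentinel is a constant, the column keeps its type and null pattern while every original value is discarded.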

privacy_string(string)

Scrambles a string value by replacing it with '*'

privacy_string_sha1(string)

Use a sha1 hash on a string value to hide it

privacy_string_sha1_udf(string)

Use a sha1 hash on a string value to hide it
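Hashing with SHA-1 can be sketched with Python's standard `hashlib` module. The function name `sha1_hex`, the UTF-8 encoding, and the null handling are assumptions for this example; the module's actual implementation may differ.

```python
import hashlib
from typing import Optional


def sha1_hex(value: Optional[str]) -> Optional[str]:
    """Return the 40-character hexadecimal SHA-1 digest of a string."""
    if value is None:
        # Assumption: nulls are passed through unchanged.
        return None
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


print(sha1_hex("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

The same input always yields the same digest, so a column masked this way can still serve as a join or grouping key.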

privacy_string_udf(string)

Scrambles a string value by replacing it with '*'

privacy_timestamp(timestamp)

Scrambles a timestamp column

privacy_timestamp_udf(timestamp)

Scrambles a timestamp column