Version: Beta 🚧

DatetimePartitionColumn

Summary

Helper class that tells Tecton how the underlying flat files of a Hive/Glue data source are partitioned by date/time. Declaring the partitioning can translate into a significant performance increase.
 
You will generally pass a list of objects of this class to the datetime_partition_columns option of a HiveConfig object.
 
Example: assume you have an S3 bucket with Parquet files stored in the following structure: s3://mybucket/2022/05/04/<multiple parquet files>, where 2022 is the year, 05 is the month, and 04 is the day of the month. In this scenario, you could use the definition shown in Example 1.

Examples

Example 1

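from tecton import DatetimePartitionColumn, HiveConfig

# One partition column per path level: year, month, day.
datetime_partition_columns = [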
    DatetimePartitionColumn(column_name="partition_0", datepart="year", zero_padded=True),
    DatetimePartitionColumn(column_name="partition_1", datepart="month", zero_padded=True),
    DatetimePartitionColumn(column_name="partition_2", datepart="day", zero_padded=True),
]
batch_config = HiveConfig(
    database='my_db',
    table='my_table',
    timestamp_field='timestamp',
    datetime_partition_columns=datetime_partition_columns,
)
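
To make the mapping between a timestamp and the partition folders concrete, here is a minimal standalone sketch. The helper partition_value is hypothetical and is not part of the Tecton SDK; it only illustrates what zero_padded=True versus zero_padded=False means for the year/month/day layout described above.

```python
from datetime import datetime

def partition_value(dt, datepart, zero_padded):
    """Illustrative only (hypothetical helper, not Tecton code):
    render one datepart the way a partition folder might be named."""
    value = {"year": dt.year, "month": dt.month, "day": dt.day, "hour": dt.hour}[datepart]
    width = 4 if datepart == "year" else 2
    return str(value).zfill(width) if zero_padded else str(value)

ts = datetime(2022, 5, 4)
padded = [partition_value(ts, part, True) for part in ("year", "month", "day")]
unpadded = [partition_value(ts, part, False) for part in ("year", "month", "day")]

print("/".join(padded))    # 2022/05/04, matching s3://mybucket/2022/05/04/
print("/".join(unpadded))  # 2022/5/4
```

If your folders look like the second line, zero_padded=True would not describe them.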

Example 2

# A single partition column that holds year and month together, e.g. "2022-05".
datetime_partition_columns = [
    DatetimePartitionColumn(column_name="partition_1", datepart="month", format_string="%Y-%m"),
]
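
As a rough illustration of why declaring partitions can help performance, the sketch below shows generic partition pruning over a "%Y-%m" column: a time range becomes a string range, so partitions outside it need not be read. This is an assumption-laden sketch of the general technique, not Tecton's implementation; the partitions list and the prune helper are hypothetical.

```python
from datetime import datetime

# Hypothetical partition values as they might appear in a "%Y-%m" partition column.
partitions = ["2021-11", "2021-12", "2022-01", "2022-02", "2022-03"]

def prune(partitions, start, end, format_string="%Y-%m"):
    """Keep only partitions whose value lies in [start, end], compared as strings."""
    lo = start.strftime(format_string)
    hi = end.strftime(format_string)
    return [p for p in partitions if lo <= p <= hi]

selected = prune(partitions, datetime(2021, 12, 15), datetime(2022, 2, 3))
print(selected)  # ['2021-12', '2022-01', '2022-02']
```

The string comparison is only valid because "%Y-%m" values sort in time order.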

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

Name             Description
__init__(...)    Initialize DatetimePartitionColumn

__init__(...)

Parameters

  • column_name (str) - The name of the column in the Glue/Hive schema that corresponds to the underlying date/time partition folder. Note that if you do not explicitly specify a name in your partition folders, Glue will give the columns names of the form partition_0, partition_1, and so on. Default: None

  • datepart (str) - The part of the date that this column specifies. One of "year", "month", "day", "hour", or the full "date". If used with format_string, this should be the granularity of the partition being represented, e.g. datepart="month" for format_string="%Y-%m". Default: None

  • zero_padded (bool) - Whether the datepart has a leading zero if it has fewer than two digits. This must be set to True if datepart="date". (Should not be set if format_string is set.) Default: False

  • format_string (Optional[str]) - A datetime.strftime format string override for "non-default" partition column formats. E.g. "%Y%m%d" for datepart="date" instead of the Tecton default "%Y-%m-%d", or "%Y-%m" for datepart="month" instead of the Tecton default "%m". Default: None

Info: This format string must convert Python datetimes (via datetime.strftime(format)) to strings that are sortable in time order. For example, "%m-%Y" would be an invalid format string because "09-2019" > "05-2020" as strings, even though September 2019 comes before May 2020.

See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes for format codes.
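
A quick way to sanity-check a candidate format string against the sortability requirement is to format a few datetimes and compare the orderings. The helper is_time_sortable below is a hypothetical name for this sketch, not a Tecton function, and it only tests the samples you give it rather than proving the property in general.

```python
from datetime import datetime

def is_time_sortable(format_string, samples):
    """Return True if formatting the time-ordered samples keeps them in string order."""
    ordered = sorted(samples)
    formatted = [dt.strftime(format_string) for dt in ordered]
    return formatted == sorted(formatted)

samples = [datetime(2019, 9, 1), datetime(2020, 5, 1), datetime(2020, 11, 1)]

print(is_time_sortable("%Y-%m", samples))  # True
print(is_time_sortable("%m-%Y", samples))  # False: "09-2019" sorts after "05-2020"
```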
