tecton.HiveDSConfig

class tecton.HiveDSConfig(table, database, date_partition_column=None, timestamp_column_name=None, timestamp_format=None, skip_validation=False, datetime_partition_columns=None, raw_batch_translator=None)

Configuration used to reference a Hive table.

The HiveDSConfig class is used to create a reference to a Hive Table.

This class is used as the input to a VirtualDataSource’s batch_config parameter. It is not a Tecton Primitive: it is only a grouping of parameters, so declaring it on its own will not register a data source. Instead, declare a VirtualDataSource that takes this configuration class as an input.

Methods

__init__

Instantiates a new HiveDSConfig.

__init__(table, database, date_partition_column=None, timestamp_column_name=None, timestamp_format=None, skip_validation=False, datetime_partition_columns=None, raw_batch_translator=None)

Instantiates a new HiveDSConfig.

Parameters
  • table (str) – A table registered in Hive MetaStore.

  • database (str) – A database registered in Hive MetaStore.

  • date_partition_column (Optional[str]) – (Optional) Name of the partition column if the raw data is partitioned by date; otherwise None.

  • datetime_partition_columns (Optional[List[DatetimePartitionColumn]]) – (Optional) List of DatetimePartitionColumn objects that the raw data is partitioned by; otherwise None.

  • timestamp_column_name (Optional[str]) – (Optional) Name of timestamp column. Only required if timestamp_format is specified.

  • timestamp_format (Optional[str]) – (Optional) Format of the string-encoded timestamp column (e.g. "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'").

  • raw_batch_translator – Python user-defined function f(DataFrame) -> DataFrame that takes the raw PySpark data source DataFrame and translates it into the DataFrame to be consumed by the Feature Package. See an example of raw_batch_translator in the User Guide.
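As a rough illustration of the f(DataFrame) -> DataFrame shape described for raw_batch_translator, the sketch below renames a column and drops rows with a missing value. The function name, the column names (event_ts, timestamp, amount), and the filter condition are all hypothetical; only withColumnRenamed and filter are standard PySpark DataFrame methods. It is not the example from the User Guide.

```python
def rename_and_filter_translator(df):
    """Hypothetical translator: takes the raw DataFrame, returns a cleaned one."""
    # Column names below are assumptions made for illustration only.
    return (
        df.withColumnRenamed("event_ts", "timestamp")  # standard PySpark method
          .filter("amount IS NOT NULL")                # SQL-expression filter
    )
```

A function like this would presumably be passed as raw_batch_translator=rename_and_filter_translator when constructing the config.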

Returns

A HiveDSConfig class instance.
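Example

A minimal sketch of constructing a HiveDSConfig with the keyword arguments listed in the signature above. The table, database, and column names are hypothetical placeholders, and the import is deferred inside a helper function (an illustrative choice, not a Tecton convention) so that the snippet can be loaded without the Tecton SDK installed. Passing the resulting object to a VirtualDataSource is not shown, since that class's signature is not covered on this page.

```python
# Hypothetical values, for illustration only.
HIVE_CONFIG_KWARGS = {
    "table": "transactions",
    "database": "payments",
    "date_partition_column": "ds",
    "timestamp_column_name": "event_ts",
    "timestamp_format": "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'",
}


def build_hive_config():
    """Build the config; requires the Tecton SDK to be installed."""
    from tecton import HiveDSConfig  # deferred import, see note above

    return HiveDSConfig(**HIVE_CONFIG_KWARGS)
```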