tecton.RedshiftDSConfig

class tecton.RedshiftDSConfig(endpoint, table=None, raw_batch_translator=None, temp_s3=None, query=None, timestamp_key=None)

Configuration used to reference a Redshift table or query.

The RedshiftDSConfig class is used to create a reference to a Redshift table. You can also create a reference to a query on one or more tables, which will be registered in Tecton similarly to how a view is registered in other data systems.

This class is used as an input to a BatchDataSource’s batch_ds_config parameter. It is not a Tecton Object: it is a grouping of parameters. Declaring this class alone will not register a data source. Instead, declare it as part of a BatchDataSource that takes this configuration class instance as a parameter.

Methods

__init__

Instantiates a new RedshiftDSConfig.

__init__(endpoint, table=None, raw_batch_translator=None, temp_s3=None, query=None, timestamp_key=None)

Instantiates a new RedshiftDSConfig. Exactly one of table and query should be specified when creating this instance.

Parameters
  • endpoint (str) – The connection endpoint to Redshift (e.g. redshift-cluster-1.cigcwzsdltjs.us-west-2.redshift.amazonaws.com:5439/dev).

  • table (Optional[str]) – The Redshift table for this Data source. Only one of table and query should be specified.

  • raw_batch_translator – Python user-defined function f(DataFrame) -> DataFrame that takes in the raw PySpark data source DataFrame and translates it to the DataFrame to be consumed by the Feature View. See an example of raw_batch_translator in the User Guide.

  • query (Optional[str]) – A Redshift query for this Data source. Only one of table and query should be specified.

  • temp_s3 (Optional[str]) – [deprecated] An S3 URI destination for intermediate data that is needed for Redshift (e.g. s3://tecton-ai-test-cluster-redshift-data).

  • timestamp_key (Optional[str]) – The name of the timestamp column (after the raw_batch_translator has been applied). The column name does not need to be specified if there is exactly one timestamp column after the translator is applied. This is needed for efficient time filtering when materializing batch features.
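
As a rough illustration of how raw_batch_translator and timestamp_key fit together, the sketch below renames a raw timestamp column. The function name rename_timestamp_translator and the column names "timestamp" and "ts" are hypothetical; the only PySpark call it relies on is DataFrame.withColumnRenamed.

```python
def rename_timestamp_translator(df):
    """Hypothetical raw_batch_translator: f(DataFrame) -> DataFrame.

    Receives the raw PySpark DataFrame read from the data source and returns
    the DataFrame that the Feature View will consume.
    """
    # Assumed column names: the raw data has "timestamp"; downstream code expects "ts".
    return df.withColumnRenamed("timestamp", "ts")
```

Under these assumptions, the config would be declared with raw_batch_translator=rename_timestamp_translator and timestamp_key="ts", because timestamp_key names the column as it exists after the translator has been applied.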

Returns

A RedshiftDSConfig class instance.
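
The rule that exactly one of table and query is set can be checked in your own code before the config is constructed. The following is a minimal sketch; check_table_or_query is a hypothetical helper, not part of the Tecton SDK, and it does not describe how Tecton itself handles the case.

```python
from typing import Optional


def check_table_or_query(table: Optional[str] = None, query: Optional[str] = None) -> str:
    """Hypothetical pre-check, not part of the Tecton SDK.

    Returns "table" or "query" to say which source was given, and raises
    ValueError when neither or both are set.
    """
    # Both None or both set: the two comparisons agree, which is the invalid case.
    if (table is None) == (query is None):
        raise ValueError("Specify exactly one of table and query.")
    return "table" if table is not None else "query"
```

Calling check_table_or_query(table="ad_serving_features") passes, while supplying both arguments, or neither, raises ValueError.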

Example of a RedshiftDSConfig declaration:

from tecton import RedshiftDSConfig

# Declare a RedshiftDSConfig instance that can be used as an argument in BatchDataSource.
# Only one of table and query should be specified, so this example sets query alone.
redshift_ds_config = RedshiftDSConfig(endpoint="cluster-1.us-west-2.redshift.amazonaws.com:5439/dev",
                                      query="SELECT timestamp as ts, created, user_id, ad_id, duration "
                                            "FROM ad_serving_features")