Redshift

Tecton can use Amazon Redshift as a source of batch data for Feature Materialization.

Prerequisites

To set up Tecton with Redshift, you need the following:

  • A notebook connection to Databricks or EMR.
  • A Redshift cluster endpoint. The cluster must be reachable over the public internet. We recommend restricting access by IP address so that only Tecton can reach the cluster (your Tecton deployment specialist can provide the IP ranges).
  • A Redshift username and password. We recommend creating a dedicated Redshift user that gives Tecton read-only access.
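
As a rough sketch of the read-only user recommended above, the helper below builds the Redshift SQL statements an administrator might run. The helper, the user name tecton_reader, and the schema name public are hypothetical and are not Tecton requirements; adjust them to your cluster, and supply the real password only when you execute the statements.

```python
import re
from typing import List

# Hypothetical helper: builds SQL for a read-only Redshift user.
# The user and schema names passed in are illustrative assumptions.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def read_only_user_statements(user: str, schema: str) -> List[str]:
    """Return SQL statements that create `user` with SELECT-only access to `schema`."""
    for name in (user, schema):
        # Reject anything that is not a plain lowercase identifier.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # <password> is a placeholder; substitute it at execution time only.
        f"CREATE USER {user} PASSWORD '<password>';",
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]


if __name__ == "__main__":
    for statement in read_only_user_statements("tecton_reader", "public"):
        print(statement)
```

The last statement grants access only to tables that already exist in the schema, so tables created later would need their own grant.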

Setting Up the Connection

To enable the Spark jobs managed by Tecton to read data from Redshift, store the Redshift credentials as secrets in your secret manager.

For EMR users, follow the instructions to add a secret to the AWS Secrets Manager. For Databricks users, follow the instructions for creating a secret with Databricks secret management.

The secret names below use the prefix tecton-<deployment-name>. If your deployment name already starts with tecton-, the prefix is just your deployment name. The deployment name is typically the name in the URL you use to access Tecton, e.g. https://<deployment-name>.tecton.ai.

  1. Add a secret named tecton-<deployment-name>/REDSHIFT_USER and set its value to the Redshift username you configured above.
  2. Add a secret named tecton-<deployment-name>/REDSHIFT_PASSWORD and set its value to the Redshift password you configured above.
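
The naming rule and the two steps above can be sketched in Python for the AWS Secrets Manager (EMR) case. This is a minimal illustration, not Tecton tooling: the helper names and the deployment name mycompany are hypothetical, and the client argument is assumed to be an AWS Secrets Manager client such as the one returned by boto3.client("secretsmanager"). Databricks secret management uses a different API and is not covered here.

```python
def secret_prefix(deployment_name: str) -> str:
    """Return the secret prefix, avoiding a doubled 'tecton-' prefix."""
    if deployment_name.startswith("tecton-"):
        return deployment_name
    return f"tecton-{deployment_name}"


def redshift_secret_names(deployment_name: str) -> dict:
    """Build the two secret names that hold the Redshift credentials."""
    prefix = secret_prefix(deployment_name)
    return {
        "user": f"{prefix}/REDSHIFT_USER",
        "password": f"{prefix}/REDSHIFT_PASSWORD",
    }


def store_redshift_secrets(client, deployment_name: str, user: str, password: str) -> dict:
    """Create both secrets using an AWS Secrets Manager client (assumed to be boto3)."""
    names = redshift_secret_names(deployment_name)
    client.create_secret(Name=names["user"], SecretString=user)
    client.create_secret(Name=names["password"], SecretString=password)
    return names


if __name__ == "__main__":
    # "mycompany" is a hypothetical deployment name used for illustration.
    print(redshift_secret_names("mycompany"))
```

Passing the client in as an argument keeps the naming logic free of any AWS dependency, so it can be checked without credentials.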

Verifying

To verify the connection, add a Redshift-backed Data Source. Do the following:

  1. Define a RedshiftDSConfig Data Source Config object in your Feature Repository, as shown here:

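    from tecton import RedshiftDSConfig  # assumes your SDK version exports this class

    # REDSHIFT_ENDPOINT and REDSHIFT_TABLE are placeholders for your cluster endpoint and table name.
    transactions_redshift_batch_ds = RedshiftDSConfig(
        endpoint=REDSHIFT_ENDPOINT,
        table=REDSHIFT_TABLE,
    )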
    
  2. Run tecton plan.

If the connection is configured correctly, the Data Source is added to Tecton. If it is misconfigured, tecton plan reports an error message.