Connect to Redshift
Tecton can use Amazon Redshift as a source of batch data for feature materialization. This page explains how to set up Tecton to use Redshift as a data source.
To set up Tecton with Redshift, you need the following:
- A notebook connection to Databricks or EMR.
- A Redshift Cluster Endpoint. The Redshift cluster must be configured for access over the public internet. We recommend using IP whitelisting to ensure only Tecton can access your Redshift Cluster (your Tecton deployment specialist can provide you with IP ranges).
- A Redshift username and password. We recommend that you create a new user in Redshift configured to give Tecton read-only access to Redshift.
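As an illustration of the read-only recommendation above, the following minimal sketch builds the SQL statements a Redshift administrator might run. The user name `tecton_reader`, the schema `public`, and the helper `read_only_user_sql` are hypothetical and not part of Tecton; the password is left as a placeholder to fill in when running the statements.

```python
def read_only_user_sql(username, schema="public"):
    """Build SQL statements that create a user limited to SELECT on one schema."""
    # Reject anything that is not a plain identifier, since the values are
    # interpolated directly into SQL text.
    for ident in (username, schema):
        if not ident.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {ident!r}")
    return [
        # '<password>' is a placeholder; substitute the real password by hand.
        f"CREATE USER {username} PASSWORD '<password>';",
        f"GRANT USAGE ON SCHEMA {schema} TO {username};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {username};",
    ]


# Hypothetical user name, for illustration only.
for statement in read_only_user_sql("tecton_reader"):
    print(statement)
```

These grants cover only tables that already exist in the schema; tables created later would need their own grant or a default-privileges rule.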
Setting Up the Connection
To enable the Spark jobs managed by Tecton to read data from Redshift, you will configure secrets in your secret manager.
For EMR users, follow the instructions to add a secret to the AWS Secrets Manager. For Databricks users, follow the instructions for creating a secret with Databricks secret management. Databricks users may also use AWS Secrets Manager if preferred.
The secret names below are prefixed with tecton-<deployment-name>. Note that if your deployment name already starts with tecton-, the prefix is just your deployment name. The deployment name is typically the name used to access Tecton, i.e.
- Add a secret named tecton-<deployment-name>/REDSHIFT_USER, and set its value to the Redshift user name you configured above.
- Add a secret named tecton-<deployment-name>/REDSHIFT_PASSWORD, and set its value to the Redshift password you configured above.
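The naming rule for these two secrets can be sketched as a small helper. This is only an illustration of the prefix rule described in this section; the function names and the deployment name `acme` are hypothetical, and the helper does not create anything in a secret manager.

```python
def secret_prefix(deployment_name):
    """Return the secret prefix, avoiding a doubled 'tecton-'."""
    if deployment_name.startswith("tecton-"):
        return deployment_name
    return f"tecton-{deployment_name}"


def redshift_secret_names(deployment_name):
    """Return the names of the two Redshift secrets for a deployment."""
    prefix = secret_prefix(deployment_name)
    return {
        "user": f"{prefix}/REDSHIFT_USER",
        "password": f"{prefix}/REDSHIFT_PASSWORD",
    }


# 'acme' is a made-up deployment name used for illustration.
print(redshift_secret_names("acme"))
print(redshift_secret_names("tecton-acme"))
```

Both calls yield the same names, because a deployment name that already begins with tecton- is used as the prefix unchanged.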
To verify the connection, add a Redshift-backed Data Source. Do the following:
Add a RedshiftConfig Data Source Config object in the Feature Repository as shown here:
transactions_redshift_batch_ds = RedshiftConfig(
If the configuration is valid, the Data Source is added to Tecton. A misconfiguration results in an error message.
Notebook Cluster Access
Once you've created a Redshift Data Source, you can test connecting to it in your notebook environment.
If a Redshift JDBC driver is not already present on your notebook cluster, you may need to install s3://redshift-downloads/drivers/jdbc/126.96.36.1997/RedshiftJDBC42-no-awssdk-188.8.131.527.jar on it.
In your notebook, test the connection with:
import tecton
ws = tecton.get_workspace("prod")
ds = ws.get_data_source("<your_data_source_name>")
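If the Tecton calls above fail, it can help to first confirm that the notebook cluster can reach the Redshift endpoint over the network at all. The sketch below is independent of Tecton and uses only the Python standard library. It assumes the endpoint is written as `host:port/database` and falls back to Redshift's default port 5439 when no port is given; the helper names are hypothetical.

```python
import socket


def parse_endpoint(endpoint):
    """Split 'host:port/database' into (host, port, database)."""
    host_port, _, database = endpoint.partition("/")
    host, _, port = host_port.partition(":")
    # Fall back to the Redshift default port when none is given.
    return host, int(port) if port else 5439, database


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `host, port, _ = parse_endpoint("<your-cluster-endpoint>")` followed by `can_reach(host, port)` returns False when the cluster is not reachable, which points to a network or allow-list issue rather than a credentials problem.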