Version: 1.0

Connect to Snowflake Using Spark

Tecton can use Snowflake as a source of batch data for feature materialization with Spark. This page explains how to set up Tecton to use Snowflake as a data source.

Prerequisites

To set up Tecton to use a data source on Snowflake, you need the following:

  • A notebook connection to Databricks or EMR.
  • The URL for your Snowflake account.
  • The name of the virtual warehouse Tecton will use for querying data from Snowflake.
  • A Snowflake username and password. We recommend creating a new Snowflake user configured to give Tecton read-only access. This user needs access to the warehouse. See the Snowflake documentation on how to configure this access.
  • A Snowflake read-only role for Spark, granted to the user created above. See the Snowflake documentation for the required grants.

Note: If you use different warehouses for different data sources, the Snowflake user above must have access to each warehouse. Otherwise, running get_features_for_events() or run_transformation() fails with the following exception:

net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.
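As a rough illustration of the access described above, the following sketch generates Snowflake GRANT statements for a read-only role that covers every warehouse your Data Sources use. The helper name, the role and user names (TECTON_READ_ONLY, TECTON_USER), and the exact set of grants are assumptions for illustration, not an official Tecton grant list; check the Snowflake documentation for the grants your account requires.

```python
from typing import Iterable, List


def read_only_grants(
    role: str,
    user: str,
    database: str,
    schema: str,
    warehouses: Iterable[str],
) -> List[str]:
    """Return Snowflake SQL statements that give `user` a read-only `role`.

    Hypothetical helper: the statements are an illustrative minimum.
    """
    qualified_schema = f"{database}.{schema}"
    statements = [f"CREATE ROLE IF NOT EXISTS {role};"]
    # The role needs USAGE on every warehouse a Data Source references;
    # otherwise queries fail with "No active warehouse selected".
    for warehouse in warehouses:
        statements.append(f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};")
    statements += [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        # Read access to existing tables and to tables created later.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
        # Finally, grant the role to the user whose credentials Tecton stores.
        f"GRANT ROLE {role} TO USER {user};",
    ]
    return statements


if __name__ == "__main__":
    for stmt in read_only_grants(
        role="TECTON_READ_ONLY",  # hypothetical role name
        user="TECTON_USER",  # hypothetical user name
        database="CLICK_STREAM_DB",
        schema="CLICK_STREAM_SCHEMA",
        warehouses=["COMPUTE_WH"],
    ):
        print(stmt)
```

Run the printed statements as a role with sufficient privileges, for example in a Snowflake worksheet. Passing every warehouse in a single call keeps the role aligned with the multi-warehouse requirement above.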

Configuring Secrets

To let the Spark jobs managed by Tecton read data from Snowflake, store the Snowflake credentials as secrets in your secret manager.

For EMR users, follow the instructions to add a secret to the AWS Secrets Manager. For Databricks users, follow the instructions for creating a secret with Databricks secret management. Databricks users may also use AWS Secrets Manager if preferred.

Each secret name below starts with the prefix tecton-<deployment-name>. If your deployment name already starts with tecton-, use the deployment name by itself as the prefix instead of adding tecton- a second time. The deployment name is typically the subdomain you use to access Tecton, as in https://<deployment-name>.tecton.ai.

  1. Add a secret named tecton-<deployment-name>/SNOWFLAKE_USER containing the Snowflake username configured above.
  2. Add a secret named tecton-<deployment-name>/SNOWFLAKE_PASSWORD containing the Snowflake password configured above.
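The prefix rule above can be sketched as a small Python helper. The function names are hypothetical, and deriving the deployment name from the URL assumes the typical https://<deployment-name>.tecton.ai form; your deployment may differ.

```python
from urllib.parse import urlparse

TECTON_HOST_SUFFIX = ".tecton.ai"


def deployment_name_from_url(url: str) -> str:
    """Extract <deployment-name> from https://<deployment-name>.tecton.ai."""
    host = urlparse(url).hostname or ""
    if not host.endswith(TECTON_HOST_SUFFIX):
        raise ValueError(f"not a *.tecton.ai URL: {url!r}")
    return host[: -len(TECTON_HOST_SUFFIX)]


def secret_prefix(deployment_name: str) -> str:
    """Prepend "tecton-" unless the deployment name already starts with it."""
    if deployment_name.startswith("tecton-"):
        return deployment_name
    return f"tecton-{deployment_name}"


def snowflake_secret_names(deployment_name: str) -> dict:
    """Return the two secret names Tecton reads Snowflake credentials from."""
    prefix = secret_prefix(deployment_name)
    return {
        "user": f"{prefix}/SNOWFLAKE_USER",
        "password": f"{prefix}/SNOWFLAKE_PASSWORD",
    }


if __name__ == "__main__":
    name = deployment_name_from_url("https://acme.tecton.ai")  # hypothetical deployment
    print(snowflake_secret_names(name))
    # {'user': 'tecton-acme/SNOWFLAKE_USER', 'password': 'tecton-acme/SNOWFLAKE_PASSWORD'}
```

A deployment named tecton-acme maps to the same tecton-acme/... secret names, because the prefix is not doubled.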

Verifying

To verify the connection, add a Snowflake-backed Data Source. Do the following:

  1. Add a SnowflakeConfig Data Source Config object in your feature repository. Here's an example:

    from tecton import SnowflakeConfig, BatchSource

    # Declare a SnowflakeConfig object to pass as batch_config to a BatchSource
    snowflake_config = SnowflakeConfig(
        url="https://<your-cluster>.<your-snowflake-region>.snowflakecomputing.com/",
        database="CLICK_STREAM_DB",
        schema="CLICK_STREAM_SCHEMA",
        warehouse="COMPUTE_WH",
        table="CLICK_STREAM_FEATURES",
    )

    # Use the config in a BatchSource
    snowflake_ds = BatchSource(name="click_stream_snowflake_ds", batch_config=snowflake_config)

  2. Run tecton plan.

If the configuration is valid, the plan shows the Data Source being added to Tecton; run tecton apply to apply it. A misconfiguration results in an error message.
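A malformed url is an easy misconfiguration to make. As a hypothetical pre-flight check, you could validate the value locally before running tecton plan. This sketch assumes the https://<account>.snowflakecomputing.com/ form used in the SnowflakeConfig example; it only inspects the string, does not contact Snowflake, and does not replace Tecton's own validation.

```python
from urllib.parse import urlparse

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_snowflake_url(url: str) -> str:
    """Return the account part of a Snowflake URL, or raise ValueError.

    Hypothetical helper: string checks only, no network access.
    """
    # Catch placeholders such as <your-cluster> that were never replaced.
    if "<" in url or ">" in url:
        raise ValueError(f"URL still contains a placeholder: {url!r}")
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"expected an https:// URL, got {url!r}")
    host = parsed.hostname or ""
    account = host[: -len(SNOWFLAKE_SUFFIX)] if host.endswith(SNOWFLAKE_SUFFIX) else ""
    if not account:
        raise ValueError(f"host must look like <account>{SNOWFLAKE_SUFFIX}, got {host!r}")
    return account


if __name__ == "__main__":
    # Hypothetical account locator and region.
    print(account_from_snowflake_url("https://xy12345.us-east-1.snowflakecomputing.com/"))
```

Calling a check like this in a unit test for your feature repository surfaces a bad URL immediately, while connection and permission problems are still reported by tecton plan.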
