Connect to Snowflake
To run feature pipelines based on data in Snowflake, Tecton needs to be configured with access to your Snowflake account. The following guide shows how to configure these permissions and validate that Tecton is able to connect to your data source.
Prerequisites
To set up Tecton to use a data source on Snowflake, you need the following:
- The URL for your Snowflake account.
- The name of the virtual warehouse Tecton will use for querying data from Snowflake.
- A Snowflake username and private key. See Snowflake's guide on
configuring key-pair authentication.
- We recommend you create a new user in Snowflake configured to give Tecton read-only access. This user needs to have access to the warehouse. See Snowflake documentation on how to configure this access.
- The Snowflake role must have USAGE and SELECT permissions on the relevant database objects (database, schema, tables) to perform read operations.
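The read-only permissions listed above can be sketched as a small Python helper that formats the GRANT statements a Snowflake administrator might run. The role name TECTON_READER and the helper read_only_grants are hypothetical, and the object names are borrowed from the example later in this guide. The helper only builds SQL text and never connects to Snowflake, so treat it as a starting point to adapt to your own access model.

```python
def read_only_grants(role, warehouse, database, schema, tables):
    """Build GRANT statements that give `role` read-only access.

    Hypothetical helper for illustration: it only formats SQL text.
    """
    qualified_schema = f"{database}.{schema}"
    statements = [
        # USAGE lets the role run queries on the warehouse.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        # USAGE on the database and schema makes their objects reachable.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
    ]
    # SELECT on each table allows reads but no writes.
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {qualified_schema}.{table} TO ROLE {role};"
        )
    return statements


# All names below are placeholders (assumptions), not values from your account.
grants = read_only_grants(
    role="TECTON_READER",
    warehouse="COMPUTE_WH",
    database="CLICK_STREAM_DB",
    schema="CLICK_STREAM_SCHEMA",
    tables=["CLICK_STREAM_FEATURES"],
)
for statement in grants:
    print(statement)
```

The role still has to be granted to the user that Tecton connects as. That step is omitted here because it depends on how your account manages users.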
Configuring Snowflake Authentication
Snowflake previously supported password authentication, and you may have
Snowflake data sources that rely on this method. Existing data sources that
connect to Snowflake using the password parameter will continue to work.
However, Snowflake is deprecating password authentication and will disable it
later this year. See
Snowflake's deprecation notice
for a more detailed timeline. We recommend setting up private key authentication
as soon as possible.
To enable
materialization jobs
to authenticate to Snowflake, add the username and private key as
secrets in Tecton Secrets and reference them
in your Snowflake configuration block as shown below. If you use a different
type of authentication, such as OAuth, you can instead define a custom
pandas_batch_config, retrieve the secrets there, and inject them into your own
code that connects to your Snowflake instance.
When adding the private key secret, copy the entire key including delimiters.
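As a quick check before storing the secret, the following sketch tests whether a key string still carries its delimiters. It assumes a PKCS#8 PEM key (BEGIN PRIVATE KEY, or BEGIN ENCRYPTED PRIVATE KEY for an encrypted key), the format that Snowflake's key-pair guide produces. The helper name has_pem_delimiters and the sample key body are hypothetical. The check only inspects the text layout and does not verify that the key is cryptographically valid.

```python
import re

# Matches a PKCS#8 PEM block. The END line must repeat the same
# "ENCRYPTED " prefix (or lack of one) that the BEGIN line used.
_PEM_BLOCK = re.compile(
    r"\A-----BEGIN (ENCRYPTED |)PRIVATE KEY-----\s+"
    r"[A-Za-z0-9+/=\s]+"
    r"-----END \1PRIVATE KEY-----\Z"
)


def has_pem_delimiters(key_text: str) -> bool:
    """Return True if key_text looks like a complete PEM private key block."""
    return _PEM_BLOCK.match(key_text.strip()) is not None


# Placeholder body for illustration only; this is not a real key.
complete_key = (
    "-----BEGIN PRIVATE KEY-----\n"
    "PLACEHOLDERBASE64BODY\n"
    "-----END PRIVATE KEY-----\n"
)
body_only = "PLACEHOLDERBASE64BODY"

print(has_pem_delimiters(complete_key))
print(has_pem_delimiters(body_only))
```

If the check returns False for the text you are about to paste, the most likely cause is that the BEGIN or END line was dropped while copying.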
Testing a Snowflake Data Source
To validate that Tecton can read your Snowflake data source, create a Tecton
Data Source definition and test that you can read data from the Data Source. The
following example shows how to define a
SnowflakeConfig
in your notebook environment using username/private_key authentication, and
validate that Tecton is able to read from your Snowflake data source.
If you generate an encrypted private key, also supply the
private_key_passphrase parameter.
import tecton
from tecton import BatchSource, Secret, SnowflakeConfig

# Follow the prompt to complete your Tecton Account sign in
tecton.login("https://<your-account>.tecton.ai")

# Declare a SnowflakeConfig instance that can be passed as an argument to BatchSource
snowflake_config = SnowflakeConfig(
    url="https://<your-cluster>.<your-snowflake-region>.snowflakecomputing.com/",
    database="CLICK_STREAM_DB",
    schema="CLICK_STREAM_SCHEMA",
    warehouse="COMPUTE_WH",
    table="CLICK_STREAM_FEATURES",
    user=Secret(scope="your-snowflake-scope", key="your-snowflake-user-key"),
    private_key=Secret(scope="your-snowflake-scope", key="your-snowflake-private-key"),
    # Add private_key_passphrase only if you generate an encrypted private key
    # private_key_passphrase=Secret(scope="your-snowflake-scope", key="your-snowflake-private)
)

# Use the config in a BatchSource
snowflake_ds = BatchSource(name="click_stream_snowflake_ds", batch_config=snowflake_config)

# Read sample data
snowflake_ds.get_dataframe().to_pandas().head(10)