
Using Tecton on Snowflake

Note

Tecton on Snowflake is currently available in preview to all Snowflake customers on AWS.

Tecton on Snowflake is an integration of Tecton with Snowflake. In this integration:

  • The Snowflake compute engine is used to process (materialize) Tecton features. Tecton submits queries to Snowflake for processing.

  • A Tecton-managed Snowflake database is used as the offline store.

Installation

To install Tecton on Snowflake, follow the CLI installation instructions and then the Snowflake deployment instructions.

Currently, each Tecton cluster is connected to a single data platform. If you are an existing Tecton user, Tecton on Snowflake will require a separate Tecton cluster. In the future, Tecton will allow multiple data platforms per cluster.

Supported feature types

Tecton on Snowflake supports Batch Feature Views and On-Demand Feature Views. Stream Feature Views are not supported.

Enabling a Batch Feature View to use Tecton on Snowflake

In the @batch_feature_view decorator, set mode to snowflake_sql.

Offline materialization (i.e. setting offline to True) is not yet supported, but is coming soon.
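
For example, a Batch Feature View running on Snowflake might look like the following sketch. The transactions source, user entity, and column names are assumed to be defined in your feature repository, and decorator parameter names and types vary by Tecton SDK version:

from datetime import datetime, timedelta
from tecton import batch_feature_view

# `transactions` (a batch source) and `user` (an entity) are assumed to be
# defined elsewhere in the feature repository.
@batch_feature_view(
    mode="snowflake_sql",          # the transformation body is Snowflake SQL
    sources=[transactions],
    entities=[user],
    online=True,
    offline=False,                 # offline materialization is not yet supported
    feature_start_time=datetime(2022, 1, 1),
    batch_schedule=timedelta(days=1),
)
def user_last_transaction_amount(transactions):
    # The source argument interpolates to the underlying Snowflake table.
    return f"""
        SELECT USER_ID, AMT AS LAST_TRANSACTION_AMOUNT, TIMESTAMP
        FROM {transactions}
        """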

Enabling an On-Demand Feature View to use Tecton on Snowflake

On-Demand Feature Views can be used to execute request-time transformations when fetching data online. This allows you to incorporate real-time request data or compute feature crosses that cannot feasibly be pre-computed. For more details, refer to the On-Demand Feature View documentation.

These transformations are also executed consistently offline when generating training data using Snowflake's Snowpark for Python integration. Snowpark for Python is currently in private preview. If you wish to test out On-Demand Feature Views on Snowflake, please contact the Tecton team to help enable the integration.
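
As a sketch of what such a view might look like: the request source, field names, and schema-declaration API below reflect recent Tecton SDK versions and are illustrative rather than exact.

import pandas
from tecton import on_demand_feature_view, RequestSource
from tecton.types import Bool, Field, Float64

# Hypothetical request-time input carrying the transaction amount.
transaction_request = RequestSource(schema=[Field("amount", Float64)])

# Runs at request time online, and via Snowpark for Python when
# generating training data offline.
@on_demand_feature_view(
    mode="pandas",
    sources=[transaction_request],
    schema=[Field("is_high_value", Bool)],
)
def is_high_value(transaction_request: pandas.DataFrame) -> pandas.DataFrame:
    df = pandas.DataFrame()
    df["is_high_value"] = transaction_request["amount"] >= 1000
    return df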

Generating training data using a notebook

You can use any notebook software that supports Python to generate training data using Tecton on Snowflake. Jupyter is recommended.

Follow these steps to generate training data:

1. Create a spine

Query Snowflake to return the records, as a pandas DataFrame, that will be used to generate your training data. Your query may be written against one or more Snowflake tables and/or views. The output of this query is known as a spine.
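
For illustration, a spine for a fraud-detection model might select entity keys, event timestamps, and labels. The table and column names below are hypothetical:

spine = """
    SELECT USER_ID, TIMESTAMP, IS_FRAUD
    FROM FRAUD_TRANSACTIONS
    WHERE TIMESTAMP BETWEEN '2022-01-01' AND '2022-06-01'
"""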

To query Snowflake, you can use the SnowflakeContext object, which is provided by Tecton. Alternatively, you can query Snowflake using any other method that Snowflake supports for running queries from a Python notebook; a sketch using the standalone Snowflake connector follows the two methods below.

To query Snowflake using the SnowflakeContext object, use one of the following methods.

Method 1: With Snowpark installed

spine = "<SELECT query>"
SnowflakeContext.get_instance().get_session().sql(spine).toPandas()

Method 2: Without Snowpark installed

spine = "<SELECT query>"
cursor = SnowflakeContext.get_instance().get_connection().cursor()
cursor.execute(spine)
cursor.fetch_pandas_all()
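
As an example of another supported method, here is a minimal sketch that uses the standalone Snowflake Python connector directly. All connection parameters are placeholders, and fetch_pandas_all() requires the connector's pandas extra to be installed:

import snowflake.connector

# Placeholder connection parameters; substitute your account details.
conn = snowflake.connector.connect(
    user="<user>",
    password="<password>",
    account="<account_identifier>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
cursor = conn.cursor()
cursor.execute(spine)
spine_df = cursor.fetch_pandas_all()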

2. Call the get_historical_features() function

Call the Tecton get_historical_features() function, passing the spine DataFrame to it. The function returns a training data set containing each row of the spine, with feature values added as columns.

The following code demonstrates a call to get_historical_features(), using the spine that was generated in the first step. After the call, the training data is displayed.

import tecton

# Look up the feature service by name, then join feature values onto the
# spine as of each row's timestamp.
feature_service = tecton.get_feature_service("feature_service")
training_data = (
    feature_service.get_historical_features(spine_df, timestamp_key="TIMESTAMP")
    .to_pandas()
    .fillna(0)  # replace missing feature values with 0
)

display(training_data)

The new framework for Tecton on Snowflake

See a preview of the new framework for Tecton on Snowflake.