Using Tecton on Snowflake
Note: Tecton on Snowflake is currently available in preview to all Snowflake customers on AWS.
Tecton on Snowflake is an integration of Tecton with Snowflake. In this integration:
- The Snowflake compute engine is used to process (materialize) Tecton features. Tecton submits queries to Snowflake for processing.
- A Tecton-managed Snowflake database is used as the offline store.
Installation
To install Tecton on Snowflake, follow the CLI installation instructions and then the Snowflake deployment instructions.
Currently, each Tecton cluster is connected to a single data platform. If you are an existing Tecton user, Tecton on Snowflake will require a separate Tecton cluster. In the future, Tecton will allow multiple data platforms per cluster.
Supported feature types
Tecton on Snowflake supports Batch Feature Views and On-Demand Feature Views. Stream Feature Views are not supported.
Enabling a Batch Feature View to use Tecton on Snowflake
In the @batch_feature_view decorator, set mode to snowflake_sql.
Offline materialization (i.e., when offline is set to True) is not yet supported but is coming soon.
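For illustration, a Batch Feature View configured for Snowflake SQL might look like the following sketch. The transactions_batch source, user entity, and all feature and column names here are hypothetical, and decorator parameters vary across Tecton SDK versions:

from datetime import datetime, timedelta
from tecton import batch_feature_view

# `transactions_batch` (a batch source) and `user` (an entity) are assumed
# to be defined elsewhere in the feature repository.
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="snowflake_sql",       # run this transformation on Snowflake
    online=True,
    offline=False,              # offline materialization is not yet supported
    feature_start_time=datetime(2021, 1, 1),
    batch_schedule=timedelta(days=1),
)
def user_transaction_count(transactions_batch):
    # In snowflake_sql mode, the function returns a SQL string; the source
    # argument is interpolated as the table to query.
    return f"""
        SELECT USER_ID, COUNT(*) AS TRANSACTION_COUNT, TIMESTAMP
        FROM {transactions_batch}
        GROUP BY USER_ID, TIMESTAMP
        """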
Enabling an On-Demand Feature View to use Tecton on Snowflake
On-Demand Feature Views can be used to execute request-time transformations when fetching data online. This allows you to incorporate real-time request data or compute feature crosses that can't be feasibly pre-computed. For more details, refer to the On-Demand Feature View documentation.
These transformations are also executed consistently offline when generating training data using Snowflake's Snowpark for Python integration. Snowpark for Python is currently in private preview. If you wish to test out On-Demand Feature Views on Snowflake, please contact the Tecton team to help enable the integration.
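As a rough sketch of such a feature (the request source, schema helpers, and names below are illustrative, and parameter names differ between Tecton SDK versions):

from tecton import on_demand_feature_view, RequestSource
from tecton.types import Bool, Field, Float64

# Hypothetical request-time input carrying the transaction amount.
transaction_request = RequestSource(schema=[Field("AMOUNT", Float64)])

@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("TRANSACTION_AMOUNT_IS_HIGH", Bool)],
)
def transaction_amount_is_high(transaction_request):
    # Pure-Python transformation evaluated at request time (and replayed
    # consistently offline when generating training data).
    return {"TRANSACTION_AMOUNT_IS_HIGH": transaction_request["AMOUNT"] > 100}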
Generating training data using a notebook
You can use any notebook software that supports Python to generate training data with Tecton on Snowflake. Jupyter is recommended.
Follow these steps to generate training data:
1. Create a spine
Query Snowflake to return the records, as a pandas DataFrame, that will be used to generate your training data. Your query may be written against one or more Snowflake tables and/or views. The output of this query is known as a spine.
To query Snowflake, you can use the SnowflakeContext object, which is provided by Tecton. Alternatively, you can query Snowflake using any other method that Snowflake supports for running queries from a Python notebook.
To query Snowflake using the SnowflakeContext object, use one of the following methods.
Method 1: With Snowpark installed
spine = "<SELECT query>"
SnowflakeContext.get_instance().get_session().sql(spine).toPandas()
Method 2: Without Snowpark installed
spine = "<SELECT query>"
cursor = SnowflakeContext.get_instance().get_connection().cursor()
cursor.execute(spine)
cursor.fetch_pandas_all()
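For example, a spine query might select join keys and event timestamps from a transactions table. The database, table, and column names below are placeholders; substitute your own:

spine_query = """
    SELECT USER_ID, TIMESTAMP
    FROM FRAUD_DEMO.PUBLIC.TRANSACTIONS
"""
spine = SnowflakeContext.get_instance().get_session().sql(spine_query).to_pandas()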
2. Call the get_historical_features() function
Call the Tecton get_historical_features() function, passing in the spine created in the previous step. The function returns a training data set containing every row of the spine, with feature values added as columns.
The following code demonstrates a call to get_historical_features(), using the spine that was generated in the first step. After the call, the training data is displayed.
import tecton

# Look up the registered feature service, then join feature values onto the spine.
feature_service = tecton.get_feature_service("feature_service")
training_data = feature_service.get_historical_features(spine, timestamp_key='TIMESTAMP').to_pandas().fillna(0)
display(training_data)
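The returned pandas DataFrame can then be fed directly into model training. For instance, with hypothetical label and key columns:

# Hypothetical column names; adjust to your feature service's output.
labels = training_data["IS_FRAUD"]
features = training_data.drop(columns=["IS_FRAUD", "USER_ID", "TIMESTAMP"])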