Version: Beta 🚧

Compute in Tecton

Feature computations can run in batch, stream, or real-time for a production application, depending on the type of feature pipeline.

Compute engines in Tecton are interoperable. Different compute engines can be used for different feature pipelines, or selected independently for interactive development (for example, training jobs).

Rift (Private Preview)

Rift is Tecton's built-in compute engine for transforming features in batch, streaming, or real-time. Transformations in Rift can be written with vanilla Python, Pandas, or SQL.

Rift integrates natively with data warehouses like Snowflake and BigQuery and can push compute down to those systems where appropriate.

Rift can also run locally for fast and iterative feature development in any Python environment.

Rift can read from any data source that Python can read from and also allows you to bring arbitrary Python pip packages into your feature transformation.

Spark

Tecton can integrate with Spark providers like Databricks, AWS EMR, and Google Cloud Dataproc for transforming batch and stream features. Transformations can be written using Spark SQL and PySpark.

When iterating on features in a notebook, Tecton will run Spark queries on an attached Spark cluster.

Selecting Compute Engines

Feature Views

On Batch and Stream Feature Views, the compute engine and transformation language are chosen by the mode parameter.

When using mode='pandas' or mode='snowflake_sql' (batch only), Tecton will run Pandas or Snowflake SQL transformations on Rift. Snowflake SQL transformations will be pushed down into your configured warehouse.

When using mode='spark_sql' or mode='pyspark', Tecton will run the provided transformation as a Spark job in your connected Spark provider.

Note: On-Demand Feature Views always run real-time compute on Rift. Spark is not performant enough for real-time computation.
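The selection rules above can be summarized as a small lookup. The following is a minimal sketch in plain Python, not Tecton SDK code: `engine_for`, `MODE_TABLE`, and the view-kind strings (`"batch"`, `"stream"`, `"on_demand"`) are hypothetical names introduced only for illustration. It also assumes that "(batch only)" applies to `snowflake_sql` and not to `pandas`, which the text does not state explicitly.

```python
# Illustrative only: a hypothetical helper, not part of the Tecton SDK.
RIFT = "rift"
SPARK = "spark"

# mode -> (engine, view kinds assumed to accept that mode).
# Assumption: "(batch only)" is read as applying to snowflake_sql alone.
MODE_TABLE = {
    "pandas": (RIFT, {"batch", "stream"}),
    "snowflake_sql": (RIFT, {"batch"}),
    "spark_sql": (SPARK, {"batch", "stream"}),
    "pyspark": (SPARK, {"batch", "stream"}),
}


def engine_for(view_kind, mode=None):
    """Return the engine name implied by a view kind and its mode."""
    # On-Demand Feature Views always run on Rift, regardless of mode.
    if view_kind == "on_demand":
        return RIFT
    if mode not in MODE_TABLE:
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    engine, allowed_kinds = MODE_TABLE[mode]
    if view_kind not in allowed_kinds:
        raise ValueError(f"mode {mode!r} is not available for {view_kind} views")
    return engine
```

For instance, `engine_for("batch", "snowflake_sql")` returns `"rift"`, while `engine_for("stream", "snowflake_sql")` raises a `ValueError` under the assumption stated above.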

Offline Feature Retrieval and Training Data Generation

To execute offline queries with Rift, set the compute_mode='rift' parameter in your get_historical_features() call.

To execute offline queries with Spark, set the compute_mode='spark' parameter in your get_historical_features() call.

The features being retrieved must be materialized offline or match the specified compute mode.
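The compatibility rule above can be read as a simple predicate. The sketch below is plain Python written for illustration, not Tecton SDK code: `can_run_offline_query` and its parameters are hypothetical, and it assumes each feature is associated with exactly one engine (`"rift"` or `"spark"`) through its mode.

```python
# Illustrative only: a hypothetical helper, not part of the Tecton SDK.
VALID_COMPUTE_MODES = ("rift", "spark")


def can_run_offline_query(compute_mode, feature_engine, materialized_offline):
    """Check the rule: a feature is retrievable if it is materialized
    offline, or if its own engine matches the requested compute mode."""
    if compute_mode not in VALID_COMPUTE_MODES:
        raise ValueError(f"unknown compute_mode: {compute_mode!r}")
    return materialized_offline or feature_engine == compute_mode
```

Under this reading, a Spark-based feature that has been materialized offline could be retrieved with `compute_mode='rift'`, whereas an unmaterialized Spark-based feature could not.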
