
Compute in Tecton

Tecton offers flexible compute engines to power your feature transformations at any scale—batch, streaming, or real-time. Each feature pipeline can use a different engine (or even mix and match for development vs. production) without sacrificing consistency. Whether you’re transforming data on a schedule, processing a live data stream, or computing features on-the-fly for real-time inference, Tecton’s compute engines have you covered.

Rift (Public Preview)

Rift is Tecton's built-in compute engine for transforming features in batch, streaming, or real-time. It supports vanilla Python, Pandas, and SQL transformations, giving data teams a familiar and flexible development experience.

  • Integrates with Warehouses: Rift natively connects to Snowflake and BigQuery, pushing down computations to your data warehouse when beneficial.
  • Local Development: You can run Rift locally in any Python environment for fast, iterative feature development.
  • Rich Python Support: Bring in any Python library (via pip) to extend your feature transformations. If Python can read your data, Rift can process it.
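For example, you can iterate on features from a notebook or local Python session and let Rift do the computation. The sketch below assumes a workspace named "prod" and a feature view named "user_transaction_features" already exist; exact method names may vary across SDK versions.

```python
from datetime import datetime

import tecton

# Fetch a registered feature view from an existing workspace
# (the "prod" workspace and feature view name are assumptions for this example).
ws = tecton.get_workspace("prod")
fv = ws.get_feature_view("user_transaction_features")

# Compute features for a time range locally with Rift and inspect the result.
df = fv.get_features_in_range(
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 2, 1),
).to_pandas()
print(df.head())
```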

Spark

If you already have a Spark-based infrastructure—or simply prefer Spark—Tecton integrates seamlessly with providers like Databricks and AWS EMR. This lets you use Spark SQL or PySpark to transform your batch or streaming features:

  • Notebook Integration: When you work in a notebook, Tecton automatically runs Spark queries on your attached Spark cluster.
  • Batch and Streaming: Spark can process large-scale batch data or real-time streams, offering a comprehensive solution for feature computation.
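As a concrete example, the sketch below defines a batch feature with Spark SQL; the transformation runs on your connected cluster. The transactions source and user entity are assumed to be defined elsewhere in your feature repository, and other required arguments (such as the output feature schema) are omitted for brevity.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

@batch_feature_view(
    mode="spark_sql",                   # executes on your attached Spark cluster
    sources=[transactions],             # assumed BatchSource defined elsewhere
    entities=[user],                    # assumed Entity defined elsewhere
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 1, 1),
)
def user_daily_spend(transactions):
    # In spark_sql mode the function returns a SQL string; the source
    # parameter interpolates as a queryable view name.
    return f"""
        SELECT user_id, timestamp, amount AS daily_spend
        FROM {transactions}
    """
```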

Selecting Compute Engines

Feature Views

When defining Batch or Stream Feature Views, you specify the compute engine and transformation language using the mode parameter:

  • Rift
    • mode='pandas': Python/Pandas transformations run on Rift.
    • mode='snowflake_sql' or mode='bigquery_sql' (batch only): SQL queries pushed down to your configured warehouse.
  • Spark
    • mode='spark_sql' or mode='pyspark': Transformations run on your connected Spark cluster.

This setup lets you choose the engine that best fits each pipeline, whether you’re optimizing for Python-based data science workflows, leveraging cloud data warehouses, or taking advantage of an existing Spark ecosystem.
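For instance, moving the Spark SQL pipeline above to Rift is a one-line change to mode, with the transformation body written in Pandas instead of SQL. As before, the transactions source and user entity are assumptions, and other required arguments are omitted.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

@batch_feature_view(
    mode="pandas",                      # runs on Rift
    sources=[transactions],             # assumed BatchSource defined elsewhere
    entities=[user],                    # assumed Entity defined elsewhere
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 1, 1),
)
def user_transaction_features(transactions):
    # In pandas mode the source arrives as a pandas DataFrame, and any
    # pip-installed library can be used in the body.
    df = transactions[["user_id", "timestamp", "amount"]].copy()
    df["amount_is_high"] = df["amount"] > 100.0
    return df
```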

note

Realtime Feature Views always run on Rift. Spark's job startup and scheduling overhead makes it unsuitable for low-latency, request-time computation.
