
Compute in Tecton

Tecton offers flexible compute engines to power your feature transformations at any scale—batch, streaming, or real-time. Each feature pipeline can use a different engine (or even mix and match for development vs. production) without sacrificing consistency. Whether you’re transforming data on a schedule, processing a live data stream, or computing features on-the-fly for real-time inference, Tecton’s compute engines have you covered.

Rift (Public Preview)

Rift is Tecton's built-in compute engine for transforming features in batch, streaming, or real-time. It supports vanilla Python, Pandas, and SQL transformations, giving data teams a familiar and flexible development experience.

  • Integrates with Warehouses: Rift natively connects to Snowflake and BigQuery, pushing down computations to your data warehouse when beneficial.
  • Local Development: You can run Rift locally in any Python environment for fast, iterative feature development.
  • Rich Python Support: Bring in any Python library (via pip) to extend your feature transformations. If Python can read your data, Rift can process it.
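For example, you can iterate on features from a notebook or local Python session and let Rift do the computation. The sketch below assumes a workspace named "prod" and a feature view named "user_transaction_features" already exist; exact method names may vary across SDK versions.

```python
from datetime import datetime

import tecton

# Fetch a registered feature view from an existing workspace
# (the "prod" workspace and feature view name are assumptions for this example).
ws = tecton.get_workspace("prod")
fv = ws.get_feature_view("user_transaction_features")

# Compute features for a time range locally with Rift and inspect the result.
df = fv.get_features_in_range(
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 2, 1),
).to_pandas()
print(df.head())
```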

Spark

If you already have a Spark-based infrastructure—or simply prefer Spark—Tecton integrates seamlessly with providers like Databricks and AWS EMR. This lets you use Spark SQL or PySpark to transform your batch or streaming features:

  • Notebook Integration: When you work in a notebook, Tecton automatically runs Spark queries on your attached Spark cluster.
  • Batch and Streaming: Spark can process large-scale batch data or real-time streams, offering a comprehensive solution for feature computation.
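As a concrete example, the sketch below defines a batch feature with Spark SQL; the transformation runs on your connected cluster. The transactions source and user entity are assumed to be defined elsewhere in your feature repository, and other required arguments (such as the output feature schema) are omitted for brevity.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

@batch_feature_view(
    mode="spark_sql",                   # executes on your attached Spark cluster
    sources=[transactions],             # assumed BatchSource defined elsewhere
    entities=[user],                    # assumed Entity defined elsewhere
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 1, 1),
)
def user_daily_spend(transactions):
    # In spark_sql mode the function returns a SQL string; the source
    # parameter interpolates as a queryable view name.
    return f"""
        SELECT user_id, timestamp, amount AS daily_spend
        FROM {transactions}
    """
```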

Selecting Compute Engines

Feature Views

When defining Batch or Stream Feature Views, you specify the compute engine and transformation language using the mode parameter:

  • Rift
    • mode='pandas': Python/Pandas transformations run on Rift.
    • mode='snowflake_sql' or mode='bigquery_sql' (batch only): SQL queries pushed down to your configured warehouse.
  • Spark
    • mode='spark_sql' or mode='pyspark': Transformations run on your connected Spark cluster.

This setup lets you choose the engine that best fits each pipeline, whether you’re optimizing for Python-based data science workflows, leveraging cloud data warehouses, or taking advantage of an existing Spark ecosystem.
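For instance, moving the Spark SQL pipeline above to Rift is a one-line change to mode, with the transformation body written in Pandas instead of SQL. As before, the transactions source and user entity are assumptions, and other required arguments are omitted.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

@batch_feature_view(
    mode="pandas",                      # runs on Rift
    sources=[transactions],             # assumed BatchSource defined elsewhere
    entities=[user],                    # assumed Entity defined elsewhere
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 1, 1),
)
def user_transaction_features(transactions):
    # In pandas mode the source arrives as a pandas DataFrame, and any
    # pip-installed library can be used in the body.
    df = transactions[["user_id", "timestamp", "amount"]].copy()
    df["amount_is_high"] = df["amount"] > 100.0
    return df
```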

note

Realtime Feature Views always run on Rift. Spark's job startup and scheduling overhead makes it unsuitable for low-latency, request-time computation.
