Compute in Tecton
Tecton offers flexible compute engines to power your feature transformations at any scale—batch, streaming, or real-time. Each feature pipeline can use a different engine (or even mix and match for development vs. production) without sacrificing consistency. Whether you’re transforming data on a schedule, processing a live data stream, or computing features on-the-fly for real-time inference, Tecton’s compute engines have you covered.
Rift (Public Preview)
Rift is Tecton's built-in compute engine for transforming features in batch, streaming, or real-time. It supports vanilla Python, Pandas, and SQL transformations, giving data teams a familiar and flexible development experience.
- Integrates with Warehouses: Rift natively connects to Snowflake and BigQuery, pushing down computations to your data warehouse when beneficial.
- Local Development: You can run Rift locally in any Python environment for fast, iterative feature development.
- Rich Python Support: Bring in any Python library (via `pip`) to extend your feature transformations. If Python can read your data, Rift can process it.
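As a sketch of what Rift runs in `mode='pandas'`: the transformation body is an ordinary function over DataFrames. The Tecton decorator and data-source wiring are omitted here, and the schema (`user_id`, `amount`) is hypothetical:

```python
import pandas as pd

def user_transaction_features(transactions: pd.DataFrame) -> pd.DataFrame:
    # Rift executes plain pandas code like this in any Python environment,
    # which is also what makes local, iterative development possible.
    out = transactions.groupby("user_id", as_index=False).agg(
        total_spend=("amount", "sum"),
        txn_count=("amount", "count"),
    )
    out["avg_spend"] = out["total_spend"] / out["txn_count"]
    return out

df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "amount": [10.0, 20.0, 5.0],
})
features = user_transaction_features(df)
```

The same function body could be developed and unit-tested locally before being registered as a Feature View.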
Spark
If you already have a Spark-based infrastructure—or simply prefer Spark—Tecton integrates seamlessly with providers like Databricks and AWS EMR. This lets you use Spark SQL or PySpark to transform your batch or streaming features:
- Notebook Integration: When working in a notebook, Tecton automatically runs Spark queries on your attached Spark cluster.
- Batch and Streaming: Spark can process large-scale batch data or real-time streams, offering a comprehensive solution for feature computation.
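For Spark-based pipelines, a `mode='spark_sql'` transformation is typically a function that returns a SQL string, which Tecton then executes on the attached cluster. This is a minimal sketch with the decorator and source wiring omitted; the table and column names are hypothetical:

```python
def user_click_counts(clicks) -> str:
    # `clicks` stands in for the data source reference Tecton passes in;
    # the function's job is only to build the Spark SQL query text.
    return f"""
        SELECT
            user_id,
            COUNT(*) AS click_count
        FROM {clicks}
        GROUP BY user_id
    """

sql = user_click_counts("clicks_source")
```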
Selecting Compute Engines
Feature Views
When defining Batch or Stream Feature Views, you specify the compute engine and transformation language using the `mode` parameter:
- Rift
  - `mode='pandas'`: Python/Pandas transformations run on Rift.
  - `mode='snowflake_sql'` or `mode='bigquery_sql'` (batch only): SQL queries pushed down to your configured warehouse.
- Spark
  - `mode='spark_sql'` or `mode='pyspark'`: Transformations run on your connected Spark cluster.
This setup lets you choose the engine that best fits each pipeline, whether you’re optimizing for Python-based data science workflows, leveraging cloud data warehouses, or taking advantage of an existing Spark ecosystem.
Realtime Feature Views always run their real-time compute on Rift; Spark's job startup and scheduling overhead make it too slow for request-time computation.
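To illustrate why real-time compute fits Rift: a real-time transformation is just a lightweight Python function evaluated at request time. This sketch assumes hypothetical inputs (a request payload and a precomputed batch feature) and omits the Tecton decorator:

```python
def transaction_amount_ratio(request: dict, user_stats: dict) -> dict:
    # `request` is the live payload at inference time; `user_stats` stands in
    # for a precomputed batch feature (the user's historical average spend).
    avg_spend = user_stats.get("avg_spend") or 1.0
    return {"amount_to_avg_ratio": request["amount"] / avg_spend}

features = transaction_amount_ratio({"amount": 45.0}, {"avg_spend": 15.0})
```

Because this is plain Python with no cluster round-trip, Rift can evaluate it within a single request's latency budget.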