Integrating with Flink
Tecton integrates seamlessly with Apache Flink and is commonly used alongside it in modern streaming ML stacks.
While Tecton handles feature engineering — transforming and aggregating events into features for real-time inference and model training — Flink is often the right tool for upstream stateful event stream preparation.
⚠️ Requirements and Limitations
This integration pattern:
- Requires the Stream Ingest API: Flink integration with Tecton only works through the Stream Ingest API.
- Does not require Rift batch compute: While this integration uses the Stream Ingest API, it doesn't require using Rift for batch compute.
📐 Recommended Architecture
If your upstream events are on Kafka or Kinesis, we recommend the following responsibility split:
✅ Flink is responsible for:
- Deduplicating events
- Stream enrichment (e.g., joining with metadata)
- Filtering malformed or irrelevant events
- Ensuring at-least-once or exactly-once delivery
Flink transforms raw bronze data into clean silver streams. It's typically used by Data Engineers.
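Production Flink jobs are typically written in Java, Flink SQL, or PyFlink, with keyed state handling deduplication. As a minimal, dependency-free sketch of the bronze-to-silver logic described above, here is the same dedupe/filter/enrich pipeline in plain Python (all field names and the `clean_events` helper are illustrative):

```python
def clean_events(raw_events, metadata):
    """Bronze -> silver: dedupe, drop malformed events, enrich with metadata."""
    seen_ids = set()  # in Flink this would be keyed state, not an in-memory set
    for event in raw_events:
        event_id = event.get("event_id")
        if event_id is None or "user_id" not in event:
            continue  # filter malformed events
        if event_id in seen_ids:
            continue  # deduplicate on event_id
        seen_ids.add(event_id)
        # enrichment: join with a metadata table (a lookup/broadcast join in Flink)
        yield {**event, **metadata.get(event["user_id"], {})}

bronze = [
    {"event_id": "e1", "user_id": "u1", "amount": 10.0},
    {"event_id": "e1", "user_id": "u1", "amount": 10.0},  # duplicate
    {"user_id": "u2", "amount": 5.0},                     # malformed: no event_id
    {"event_id": "e2", "user_id": "u2", "amount": 5.0},
]
users = {"u1": {"country": "US"}, "u2": {"country": "DE"}}

silver = list(clean_events(bronze, users))
# silver holds two events (e1, e2), each enriched with a country field
```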
📚 Further Reading
Please also see Confluent's page on the Shift Left paradigm, which explains in more detail how Apache Flink is used upstream to clean, enrich, and govern event data before it is consumed by downstream systems like Tecton.

✅ Tecton is responsible for:
- Applying row-level transformations
- Applying time window aggregations
- Leveraging Python packages or ML models to transform events
- Joining in other precomputed features
- Applying real-time transformations at feature request time
- Serving features online or generating training data
Tecton transforms silver events into gold ML-ready features. It's most commonly used by Data Scientists and MLEs.
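In Tecton, time-window aggregations are declared through its Python SDK and maintained incrementally by the platform rather than hand-written. As a dependency-free sketch of what such a feature computes, here is a trailing-window sum in plain Python (the `windowed_sum` helper, field names, and timestamps are illustrative, not Tecton APIs):

```python
from datetime import datetime, timedelta

def windowed_sum(events, key_field, key_value, value_field, window, as_of):
    """Sum of `value_field` for one key over the trailing `window` ending at
    `as_of` -- the semantics of a declared time-window aggregation feature."""
    start = as_of - window
    return sum(
        e[value_field]
        for e in events
        if e[key_field] == key_value and start <= e["timestamp"] <= as_of
    )

events = [
    {"user_id": "u1", "amount": 10.0, "timestamp": datetime(2024, 1, 1, 12, 0)},
    {"user_id": "u1", "amount": 5.0,  "timestamp": datetime(2024, 1, 1, 12, 30)},
    {"user_id": "u1", "amount": 7.0,  "timestamp": datetime(2024, 1, 1, 9, 0)},  # outside window
]

# Feature value: u1's total spend over the last hour, as of 13:00
total = windowed_sum(events, "user_id", "u1", "amount",
                     timedelta(hours=1), datetime(2024, 1, 1, 13, 0))
```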
🔄 Data Flow
- Flink publishes cleaned events to Tecton's Stream Ingest API, where they are processed in real time and written to the online store.
- Flink also writes the same events to a data warehouse or data lake, which Tecton uses to backfill features and generate training datasets.
This dual-write pattern ensures online/offline consistency.
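The essence of the dual-write pattern is that both paths receive byte-identical records. A minimal sketch, with the two sinks injected as callables so it stays dependency-free (in practice the online sink is an HTTP POST to the Stream Ingest API and the offline sink is a warehouse or lake writer; all names here are illustrative):

```python
import json

def dual_write(event, online_sink, offline_sink):
    """Send the same serialized record to both the online (Stream Ingest API)
    and offline (warehouse/lake) paths, so serving and training stay consistent."""
    record = json.dumps(event, sort_keys=True)
    online_sink(record)
    offline_sink(record)

# Stand-in sinks for illustration
online_store, offline_store = [], []
dual_write({"user_id": "u1", "amount": 10.0},
           online_store.append, offline_store.append)
# Both paths received an identical record
```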
🧠 Batch World Analogy
Think of Flink like dbt for streams, and Tecton like the feature layer on top.
In the batch world:
- You might use dbt to turn bronze logs into silver tables (event cleaning, enrichment, normalization).
- Then, you'd define features on top of those silver tables using Tecton.
This same pattern applies in streaming — only now it's real-time.
❗️Is Flink required?
No. Flink is not required to use Tecton; applications can push events directly to the Stream Ingest API via its REST endpoint.
Conversely, the Stream Ingest API is required for this Flink integration pattern. This integration is currently not available for VPC deployments, as the Stream Ingest API does not support VPC environments.
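Pushing events directly means making an HTTP POST to the ingest endpoint. The sketch below builds such a request with Python's standard library; the endpoint URL, API key, workspace, push source name, and payload schema are illustrative assumptions, so consult the Stream Ingest API reference for the exact request shape:

```python
import json
import urllib.request

# All values below are placeholders for illustration, not real endpoints or schemas.
API_URL = "https://example.tecton.ai/ingest"  # placeholder cluster URL
API_KEY = "my-service-account-key"            # placeholder credential

payload = {
    "workspace_name": "prod",                  # placeholder workspace
    "records": {
        "transactions_push_source": [          # placeholder push source name
            {"record": {"user_id": "u1", "amount": 10.0,
                        "timestamp": "2024-01-01T12:00:00Z"}}
        ]
    },
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Tecton-key {API_KEY}",
             "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here.
```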