Version: 1.0

Integrating with Flink

Tecton integrates seamlessly with Apache Flink and is commonly used alongside it in modern streaming ML stacks.

While Tecton handles feature engineering (transforming and aggregating events into features for real-time inference and model training), Flink is often the right tool for upstream stateful event stream preparation.

Requirements and Limitations

Key points about this integration:

  • Two integration patterns available: Flink can integrate with Tecton through two supported patterns:
    • Using the Stream Ingest API directly
    • Publishing to Kafka/Kinesis and using Tecton's Spark Streaming integration
  • No Rift batch compute required: The Stream Ingest API pattern requires that the API be enabled, but it does not require using Rift for batch compute.

Integration Patterns

Tecton supports two architectural patterns for integrating with Flink:

In the first pattern (Stream Ingest API), Flink publishes cleaned events directly to Tecton's Stream Ingest API, where they are processed by the Rift compute engine.

This integration pattern is currently not available for VPC deployments, as the Stream Ingest API does not support VPC environments.
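As a sketch of the Stream Ingest API pattern, the snippet below assembles a request body that a Flink sink could POST to the API. The endpoint URL, field names, workspace, and push source names are illustrative assumptions, not a definitive schema; verify them against your deployment's API reference.

```python
import json
from datetime import datetime, timezone

def build_ingest_payload(workspace, push_source, records):
    """Assemble a Stream Ingest API request body.

    The field names (workspace_name, records) follow the general shape of
    Tecton's ingest payloads, but treat them as illustrative placeholders.
    """
    return {
        "workspace_name": workspace,
        "records": {
            push_source: [{"record": r} for r in records],
        },
    }

# Example: a cleaned event emitted by an upstream Flink job.
event = {
    "user_id": "user_123",
    "amount": 42.5,
    "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
}
payload = build_ingest_payload("prod", "transactions_push_source", [event])

# In a real Flink sink you would POST this body over HTTPS, e.g.:
#   requests.post("https://<cluster>.tecton.ai/ingest", json=payload,
#                 headers={"Authorization": "Tecton-key <API_KEY>"})
body = json.dumps(payload)
```

Because the ingest call is a plain HTTP request, it fits naturally into a Flink sink function that batches records and retries on transient failures.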

In the second pattern (Kafka/Kinesis), Flink publishes to Kafka or Kinesis, and Tecton's Spark Streaming integration consumes from these message queues.

Choosing the Right Pattern

| Consideration | Stream Ingest API | Kafka/Kinesis + Spark |
| --- | --- | --- |
| Throughput | Best for < 1k records/second | Better for > 1k records/second |
| Latency | Millisecond-level freshness | Second-level freshness |
| Infrastructure | No additional message queue needed | Requires Kafka/Kinesis infrastructure |
| Existing stack | Standalone solution | Better if already using Spark |
| Compute engine | Rift (Python-native) | Spark Structured Streaming |
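The trade-offs above can be condensed into a rough rule of thumb. The thresholds below simply restate the comparison and are not hard limits:

```python
def recommend_pattern(records_per_second, needs_millisecond_freshness=False):
    """Rough pattern recommendation based on the comparison above.

    These cutoffs mirror the guidance in the table; real deployments
    should also weigh existing infrastructure and team expertise.
    """
    if needs_millisecond_freshness:
        # Only the Rift-backed Stream Ingest API offers ms-level freshness.
        return "stream-ingest-api"
    if records_per_second < 1_000:
        return "stream-ingest-api"
    # High throughput (or an existing Spark stack) favors the queue-based path.
    return "kafka-kinesis-spark"
```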

If your upstream events are on Kafka or Kinesis, we recommend the following responsibility split.

Flink is responsible for:

  • Deduplicating events
  • Stream enrichment (e.g., joining with metadata)
  • Filtering malformed or irrelevant events
  • Ensuring at-least-once or exactly-once delivery

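In production, these steps would run as Flink operators (for example, a keyed process function holding seen IDs in state with a TTL). The plain-Python sketch below just illustrates the dedup-and-filter logic, using a hypothetical `event_id` field:

```python
def clean_stream(events, required_fields=("event_id", "user_id", "timestamp")):
    """Drop malformed events and duplicates, preserving arrival order.

    A real Flink job would keep the seen-ID set in keyed state with a TTL
    rather than in process memory, but the logic is the same.
    """
    seen = set()
    for event in events:
        # Filter malformed or irrelevant events.
        if not all(field in event for field in required_fields):
            continue
        # Deduplicate on a hypothetical event_id key.
        if event["event_id"] in seen:
            continue
        seen.add(event["event_id"])
        yield event

raw = [
    {"event_id": "a", "user_id": "u1", "timestamp": 1},
    {"event_id": "a", "user_id": "u1", "timestamp": 1},  # duplicate
    {"user_id": "u2", "timestamp": 2},                   # malformed: no event_id
    {"event_id": "b", "user_id": "u2", "timestamp": 2},
]
cleaned = list(clean_stream(raw))  # only events "a" and "b" survive
```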
Flink transforms raw bronze data into clean silver streams. It's typically used by Data Engineers.

Further Reading: Please also see Confluent's page on the Shift Left paradigm, which explains in more detail how Apache Flink is used upstream to clean, enrich, and govern event data before it's consumed by downstream systems like Tecton.

Tecton is responsible for:

  • Applying row-level transformations
  • Applying time window aggregations
  • Leveraging Python packages or ML models to transform events
  • Joining in other precomputed features
  • Applying real-time transformations at feature request time
  • Serving features online or generating training data

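To make the time-window aggregation bullet concrete, here is a brute-force plain-Python sketch of the kind of value Tecton computes. In Tecton you would declare the aggregation in a feature view rather than code it by hand; the field names below are illustrative:

```python
from collections import defaultdict

def count_in_window(events, window_seconds, as_of):
    """Count events per user over a trailing window ending at `as_of`.

    Equivalent in spirit to a declared count aggregation over a time
    window, computed here by brute force for illustration only.
    """
    start = as_of - window_seconds
    counts = defaultdict(int)
    for event in events:
        if start < event["timestamp"] <= as_of:
            counts[event["user_id"]] += 1
    return dict(counts)

events = [
    {"user_id": "u1", "timestamp": 100},
    {"user_id": "u1", "timestamp": 150},
    {"user_id": "u2", "timestamp": 40},   # outside a 100s window ending at 160
]
features = count_in_window(events, window_seconds=100, as_of=160)
# -> {"u1": 2}
```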
Tecton transforms silver events into gold ML-ready features. It's most commonly used by Data Scientists and MLEs.

Data Flow

Pattern 1: Stream Ingest API Integration

  1. Flink publishes cleaned events to Tecton's Stream Ingest API, where they are processed in real time by Rift and written to the online store.

  2. Flink also writes the same events to a data warehouse or data lake, which Tecton uses to backfill features and generate training datasets.

Pattern 2: Kafka/Kinesis Integration

  1. Flink publishes cleaned events to Kafka or Kinesis message queues.

  2. Tecton's Spark Streaming jobs consume from these queues, process the events, and write features to both online and offline stores.

  3. Historical data in the data warehouse/lake is used for feature backfills and training dataset generation.

Both patterns ensure online/offline consistency through dual-write or coordinated processing strategies.
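One way to reason about the consistency claim above: if the same transformation function is applied on both the streaming path and the batch/backfill path, online and offline feature values agree by construction. A minimal illustration, with hypothetical field names:

```python
def transform(record):
    """A single row-level transformation shared by both paths."""
    return {
        "user_id": record["user_id"],
        "amount_usd_cents": round(record["amount"] * 100),
    }

# Streaming path: one event arriving in real time.
online_row = transform({"user_id": "u1", "amount": 12.34})

# Batch path: the same event replayed from the data warehouse for backfill.
offline_row = transform({"user_id": "u1", "amount": 12.34})

assert online_row == offline_row  # identical logic -> identical features
```

This is exactly the property Tecton enforces by running one feature definition against both the stream and the historical data.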

Batch World Analogy

Think of Flink like dbt for streams, and Tecton like the feature layer on top.

In the batch world:

  • You might use dbt to turn bronze logs into silver tables (event cleaning, enrichment, normalization).
  • Then, you'd define features on top of those silver tables using Tecton.

This same pattern applies in streaming, only now it's real-time.
