Version: 1.1

Stream Feature View

Stream Feature Views (SFVs) compute feature values from a continuous streaming data source. They support near real-time feature computation with sub-second freshness.

Stream Feature Views also support point-in-time correct training data generation, as well as backfills of newly created features, from a historical offline log of event data.
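Point-in-time correctness means a training row only sees feature values that existed when the labeled event occurred. The core mechanic can be sketched as an "as-of" lookup over a historical feature log; all names and data below are illustrative, not Tecton SDK code:

```python
from datetime import datetime

# Hypothetical historical feature log: (timestamp, value) pairs per user,
# sorted by timestamp. Purely illustrative data.
feature_log = {
    "user_1": [
        (datetime(2023, 1, 1), 2),
        (datetime(2023, 1, 5), 7),
    ],
}

def point_in_time_value(user_id, event_time):
    """Return the latest feature value whose timestamp is <= event_time.

    This prevents data leakage: a training example never sees a feature
    value written after the event it is labeling.
    """
    value = None
    for ts, v in feature_log.get(user_id, []):
        if ts <= event_time:
            value = v
        else:
            break
    return value

# A training event on Jan 3 sees the Jan 1 value, not the later Jan 5 value.
print(point_in_time_value("user_1", datetime(2023, 1, 3)))  # → 2
```

Backfills of newly created features work the same way in reverse: the historical log supplies the events needed to recompute values for time ranges the stream no longer retains.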

Tecton offers two different compute engines for Streaming Features:


  • Rift: records sent to the Stream Ingest API are optionally transformed with Rift and then written directly to the Feature Store. The Stream Ingest API is the right choice if you prefer the simplicity of Python-based transformation environments, want to ingest pre-computed features, are building an event-driven architecture, or need very fresh features where every millisecond counts.
  • Spark Structured Streaming: records are read from your streaming source, transformed, and written to the Feature Store by a Spark Structured Streaming job in your data plane. Spark Streaming features may be the right choice if your existing data stack already heavily relies on Spark and Spark Structured Streaming.
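With the Rift path, a producer pushes individual event records as JSON to an ingestion endpoint. The sketch below only builds such a payload; the endpoint URL, payload shape, and source name are illustrative assumptions, not Tecton's documented API contract:

```python
import json

# Hypothetical ingestion endpoint; a real client would also pass an API key.
INGEST_URL = "https://example.tecton.ai/ingest"

def build_ingest_payload(source_name, record):
    """Wrap a single event record in a JSON payload keyed by its push source.

    The {"records": {source: [{"record": ...}]}} shape here is an assumed
    structure for illustration only.
    """
    return json.dumps({
        "records": {source_name: [{"record": record}]},
    })

payload = build_ingest_payload(
    "transactions_push_source",  # hypothetical push source name
    {"user_id": "user_1", "amount": 42.0, "timestamp": "2023-01-03T00:00:00Z"},
)
# A real producer would then POST it, e.g.:
# requests.post(INGEST_URL, data=payload, headers={"Authorization": "..."})
```

Because each record is written (optionally transformed) straight to the Feature Store, end-to-end freshness is bounded mainly by the ingest round trip rather than by a streaming job's micro-batch interval.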

Batch Sources for Stream Feature Views

Features based on streaming data can be hard to develop, test, and productionize, because the stream's retention period is often shorter than the lookback window the feature requires.

Tecton's solution is to leverage a complementary Batch Source containing historical data (often a log or materialized table mirroring the stream). Tecton can rely on that source to compute historical values:

  • In Notebook Development, Tecton uses the batch source to compute historical feature values, so a Data Scientist can develop, test, and iterate on feature definitions without orchestrating their own backfills.
  • When deploying a Stream Feature View, Tecton automatically runs batch backfill jobs to populate the online store with accurate values up to the feature_start_time. This ensures that features are immediately available and correct, without having to wait for data to accumulate from the Stream Source.

After a feature has been productionized, the Stream Source takes over, processing new events in near real-time and updating feature values incrementally.
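The handoff from backfill to stream can be pictured as two writers updating the same online store: a one-time batch job over historical events up to feature_start_time, then incremental updates per stream event. The simulation below uses hypothetical names and a simple event-count feature; it is not Tecton SDK code:

```python
from datetime import datetime

# Cutoff up to which the batch backfill is responsible (illustrative).
feature_start_time = datetime(2023, 1, 10)

# Historical events from the batch source (e.g. a log table mirroring the stream).
historical_events = [
    {"user_id": "user_1", "timestamp": datetime(2023, 1, 2)},
    {"user_id": "user_1", "timestamp": datetime(2023, 1, 8)},
]

online_store = {}  # user_id -> event-count feature value

def backfill(events, cutoff):
    """Batch job: compute event counts from history strictly before the cutoff."""
    for e in events:
        if e["timestamp"] < cutoff:
            online_store[e["user_id"]] = online_store.get(e["user_id"], 0) + 1

def process_stream_event(event):
    """Streaming job: increment the feature as each new event arrives."""
    online_store[event["user_id"]] = online_store.get(event["user_id"], 0) + 1

backfill(historical_events, feature_start_time)  # store is correct immediately
process_stream_event({"user_id": "user_1", "timestamp": datetime(2023, 1, 11)})
print(online_store["user_1"])  # → 3
```

The key property is that the feature is already accurate the moment the stream takes over, because the backfill covered everything before feature_start_time.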
