Skip to content

Stream Feature View

A StreamFeatureView is used for simple row-level transformation streaming features. It processes raw data from a streaming source (e.g. Kafka and Kinesis) and can be backfilled from any BatchDataSource (e.g. S3, Hive Tables, Redshift) that contains a historical log of events.

Use a StreamFeatureView, if:

  • your use case requires very fresh features (<1 minute) that update whenever a new raw event is available on the stream
  • you want to run simple row-level based transformation on the raw data, or simply ingest raw data without further transformations
  • you have your raw events available on a stream

Common Examples:

  • Last transaction amount of a user's transaction stream
  • Stream ingesting precomputed feature values from an existing Kafka or Kinesis stream

Please see StreamWindowAggregateFeatureView for a specialized StreamFeatureView that supports efficiently calculated time window aggregations.

Feature Definition Example

For more examples see Examples here.

Annotation Parameters

See the API reference for the full list of parameters.

Transformation Pipeline

Stream Feature Views can use pyspark or spark_sql transformation types. You can configure mode=pipeline to construct a pipeline of those transformations, or use mode=pyspark or mode=spark_sql to define an inline transformation.

The output of your transformation must include columns for the entity IDs and a timestamp. All other columns will be treated as features.

Usage Example

See how to use a Stream Feature View in a notebook here.

How they work

When materialized online, Tecton will run the StreamFeatureView transformation on each event that comes in from the underlying stream source, and write it to the online store. Any previous values will be overwritten, so the online store only has the most recent value.

Streaming Transformations are executed as Spark Structured Streaming jobs (additional compute will be supported soon).

Additionally, Tecton will run the same Stream Feature View transformation pipeline against the StreamDataSource's batch source (a historical log of stream events) when materializing feature values to the offline store. This offline batch source will enable you to create training data sets using the same feature definition as online.

FAQ

How is a StreamFeatureView different to a StreamWindowAggregateFeatureView ?

A StreamFeatureView is the more generic but less specialized sibling to a StreamWindowAggregateFeatureView. A StreamFeatureView is an abstraction on top of Spark Structured Streaming. Use a StreamWindowAggregateFeatureView whenever you care about running time window aggregations. See the StreamWindowAggregateFeatureView documentation for a quick explanation of how Tecton supports these types of features under the hood by leveraging Spark Structured Streaming as well as on-demand transformation.