Stream Feature View
A StreamFeatureView is used for streaming features that need only simple row-level transformations. It processes raw data from a streaming source (e.g. Kafka or Kinesis) and can be backfilled from any BatchDataSource (e.g. S3, Hive Tables, Redshift) that contains a historical log of events.
Use a StreamFeatureView if:
- your use case requires very fresh features (<1 minute) that update whenever a new raw event is available on the stream
- you want to run simple row-level transformations on the raw data, or simply ingest raw data without further transformation
- your raw events are available on a stream
Common examples:
- The last transaction amount in a user's transaction stream
- Ingesting precomputed feature values from an existing Kafka or Kinesis stream
See StreamWindowAggregateFeatureView for a specialized StreamFeatureView that supports efficiently calculated time window aggregations.
Feature Definition Example
For more examples see Examples here.
See the API reference for the full list of parameters.
Stream Feature Views can use spark_sql transformation types. You can configure mode=pipeline to construct a pipeline of those transformations, or use mode=spark_sql to define an inline transformation.
The output of your transformation must include columns for the entity IDs and a timestamp. All other columns will be treated as features.
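As a rough illustration of that rule, the sketch below splits a transformation's output columns into entity IDs, the timestamp, and features. It is plain Python, not Tecton code: the helper split_output_columns and the column names user_id, timestamp, and amount are hypothetical and chosen only for this example.

```python
def split_output_columns(columns, entity_ids, timestamp_key):
    """Classify output columns. Hypothetical helper, not part of Tecton."""
    required = list(entity_ids) + [timestamp_key]
    missing = [c for c in required if c not in columns]
    if missing:
        # The output must carry every entity ID and the timestamp.
        raise ValueError(f"output is missing required columns: {missing}")
    # Every column that is not an entity ID or the timestamp is a feature.
    features = [c for c in columns if c not in required]
    return {
        "entity_ids": list(entity_ids),
        "timestamp": timestamp_key,
        "features": features,
    }


result = split_output_columns(
    columns=["user_id", "timestamp", "amount"],
    entity_ids=["user_id"],
    timestamp_key="timestamp",
)
```

Here amount is the only column left over, so it is the only feature. An output without the timestamp column would raise a ValueError in this sketch.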
See how to use a Stream Feature View in a notebook here.
How they work
When a StreamFeatureView is materialized online, Tecton runs its transformation on each event that arrives from the underlying stream source and writes the result to the online store. Any previous value is overwritten, so the online store holds only the most recent value.
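The overwrite behavior can be pictured with a small simulation. This is a minimal sketch in plain Python, assuming a dictionary as a stand-in for the online store and events that arrive in timestamp order; write_online, last_transaction_amount, and the event fields are hypothetical names, not Tecton APIs.

```python
def last_transaction_amount(event):
    # Row-level transformation: keep the entity ID, timestamp, and one feature.
    return {
        "user_id": event["user_id"],
        "timestamp": event["timestamp"],
        "amount": event["amount"],
    }


def write_online(store, event, transform):
    """Transform one event and overwrite the stored row for its entity."""
    row = transform(event)
    store[row["user_id"]] = row  # any previous value for this user is replaced
    return store


online_store = {}  # stand-in for the online store, keyed by entity ID
events = [
    {"user_id": "u1", "timestamp": 1, "amount": 20.0},
    {"user_id": "u2", "timestamp": 2, "amount": 5.0},
    {"user_id": "u1", "timestamp": 3, "amount": 7.5},
]
for event in events:
    write_online(online_store, event, last_transaction_amount)
```

After the loop, the store holds one row per user, and the row for u1 reflects its latest event (amount 7.5) rather than the earlier one.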
Streaming Transformations are executed as Spark Structured Streaming jobs (additional compute will be supported soon).
Additionally, Tecton runs the same Stream Feature View transformation pipeline against the StreamDataSource's batch source (a historical log of stream events) when materializing feature values to the offline store. This offline batch source lets you create training data sets using the same feature definition that is used online.
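The point of sharing one definition is that an event produces the same feature row whether it is read from the historical log or from the live stream. The sketch below shows this in plain Python under simplifying assumptions: the transform function, the amount_cents field, and the list used as a historical log are all hypothetical and stand in for the real sources.

```python
def transform(event):
    # One row-level transformation shared by the offline and online paths.
    return {
        "user_id": event["user_id"],
        "timestamp": event["timestamp"],
        "amount_usd": event["amount_cents"] / 100,
    }


# Stand-in for the batch source: a historical log of stream events.
historical_log = [
    {"user_id": "u1", "timestamp": 1, "amount_cents": 2000},
    {"user_id": "u1", "timestamp": 2, "amount_cents": 1250},
]

# Offline path: apply the transformation to every logged event.
offline_rows = [transform(event) for event in historical_log]

# Online path: the same event, as it would have arrived on the stream.
live_event = {"user_id": "u1", "timestamp": 2, "amount_cents": 1250}
online_row = transform(live_event)
```

Because both paths call the same function, offline_rows[-1] and online_row are identical, which is what keeps training data consistent with the values served online.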
How is a StreamFeatureView different from a StreamWindowAggregateFeatureView?
A StreamFeatureView is the more generic, less specialized sibling of a StreamWindowAggregateFeatureView. A StreamFeatureView is an abstraction on top of Spark Structured Streaming. Use a StreamWindowAggregateFeatureView whenever you need to run time window aggregations. See the StreamWindowAggregateFeatureView documentation for a quick explanation of how Tecton supports these types of features under the hood by leveraging Spark Structured Streaming as well as on-demand transformations.