Batch Window Aggregate Feature Views
BatchWindowAggregateFeatureView is used for batch time-window aggregation features, such as a 1-hour rolling count of transactions per user. It processes raw data from any
BatchDataSource (e.g. S3, Hive tables, Redshift) that contains a historical log of events.
Use a BatchWindowAggregateFeatureView when:
- you have your raw events available in a Batch Data Source
- you need tumbling, hopping, or rolling time-window aggregations
- your use case can tolerate a feature freshness of > 1 hour
Common examples include:
- 1-hour rolling click count of a user
- last 10 transactions of a user
- max transaction amount of a user
BatchWindowAggregateFeatureView is a specialized implementation for time-window aggregations that is more efficient and performant than what a normal
BatchFeatureView could accomplish. Tecton achieves higher efficiency and feature freshness because it stores partial feature values in tiles that are rolled up at feature request time (see below for details).
Feature Definition Example
For more examples, see the Examples page.
See the API reference for the full list of parameters.
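As a rough sketch of what such a definition can look like, consider the following. The decorator name, parameter names, data source (`transactions_batch`), and entity (`user`) below are illustrative assumptions; consult the API reference for the exact signatures.

```python
# Sketch only -- names and signatures are assumptions, not the definitive API.
from tecton import batch_window_aggregate_feature_view, FeatureAggregation, Input

@batch_window_aggregate_feature_view(
    inputs={"transactions": Input(transactions_batch)},  # assumed BatchDataSource
    entities=[user],                                     # assumed entity
    mode="spark_sql",
    aggregation_slide_period="1h",                       # tile size
    aggregations=[
        # One column aggregated over three windows -> three features
        FeatureAggregation(column="amount", function="count",
                           time_windows=["1h", "12h", "24h"]),
    ],
)
def user_transaction_counts(transactions):
    # Row-level transformation: one column per entity, a timestamp column,
    # and the column(s) to be aggregated
    return f"SELECT user_id, amount, timestamp FROM {transactions}"
```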
In the body of your Python function, you'll define row-level transformations whose output is then aggregated according to the
FeatureAggregations you configure. Your transformation must output a column for each entity and a timestamp column. Each additional column must be aggregated by at least one
FeatureAggregation. The final number of features is determined by the number of aggregated columns and the number of time windows you configure.
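To illustrate how the feature count falls out of the configuration, here is a minimal sketch with hypothetical columns and windows (each column/time-window pair yields one feature):

```python
# Hypothetical configuration: each (column, time_window) pair becomes one feature.
aggregations = {
    "amount": ["1h", "12h", "24h"],          # e.g. sum over three windows
    "transaction_id": ["1h", "12h", "24h"],  # e.g. count over three windows
}

num_features = sum(len(windows) for windows in aggregations.values())
print(num_features)  # 2 columns x 3 windows = 6 features
```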
See how to use a Batch Window Aggregate Feature View in a notebook here.
How they work
BatchWindowAggregateFeatureView uses Spark jobs under the hood. These jobs run at a fixed frequency (the slide period) and aggregate over a typically longer period of time (the time window). After each slide period has elapsed, Tecton updates the values in the online store.
Behind the scenes, Tecton stores partial aggregations in the form of tiles. The tile size is defined by the
aggregation_slide_period parameter. At feature request time, Tecton's online and offline feature serving automatically rolls up the persisted tiles (as well as persisted event projections, in the case of continuous streaming features). This has several key benefits:
- Significantly reduced storage requirements when you define several time windows, since they share the same tiles
- Reduced precompute resource requirements, since Tecton only needs to compute incremental tiles rather than the entire time window
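To make the tiling idea concrete, here is a minimal, framework-free sketch (plain Python, not Tecton code, with made-up numbers) of how hourly partial counts can be rolled up into multiple window features at request time:

```python
# Hourly tiles: each tile holds the partial count for one slide period.
# Keys are hour indices; values are hypothetical event counts in that hour.
tiles = {0: 3, 1: 5, 2: 2, 3: 7, 4: 1}

def rolling_count(tiles, request_hour, window_hours):
    """Roll up the persisted tiles covering the last `window_hours` hours."""
    start = request_hour - window_hours + 1
    return sum(count for hour, count in tiles.items()
               if start <= hour <= request_hour)

# Several time windows are served from the same tiles -- this is why extra
# windows add little storage and only incremental tiles need precomputing.
print(rolling_count(tiles, request_hour=4, window_hours=2))  # 7 + 1 = 8
print(rolling_count(tiles, request_hour=4, window_hours=5))  # 3+5+2+7+1 = 18
```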