AggregationLeadingEdge
The AggregationLeadingEdge enum allows users to choose which timestamp they
would like Tecton to use for the leading edge of the aggregation window.
Note: If a user is upgrading a feature view to 1.0.0+, they are required to
explicitly set this parameter to AggregationLeadingEdge.LATEST_EVENT_TIME,
since this was the default prior to 1.0.
Attributes
- LATEST_EVENT_TIME: Default prior to 1.0. Tecton uses the latest event time of the stream to decide where to set the leading edge of all aggregation windows for that feature view.
- WALL_CLOCK_TIME: Default in 1.0. Tecton uses the wall clock time of the request on the feature server to decide where to set the leading edge of all aggregation windows for that feature view.
Example:
Let’s assume these parameters:
- Your stream is 30 minutes late, and the latest stream event that has arrived
in the online store is 2024-07-29T2:31:00Z
- The online feature vector read request is for a 1-hour sum aggregation
(SUM(col)). The read request was made at timestamp 2024-07-29T3:00:00Z
- With the following event time data:
Timestamp | col | Included in aggregation using WALL_CLOCK_TIME | Included in aggregation using LATEST_EVENT_TIME |
---|---|---|---|
2024-07-29T1:32:00Z | 1 | no | yes |
2024-07-29T1:55:00Z | 1 | no | yes |
2024-07-29T2:10:00Z | 1 | yes | yes |
2024-07-29T2:31:00Z | 1 | yes | yes |
2024-07-29T2:41:00Z | 1 | no | no |
2024-07-29T2:49:00Z | 1 | no | no |
2024-07-29T2:55:00Z | 1 | no | no |
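The sum using WALL_CLOCK_TIME is 2, while the sum using LATEST_EVENT_TIME is 4.
The difference arises because the stream is 30 minutes late: the 3 data points
after 2:31 AM have not yet arrived in the online store, so the 1-hour window
ending at the wall clock time (2:00 to 3:00) holds only about 30 minutes of data.
The window ending at the latest event time (1:31 to 2:31) covers a full hour of
arrived data.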
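The arithmetic in this example can be reproduced with a short Python sketch. It is an illustration only, not Tecton's implementation. It assumes each window is half-open, (leading edge - 1 hour, leading edge], and that the fourth event is the latest arrived event at 2:31, as stated in the parameters. The names `ts`, `events`, and `window_sum` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ts(hour, minute):
    """Build a UTC timestamp on 2024-07-29 (helper for this sketch only)."""
    return datetime(2024, 7, 29, hour, minute, tzinfo=timezone.utc)


WINDOW = timedelta(hours=1)
REQUEST_TIME = ts(3, 0)           # wall clock time of the read request
LATEST_ARRIVED_EVENT = ts(2, 31)  # latest stream event in the online store

# (event timestamp, col). Events later than LATEST_ARRIVED_EVENT exist
# upstream but have not reached the online store yet.
events = [
    (ts(1, 32), 1), (ts(1, 55), 1), (ts(2, 10), 1), (ts(2, 31), 1),
    (ts(2, 41), 1), (ts(2, 49), 1), (ts(2, 55), 1),
]


def window_sum(leading_edge):
    """Sum col over arrived events in (leading_edge - WINDOW, leading_edge]."""
    start = leading_edge - WINDOW
    return sum(
        col
        for event_time, col in events
        if event_time <= LATEST_ARRIVED_EVENT and start < event_time <= leading_edge
    )


wall_clock_sum = window_sum(REQUEST_TIME)            # window 2:00 to 3:00
latest_event_sum = window_sum(LATEST_ARRIVED_EVENT)  # window 1:31 to 2:31
print(wall_clock_sum, latest_event_sum)
```

Under these assumptions the sketch prints `2 4`, matching the sums above.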
FAQ
- Why is wall clock time the default behavior?
- It improves most users' out-of-the-box experience, aligns with common use
cases, and significantly reduces read costs. The default,
aggregation_leading_edge=AggregationLeadingEdge.WALL_CLOCK_TIME, uses the current request timestamp as the aggregation window's leading edge, which is often more intuitive and useful in real-time scenarios and leads to much cheaper reads than using LATEST_EVENT_TIME.
- Why can’t I directly set aggregation_leading_edge=WALL_CLOCK_TIME
for Stream Feature Views applied with Tecton SDK < 1.0?
- This change may cause differences in the aggregate feature values served.
- For example, with wall clock time, a stream that lags by 2 minutes always computes a 2-minute-lagged 30-minute aggregation, meaning that the 30-minute window is missing 2 minutes' worth of data. With the latest event time, both the offline and online aggregations always compute a full 30-minute window of data. The discrepancy grows as the stream delay grows.
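The effect of stream lag on window coverage can be sketched in a few lines of Python. This is a simplified model, not Tecton code. It assumes a constant stream lag and a window that simply ends at the chosen leading edge. The function name `covered_window` is hypothetical.

```python
from datetime import timedelta


def covered_window(window, stream_lag, leading_edge):
    """Return how much of the aggregation window can contain arrived data.

    Simplified model: with WALL_CLOCK_TIME the window ends at the request
    time, so the most recent `stream_lag` of it is still empty. With
    LATEST_EVENT_TIME the window ends at the newest arrived event, so the
    whole window can hold data.
    """
    if leading_edge == "WALL_CLOCK_TIME":
        return max(window - stream_lag, timedelta(0))
    if leading_edge == "LATEST_EVENT_TIME":
        return window
    raise ValueError(f"unknown leading edge: {leading_edge}")


window = timedelta(minutes=30)
for lag_minutes in (2, 10, 30):
    lag = timedelta(minutes=lag_minutes)
    wall = covered_window(window, lag, "WALL_CLOCK_TIME")
    latest = covered_window(window, lag, "LATEST_EVENT_TIME")
    print(f"lag={lag_minutes} min: wall clock covers {wall}, latest event covers {latest}")
```

With a 2-minute lag the wall clock window covers 28 of its 30 minutes. With a 30-minute lag it covers none of them.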
- Do we have any plans to match this behavior for the offline store?
- Yes, Tecton has plans to add functionality to resolve some data delay related skew to the offline retrieval code path.
- I want to experiment with the different aggregation leading edge strategies.
How can I do this?
- You can control the behavior of the aggregation leading edge at request
time, which overrides the Stream Feature View configuration. The
aggregation_leading_edge parameter can be overridden at the request level as
follows:
NOTE: The override functionality is scheduled for deprecation in a future release to align with our long-term goal of simplifying the system and improving cost-efficiency.
$ curl -X POST http://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "feature_service_name": "mockdata_feature_service",
    "join_key_map": {
      "user_id": "user_1"
    },
    "requestOptions": {
      "aggregationLeadingEdge": "AGGREGATION_MODE_WALL_CLOCK_TIME"
    },
    "workspace_name": "prod"
  }
}'
To use the latest event time instead, set aggregationLeadingEdge to "AGGREGATION_MODE_LATEST_EVENT_TIME".
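When switching between the two modes repeatedly, it can help to generate the request body programmatically so that the JSON stays valid. The following Python sketch builds the same body as the curl example and does not send it. The field names and mode strings are taken from that example. The function name `build_get_features_body` and its defaults are hypothetical.

```python
import json

# Mode strings as they appear in the request example above.
VALID_MODES = {
    "AGGREGATION_MODE_WALL_CLOCK_TIME",
    "AGGREGATION_MODE_LATEST_EVENT_TIME",
}


def build_get_features_body(mode, user_id="user_1"):
    """Return the JSON request body with the chosen leading edge override."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported aggregation leading edge mode: {mode}")
    body = {
        "params": {
            "feature_service_name": "mockdata_feature_service",
            "join_key_map": {"user_id": user_id},
            "requestOptions": {"aggregationLeadingEdge": mode},
            "workspace_name": "prod",
        }
    }
    return json.dumps(body, indent=2)


print(build_get_features_body("AGGREGATION_MODE_LATEST_EVENT_TIME"))
```

The printed string can be passed to curl with `-d`, or sent with any HTTP client of your choice.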