Version: Beta 🚧

AggregationLeadingEdge

The AggregationLeadingEdge enum lets users choose which timestamp Tecton uses as the leading edge of the aggregation window.

Note: When upgrading a feature view to 1.0.0+, users must explicitly set this parameter to AggregationLeadingEdge.LATEST_EVENT_TIME, since this was the default prior to 1.0.

Attributes

  • LATEST_EVENT_TIME: Default prior to 1.0: Tecton uses the latest event time of the stream to decide where to set the leading edge of all aggregation windows for that feature view.
  • WALL_CLOCK_TIME: Default in 1.0: Tecton uses the wall clock time of the request on the feature server to decide where to set the leading edge of all aggregation windows for that feature view.

Example:

Let’s assume these parameters:

  • Your stream is 30 minutes late, and the latest stream event that has arrived in the online store is 2024-07-29T2:31:00Z
  • The online feature vector read request is for a 1-hour sum aggregation, SUM(col). The read request was made at 2024-07-29T3:00:00Z
  • With the following event time data:
Timestamp            | col | Included in aggregation using WALL_CLOCK_TIME | Included in aggregation using LATEST_EVENT_TIME
2024-07-29T1:32:00Z  | 1   | no                                            | yes
2024-07-29T1:55:00Z  | 1   | no                                            | yes
2024-07-29T2:10:00Z  | 1   | yes                                           | yes
2024-07-29T2:32:00Z  | 1   | yes                                           | yes
2024-07-29T2:41:00Z  | 1   | no                                            | no
2024-07-29T2:49:00Z  | 1   | no                                            | no
2024-07-29T2:55:00Z  | 1   | no                                            | no

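The sum using WALL_CLOCK_TIME is 2, while the sum using LATEST_EVENT_TIME is 4. Because the stream is about 30 minutes late, the three most recent data points (2:41, 2:49, and 2:55) have not yet arrived. The WALL_CLOCK_TIME window, which runs from 2:00 to 3:00, is therefore only partially filled, while the LATEST_EVENT_TIME window ends at the latest arrived event and still covers a full hour of data.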
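The arithmetic in this example can be reproduced with a short Python sketch. It is only an illustration, not Tecton's implementation: LeadingEdge is a local stand-in for the enum, window_sum is a hypothetical helper, the window is assumed to include both of its endpoints, and the latest arrived event is taken to be the last "yes" row of the table (2:32).

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class LeadingEdge(Enum):
    """Local stand-in for the two modes; not the Tecton SDK enum."""
    WALL_CLOCK_TIME = "wall_clock_time"
    LATEST_EVENT_TIME = "latest_event_time"


def ts(hour, minute):
    """Build a UTC timestamp on 2024-07-29."""
    return datetime(2024, 7, 29, hour, minute, tzinfo=timezone.utc)


# (event_time, col) rows that have already arrived in the online store.
# The 2:41, 2:49, and 2:55 rows are still in flight, so they are absent.
ARRIVED = [(ts(1, 32), 1), (ts(1, 55), 1), (ts(2, 10), 1), (ts(2, 32), 1)]


def window_sum(arrived, request_time, window, mode):
    """Sum col over arrived rows inside [edge - window, edge] (assumed closed)."""
    if mode is LeadingEdge.WALL_CLOCK_TIME:
        edge = request_time  # leading edge is the time of the read request
    else:
        edge = max(t for t, _ in arrived)  # leading edge is the newest arrived event
    start = edge - window
    return sum(v for t, v in arrived if start <= t <= edge)


request_time = ts(3, 0)
one_hour = timedelta(hours=1)
wall_clock_sum = window_sum(ARRIVED, request_time, one_hour, LeadingEdge.WALL_CLOCK_TIME)
latest_event_sum = window_sum(ARRIVED, request_time, one_hour, LeadingEdge.LATEST_EVENT_TIME)
print(wall_clock_sum, latest_event_sum)  # 2 4
```

Only the choice of edge differs between the two modes; the window length and the set of arrived rows are the same in both calls.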

FAQ

  1. Why is wall clock time the default behavior?
    • This default improves most users' out-of-the-box experience, aligns with common use cases, and significantly reduces read costs. The default, aggregation_leading_edge=AggregationLeadingEdge.WALL_CLOCK_TIME, uses the current request timestamp as the aggregation window's leading edge, which is often more intuitive and useful in real-time scenarios and leads to much cheaper reads than LATEST_EVENT_TIME.
  2. Why can’t I directly set aggregation_leading_edge=WALL_CLOCK_TIME for Stream Feature Views applied with Tecton SDK < 1.0?
    • This change may cause differences in the aggregate feature values served.
    • For example, with WALL_CLOCK_TIME, a stream that lags by 2 minutes will always compute a 30-minute aggregation that is missing the most recent 2 minutes of data. With LATEST_EVENT_TIME, both the offline and online aggregations always compute over a full 30-minute window of data. The gap grows as the stream delay grows.
  3. Do we have any plans to match this behavior for the offline store?
    • Yes, Tecton plans to add functionality to the offline retrieval code path that resolves some of the skew caused by data delay.
  4. I want to experiment with the different aggregation leading edge strategies. How can I do this?
    • You can experiment by setting the aggregation leading edge at request time, which overrides the Stream Feature View configuration. The aggregation_leading_edge parameter can be overridden at the request level as follows:

NOTE: The override functionality is scheduled for deprecation in a future release to align with our long-term goal of simplifying the system and improving cost-efficiency.

# Set "aggregationLeadingEdge" to either "AGGREGATION_MODE_WALL_CLOCK_TIME" or "AGGREGATION_MODE_LATEST_EVENT_TIME".
$ curl -X POST http://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "feature_service_name": "mockdata_feature_service",
    "join_key_map": {
      "user_id": "user_1"
    },
    "requestOptions": {
      "aggregationLeadingEdge": "AGGREGATION_MODE_WALL_CLOCK_TIME"
    },
    "workspace_name": "prod"
  }
}'
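
For side-by-side experiments it can be convenient to build the same request from Python. The sketch below mirrors the curl call using only the standard library. build_get_features_request is a hypothetical helper, "your_cluster" is a placeholder, and the payload keys are copied from the curl example rather than verified against the API reference. The request is constructed but not sent.

```python
import json
import os
import urllib.request


def build_get_features_request(cluster, api_key, leading_edge):
    """Build (but do not send) a get-features request with a leading-edge override."""
    payload = {
        "params": {
            "feature_service_name": "mockdata_feature_service",
            "join_key_map": {"user_id": "user_1"},
            # Per-request override, as in the curl example above.
            "requestOptions": {"aggregationLeadingEdge": leading_edge},
            "workspace_name": "prod",
        }
    }
    return urllib.request.Request(
        url=f"http://{cluster}.tecton.ai/api/v1/feature-service/get-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )


request = build_get_features_request(
    "your_cluster",  # placeholder cluster name
    os.environ.get("TECTON_API_KEY", ""),
    "AGGREGATION_MODE_LATEST_EVENT_TIME",
)
# Passing `request` to urllib.request.urlopen would issue the call.
```

Building one request per mode and comparing the returned feature values for the same join keys is one way to see how much the stream delay affects a given aggregation.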
