Version: 0.9

Aggregation Windows

Tecton's Aggregation Engine lets you define features as aggregations over a column in your Feature View transformation. Aggregations are specified via the aggregations parameter in the decorator of a Batch or Stream Feature View. Here is a quick example:

from datetime import timedelta

from tecton import Aggregation, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# `transactions` (a stream source) and `user` (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        Aggregation(function="mean", column="amt", time_window=timedelta(days=1), name="1_day_avg"),
        Aggregation(function="mean", column="amt", time_window=timedelta(days=3), name="3_day_avg"),
        Aggregation(function="mean", column="amt", time_window=timedelta(days=7), name="7_day_avg"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]

Every Aggregation has an associated time window to aggregate over, specified via the time_window parameter. There are three types of time windows, which together allow for a great degree of flexibility:

  1. Time Window: A fixed window length stretching into the past from "now," with an optional offset. For example, "the last 7 days."
  2. Lifetime Window: An ever-growing window from a specific point in the past, up until "now." For example, "from Jan 1, 2020 until now."
  3. Time Window Series: A series of windows over a time range relative to "now." For example, "every day in the last week."

The following diagram illustrates some common window configurations.

[Figure: Tecton Time Windows]

See the sections below for in-depth explanations and usage examples.

A quick note on "now"

You might be asking, "but when exactly is 'now'?" The answer is that it depends on the context.

During online retrieval, "now" means now (i.e. the request time) because we are interested in the current feature value for inference.

However, when retrieving offline features for a historical event, "now" means "the provided timestamp of the event." Tecton handles the time travel to retrieve the correct historical value as of that time.

One last note: In Batch Feature Views or Stream Feature Views that use sliding windows, the end of the window is not truly "now," but rather the end of the most recent aggregation_interval.
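As a rough illustration of that last point, the sketch below floors a timestamp to the most recent interval boundary. The helper name window_end_for and the choice to align boundaries to the Unix epoch are assumptions made for illustration; they are not taken from Tecton's SDK.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_end_for(now: datetime, aggregation_interval: timedelta) -> datetime:
    """Floor `now` to the most recent interval boundary (hypothetical helper)."""
    # timedelta // timedelta yields the number of whole intervals since the epoch.
    whole_intervals = (now - EPOCH) // aggregation_interval
    return EPOCH + whole_intervals * aggregation_interval


now = datetime(2024, 3, 15, 10, 47, 30, tzinfo=timezone.utc)
print(window_end_for(now, timedelta(minutes=5)))  # 2024-03-15 10:45:00+00:00
```

With a 5 minute interval, a request at 10:47:30 would see a window that ends at 10:45:00 under these assumptions.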

Time Window

The TimeWindow class is used to specify a fixed window length into the past relative to "now." For example, "the last 7 days." This is the most common window type.

TimeWindow has two parameters:

| Name | Required? | Description |
| --- | --- | --- |
| window_size | Yes | The size of the window, expressed as a positive timedelta. |
| offset | No. Defaults to 0. | The relative end time of the window, expressed as a negative timedelta. |

As shorthand, if you simply pass in a timedelta to the Aggregation's time_window parameter, Tecton will interpret this as a TimeWindow with no offset. For example, time_window=timedelta(days=7) is the same as time_window=TimeWindow(window_size=timedelta(days=7)).

See the SDK Reference for more details.
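To make the relationship between window_size and offset concrete, here is a minimal sketch in plain Python that computes the start and end of a fixed window. The function time_window_bounds is a hypothetical helper, not part of the Tecton SDK, and it says nothing about whether Tecton treats the boundaries as inclusive or exclusive.

```python
from datetime import datetime, timedelta


def time_window_bounds(now, window_size, offset=timedelta(0)):
    """Return (start, end) of a fixed window ending at `now + offset`."""
    if window_size <= timedelta(0):
        raise ValueError("window_size must be a positive timedelta")
    if offset > timedelta(0):
        raise ValueError("offset must be zero or a negative timedelta")
    end = now + offset          # a negative offset moves the end into the past
    start = end - window_size   # the window stretches back from its end
    return start, end


now = datetime(2024, 3, 15)
# "The last 7 days": 2024-03-08 to 2024-03-15
print(time_window_bounds(now, timedelta(days=7)))
# "The 7 days ending 3 days ago": 2024-03-05 to 2024-03-12
print(time_window_bounds(now, timedelta(days=7), offset=timedelta(days=-3)))
```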

Time Window Example

This example uses the shorthand notation described above for the first aggregation and an explicit TimeWindow with an offset for the second.

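from datetime import timedelta

from tecton import Aggregation, TimeWindow, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# `transactions` (a stream source) and `user` (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        # Shorthand: a bare timedelta is a TimeWindow with no offset.
        Aggregation(function="mean", column="amt", time_window=timedelta(days=7), name="1_week_avg"),
        # Explicit TimeWindow: a 7 day window that ends 3 days before "now".
        Aggregation(
            function="mean",
            column="amt",
            time_window=TimeWindow(window_size=timedelta(days=7), offset=timedelta(days=-3)),
            name="1_week_avg_3_days_ago",
        ),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]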

Lifetime Window

Private Preview

This capability requires Data Compaction. Compaction and Lifetime Windows are in Private Preview and have limitations that will be resolved in future Tecton releases. See Limitations & Requirements for more details. This is currently available for Spark-based Feature Views; support for Rift is coming soon.

The LifetimeWindow class is used to specify an ever-growing window from a specific point in the past, up until "now." For example, "from Jan 1, 2000 until now."

The start time of the Lifetime Window is specified via the lifetime_start_time parameter on a Batch or Stream Feature View and therefore must be the same time for all LifetimeWindows in a single Feature View.

Lifetime Windows require Data Compaction to be enabled via the compaction_enabled=True parameter on a Batch or Stream Feature View. This ensures efficient computation and retrieval.

See the SDK Reference for more details.
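The semantics of an ever-growing window can be sketched in plain Python. The function lifetime_aggregate below is a hypothetical, brute-force illustration: it rescans every event on each call, which is the kind of work Data Compaction is meant to avoid, and it is not how Tecton computes these features.

```python
from datetime import datetime


def lifetime_aggregate(events, lifetime_start_time, now):
    """Return (sum, mean) of amounts with lifetime_start_time <= timestamp <= now."""
    amounts = [amt for ts, amt in events if lifetime_start_time <= ts <= now]
    total = sum(amounts)
    mean = total / len(amounts) if amounts else None
    return total, mean


events = [
    (datetime(1999, 12, 31), 50.0),  # before the lifetime start, so ignored
    (datetime(2005, 6, 1), 10.0),
    (datetime(2015, 6, 1), 30.0),
    (datetime(2023, 6, 1), 20.0),
]
start = datetime(2000, 1, 1)

# The window start stays fixed while the end follows "now".
print(lifetime_aggregate(events, start, now=datetime(2010, 1, 1)))  # (10.0, 10.0)
print(lifetime_aggregate(events, start, now=datetime(2024, 1, 1)))  # (60.0, 20.0)
```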

Lifetime Window Example

from datetime import datetime

from tecton import Aggregation, LifetimeWindow, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# `transactions` (a stream source) and `user` (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        Aggregation(function="mean", column="amt", time_window=LifetimeWindow(), name="txn_avg_since_2000"),
        Aggregation(function="sum", column="amt", time_window=LifetimeWindow(), name="txn_sum_since_2000"),
    ],
    compaction_enabled=True,
    lifetime_start_time=datetime(2000, 1, 1),
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]

Time Window Series

Warning

This feature is currently available for Spark-based Feature Views. It is also available on Rift after setting tecton.conf.set('DUCKDB_ENABLE_OPTIMIZED_FULL_AGG', False).

The TimeWindowSeries class is used to specify a series of time windows over a time range relative to "now." For example, "every hour in the last week."

The output type of a Time Window Series feature is an array of values representing an aggregate for each window in the series ordered from earliest to latest.

TimeWindowSeries has 4 parameters:

| Name | Required? | Description |
| --- | --- | --- |
| series_start | Yes | The relative start of the series of windows, represented as a negative timedelta. |
| series_end | No. Defaults to 0 (i.e. "now"). | The relative end of the series of windows, represented as a negative timedelta. |
| window_size | Yes | The size of each window in the series, represented as a positive timedelta. |
| step_size | No. Defaults to window_size. | The interval by which the time windows step forward in the series, represented as a positive timedelta. This is primarily useful if you want to express a series of overlapping windows. For example, if you want a series of 3-hour windows as of every hour in the last week, you would set window_size=timedelta(hours=3) and step_size=timedelta(hours=1). |

Note

The start and end of the series are aligned to the start of the first window and the end of the last window.

Tecton will validate your configuration to ensure this alignment is possible and give an error if not.

For example, a 3-day series with a 2-day window size would be invalid because you cannot fit sequential, non-overlapping 2-day windows into a 3-day range. This configuration would be valid with either a 1-day step size or a 4-day series. See this diagram for a visual:

[Figure: Valid Time Window Series]

See the SDK Reference for more details.
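The alignment rule described above can be sketched in plain Python. The function series_windows is a hypothetical helper, and its check is inferred from the description on this page; Tecton's actual validation may differ in detail.

```python
from datetime import timedelta


def series_windows(series_start, window_size, series_end=timedelta(0), step_size=None):
    """Return (start, end) offsets relative to "now", earliest window first."""
    step_size = window_size if step_size is None else step_size
    if window_size <= timedelta(0) or step_size <= timedelta(0):
        raise ValueError("window_size and step_size must be positive timedeltas")
    span = series_end - series_start
    # The first window starts at series_start and the last one must end at series_end.
    if window_size > span or (span - window_size) % step_size != timedelta(0):
        raise ValueError("windows do not align with the series start and end")
    count = (span - window_size) // step_size + 1
    return [
        (series_start + i * step_size, series_start + i * step_size + window_size)
        for i in range(count)
    ]


day = timedelta(days=1)
print(len(series_windows(-3 * day, 2 * day, step_size=day)))  # 2 overlapping windows
print(len(series_windows(-4 * day, 2 * day)))                 # 2 back-to-back windows
try:
    series_windows(-3 * day, 2 * day)  # 2-day windows cannot tile a 3-day range
except ValueError as err:
    print("invalid:", err)
```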

Time Window Series Example

The output data for this feature would be an array of 168 floats representing the transaction average for each hour in the past week, starting from the earliest hour.

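from datetime import timedelta

from tecton import Aggregation, TimeWindowSeries, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# `transactions` (a stream source) and `user` (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        Aggregation(
            function="mean",
            column="amt",
            time_window=TimeWindowSeries(series_start=timedelta(days=-7), window_size=timedelta(hours=1)),
            name="hourly_txn_avg_last_7d",
        ),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]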
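To show where the 168 values come from, the following plain-Python sketch computes a per-window mean for a 7-day series of 1-hour windows. The helper series_means is hypothetical. Treating each window as half-open and returning None for empty windows are assumptions made for illustration, not documented Tecton behavior.

```python
from datetime import datetime, timedelta


def series_means(events, now, series_start=timedelta(days=-7), window_size=timedelta(hours=1)):
    """Mean amount per window, earliest window first; None for empty windows."""
    count = -series_start // window_size  # 7 days / 1 hour = 168 windows
    first_start = now + series_start
    buckets = [[] for _ in range(count)]
    for ts, amt in events:
        index = (ts - first_start) // window_size  # which window the event falls in
        if 0 <= index < count:
            buckets[index].append(amt)
    return [sum(b) / len(b) if b else None for b in buckets]


now = datetime(2024, 3, 15, 12, 0)
events = [
    (now - timedelta(days=7) + timedelta(minutes=10), 4.0),  # earliest hour
    (now - timedelta(days=7) + timedelta(minutes=50), 6.0),  # earliest hour
    (now - timedelta(minutes=30), 9.0),                      # latest hour
    (now - timedelta(days=8), 100.0),                        # outside the series
]
series = series_means(events, now)
print(len(series))            # 168
print(series[0], series[-1])  # 5.0 9.0
```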
