Version: Beta 🚧

TimeWindowSeries

Summary

This class describes a TimeWindowSeries that is applied in an Aggregation within a Batch or Stream Feature View.

Time Window Series are useful for expressing a feature like "user transaction sum for every hour in the last week."

For an overview of aggregation windows, see the Aggregation Windows page.

Description

Tecton aggregations are applied over the specified time window series using the time_window parameter. Use the TimeWindowSeries class to create an aggregation over a series of time windows as shown in the example below:

from tecton import batch_feature_view, Aggregate, TimeWindowSeries
from tecton.types import Field, Int32
from datetime import timedelta


# `transactions` (a data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[transactions],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    features=[
        Aggregate(
            input_column=Field("value", Int32),
            function="sum",
            # Seven consecutive 1-day windows covering the last week.
            time_window=TimeWindowSeries(
                series_start=timedelta(days=-7),
                window_size=timedelta(days=1),
            ),
        )
    ],
)
def user_transaction_sums(transactions):
    return f"""
        SELECT user_id, timestamp, value
        FROM {transactions}
        """

Example

Consider the following example mock data:

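|   | user_id | timestamp           | value |
|---|---------|---------------------|-------|
| 0 | user_1  | 2022-05-14 00:00:00 | 1     |
| 1 | user_1  | 2022-05-15 00:00:00 | 3     |
| 2 | user_1  | 2022-05-16 00:00:00 | 6     |
| 3 | user_1  | 2022-05-17 00:00:00 | 11    |
| 4 | user_1  | 2022-05-18 00:00:00 | 23    |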

During Offline Retrieval, when you pass in an events dataframe to join against, the aggregation is computed over the time window series relative to each timestamp in that events dataframe. The example below shows how the aggregation is computed for several different event timestamps.

import pandas as pd
from datetime import datetime

training_events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1", "user_1", "user_1", "user_1"],
        "timestamp": [
            datetime(2022, 5, 15),
            datetime(2022, 5, 18),
            datetime(2022, 5, 19),
            datetime(2022, 5, 20),
            datetime(2022, 5, 24),
            datetime(2022, 5, 26),
        ],
    }
)

# Join the feature view against the events and materialize the result as pandas.
df = user_transaction_sums.get_features_for_events(training_events).to_pandas()
display(df)
|   | user_id | timestamp           | user_transaction_sums__amt_sum_1d_1d_series_7d_0s_1d |
|---|---------|---------------------|------------------------------------------------------|
| 0 | user_1  | 2022-05-15 00:00:00 | [None, None, None, None, None, None, 1]              |
| 1 | user_1  | 2022-05-18 00:00:00 | [None, None, None, 1, 3, 6, 11]                      |
| 2 | user_1  | 2022-05-19 00:00:00 | [None, None, 1, 3, 6, 11, 23]                        |
| 3 | user_1  | 2022-05-20 00:00:00 | [None, 1, 3, 6, 11, 23, None]                        |
| 4 | user_1  | 2022-05-24 00:00:00 | [11, 23, None, None, None, None, None]               |
| 5 | user_1  | 2022-05-26 00:00:00 | [None, None, None, None, None, None, None]           |
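
To make the windowing in the output above easier to follow, here is a minimal pure-Python sketch that reproduces those lists from the mock data. It is not Tecton's implementation. The helper name window_series_sums is hypothetical, and the sketch assumes that each window is half-open, [start, end), that the last window ends at the event timestamp, and that an empty window yields None.

```python
from datetime import datetime, timedelta

# Mock transactions for user_1, copied from the example data: (timestamp, value).
transactions = [
    (datetime(2022, 5, 14), 1),
    (datetime(2022, 5, 15), 3),
    (datetime(2022, 5, 16), 6),
    (datetime(2022, 5, 17), 11),
    (datetime(2022, 5, 18), 23),
]


def window_series_sums(event_time, rows, series_start=timedelta(days=-7),
                       window_size=timedelta(days=1)):
    """Sum `rows` per window, oldest window first (hypothetical helper)."""
    sums = []
    window_start = event_time + series_start
    # Assumption: windows are half-open [start, end) and stop at the event time.
    while window_start + window_size <= event_time:
        window_end = window_start + window_size
        values = [v for ts, v in rows if window_start <= ts < window_end]
        # Assumption: a window with no rows yields None rather than 0.
        sums.append(sum(values) if values else None)
        window_start = window_end
    return sums


print(window_series_sums(datetime(2022, 5, 18), transactions))
print(window_series_sums(datetime(2022, 5, 20), transactions))
```

Under these assumptions, the event at 2022-05-18 sees the transaction from 2022-05-17 in its most recent window, while the transaction stamped exactly 2022-05-18 00:00:00 falls outside it and first appears for the 2022-05-19 event.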

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

| Name | Required? | Description |
|------|-----------|-------------|
| series_start | Yes | The relative start of the series of windows, represented as a negative timedelta. |
| series_end | No. Defaults to 0 (i.e. "now"). | The relative end of the series of windows, represented as a negative timedelta. |
| window_size | Yes | The size of each window in the series, represented as a positive timedelta. |
| step_size | No. Defaults to window_size. | The interval by which the time windows step forward in the series, represented as a positive timedelta. This is primarily useful if you want to express a series of overlapping windows. For example, if you want a series of 3-hour windows as of every hour in the last week, you would set window_size=timedelta(hours=3) and step_size=timedelta(hours=1). |
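
As a rough illustration of how these four parameters interact, the sketch below enumerates window boundaries as offsets relative to "now". It does not call Tecton. The helper window_offsets is hypothetical, and it assumes the series starts at series_start, steps forward by step_size, and keeps only the windows that end at or before series_end. Tecton's exact boundary handling may differ.

```python
from datetime import timedelta


def window_offsets(series_start, window_size, step_size=None,
                   series_end=timedelta(0)):
    """Return (start, end) offsets relative to "now" (hypothetical helper)."""
    # step_size falls back to window_size, giving back-to-back windows.
    step = step_size if step_size is not None else window_size
    offsets = []
    start = series_start
    # Assumption: only windows that end at or before series_end are kept.
    while start + window_size <= series_end:
        offsets.append((start, start + window_size))
        start += step
    return offsets


# Non-overlapping: 1-day windows covering the last week.
daily = window_offsets(timedelta(days=-7), timedelta(days=1))

# Overlapping: 3-hour windows stepping forward by 1 hour over the last week.
overlapping = window_offsets(
    timedelta(days=-7), timedelta(hours=3), step_size=timedelta(hours=1)
)

print(len(daily), daily[0], daily[-1])
print(len(overlapping), overlapping[:2])
```

With the default step, consecutive windows share only a boundary. With step_size smaller than window_size, each window overlaps the next one by window_size minus step_size.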
