Version: 0.6

tecton.Aggregation

Summary

This class describes a single aggregation that is applied in a batch or stream feature view.

Description

The Aggregation constructor accepts a function input, which can be one of the built-in aggregation functions. You can pass the name of a built-in aggregation function as a string. Nulls are handled as in the corresponding Spark SQL function applied to the column: for example, the sum of all nulls is null and the count of all nulls is 0.
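The null-handling rule above can be illustrated with a minimal plain-Python sketch. It does not call Tecton or Spark; the helper names null_aware_sum and null_aware_count are hypothetical and only mimic the behavior described (nulls are skipped, an all-null sum is null, an all-null count is 0).

```python
from typing import Iterable, Optional


def null_aware_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Hypothetical helper: sum that skips None, returning None if nothing is left."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def null_aware_count(values: Iterable[Optional[float]]) -> int:
    """Hypothetical helper: count of non-null values, so an all-null input gives 0."""
    return sum(1 for v in values if v is not None)
```

Here Python's None stands in for a SQL null; the actual aggregation is computed by Tecton, not by code like this.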

In addition to numeric aggregations, Aggregation supports last-N aggregations in two variants, non-distinct and distinct, which compute the last N values (or the last N distinct values) of the column, ordered by timestamp. Currently only string columns are supported as input to these aggregations, so the resulting feature value is a list of strings. Values in the list are ordered by ascending timestamp. Nulls are not included in the aggregated list.
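The following plain-Python sketch illustrates the semantics described above. It is not Tecton's implementation. The function names last_n and last_distinct_n are hypothetical, and one detail is an assumption: when a value repeats, the distinct variant keeps the position of its most recent occurrence.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# A row is a (timestamp, value) pair; None stands in for a null value.
Row = Tuple[datetime, Optional[str]]


def last_n(rows: List[Row], n: int) -> List[str]:
    """Last n non-null values, returned in ascending timestamp order."""
    if n <= 0:
        return []
    ordered = [v for _, v in sorted(rows, key=lambda r: r[0]) if v is not None]
    return ordered[-n:]


def last_distinct_n(rows: List[Row], n: int) -> List[str]:
    """Last n distinct non-null values, returned in ascending timestamp order.

    Assumption: a repeated value is placed at its most recent occurrence.
    """
    if n <= 0:
        return []
    seen = set()
    newest_first: List[str] = []
    # Walk from newest to oldest, keeping the first sighting of each value.
    for _, value in sorted(rows, key=lambda r: r[0], reverse=True):
        if value is None or value in seen:
            continue
        seen.add(value)
        newest_first.append(value)
        if len(newest_first) == n:
            break
    # Reverse so the result is ascending by timestamp.
    return newest_first[::-1]
```

Both helpers drop nulls before selecting values, matching the statement that nulls are not included in the aggregated list.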

Example

You can use these aggregations through the last() and last_distinct() helper functions:

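import datetime

from tecton import Aggregation, batch_feature_view
from tecton.aggregation_functions import last_distinct, last

@batch_feature_view(
    ...
    aggregations=[
        # Last 15 distinct values of my_column over a 7-day window.
        Aggregation(
            column='my_column',
            function=last_distinct(15),
            time_window=datetime.timedelta(days=7)),
        # Last 15 values of my_column (duplicates kept) over a 7-day window.
        Aggregation(
            column='my_column',
            function=last(15),
            time_window=datetime.timedelta(days=7)),
    ],
    ...
)
def my_fv(data_source):
    pass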

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

  • column (str) – Column name of the feature we are aggregating.

  • function (Union[str, <aggregation function>]) – One of the built-in aggregation functions, such as count. See the time-window aggregation functions reference for a list of aggregation functions.

  • time_window (datetime.timedelta) – Duration to aggregate over. Example: datetime.timedelta(days=30).

  • name (str) – The name of this feature. Defaults to an autogenerated name, e.g. transaction_count_7d_1d.
