Version: 0.9

Aggregation

Summary

This class describes a single Aggregation that is applied to a Batch or Stream Feature View.

Description

The Aggregation constructor accepts a function argument, which can be one of the built-in aggregation functions. A built-in function can be passed by name as a string. Nulls are handled the same way as the corresponding Spark SQL function applied to the column: for example, the sum of all nulls is null, and the count of all nulls is 0.
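The null-handling rule above can be illustrated with a small, plain-Python sketch. This is not Tecton or Spark code; the helper names null_aware_sum and null_aware_count are hypothetical and only mimic the described behavior, with None standing in for a null.

```python
from typing import Optional, Sequence


def null_aware_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the non-null inputs; return None when every input is null."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def null_aware_count(values: Sequence[Optional[float]]) -> int:
    """Count the non-null inputs; return 0 when every input is null."""
    return sum(1 for v in values if v is not None)


all_nulls = [None, None, None]
mixed = [1.0, None, 2.5]

print(null_aware_sum(all_nulls), null_aware_count(all_nulls))  # None 0
print(null_aware_sum(mixed), null_aware_count(mixed))          # 3.5 2
```

The point of the sketch is that an all-null window yields a null sum rather than 0, so downstream consumers of a sum feature may need to handle missing values explicitly.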

In addition to numeric aggregations, Aggregation supports last N and last distinct N aggregations, which compute the last N values (duplicates included) or the last N distinct values of the column, ordered by timestamp. Currently only string columns are supported as input to these aggregations, so the resulting feature value is a list of strings. Values in the list appear in ascending timestamp order. Nulls are not included in the aggregated list.
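The semantics described above can be sketched in plain Python. This is an illustration only, not Tecton's implementation: the functions last_n and last_distinct_n and the Event type are hypothetical, and the sketch assumes that a repeated value is kept at its most recent occurrence when computing distinct values.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical event shape: (timestamp, string value or None for null).
Event = Tuple[datetime, Optional[str]]


def _non_null_by_time(events: List[Event]) -> List[str]:
    """Return non-null values sorted by ascending timestamp."""
    return [v for _, v in sorted(events, key=lambda e: e[0]) if v is not None]


def last_n(events: List[Event], n: int) -> List[str]:
    """Last n values, duplicates included, oldest first."""
    if n <= 0:
        raise ValueError("n must be positive")
    return _non_null_by_time(events)[-n:]


def last_distinct_n(events: List[Event], n: int) -> List[str]:
    """Last n distinct values, oldest first (assumption: a repeated
    value is positioned at its most recent occurrence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    seen = set()
    newest_first: List[str] = []
    for value in reversed(_non_null_by_time(events)):
        if value not in seen:
            seen.add(value)
            newest_first.append(value)
    return list(reversed(newest_first[:n]))


events = [
    (datetime(2024, 1, 1), "a"),
    (datetime(2024, 1, 2), "b"),
    (datetime(2024, 1, 3), None),  # nulls are dropped
    (datetime(2024, 1, 4), "a"),
    (datetime(2024, 1, 5), "c"),
]

print(last_n(events, 3))           # ['b', 'a', 'c']
print(last_distinct_n(events, 2))  # ['a', 'c']
```

In both cases the null at the third timestamp is skipped, and the returned list runs from oldest to newest.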

Example

You can use these aggregations through the last() and last_distinct() helper functions:

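import datetime

from tecton import Aggregation, batch_feature_view
from tecton.aggregation_functions import last_distinct, last, TimeWindow

@batch_feature_view(
    ...
    aggregations=[
        # Last 15 distinct values of my_column over a 7-day window.
        Aggregation(
            column='my_column',
            function=last_distinct(15),
            time_window=TimeWindow(window_size=datetime.timedelta(days=7))),
        # Last 15 values of my_column (duplicates included) over a 7-day window.
        Aggregation(
            column='my_column',
            function=last(15),
            time_window=TimeWindow(window_size=datetime.timedelta(days=7))),
    ],
    ...
)
def my_fv(data_source):
    pass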

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

column (str) – Column name of the feature we are aggregating.
function (Union[str, <aggregation function>]) – One of the built-in aggregation functions, such as count. See the time-window aggregation functions reference for a list of aggregation functions.
time_window (TimeWindow) – The window_size and optional offset over which to aggregate. See the Time Window Reference for more details on the TimeWindow class.
name (str) – The name of this feature. Defaults to an autogenerated name, e.g. transaction_count_7d_1d.
