tecton.FeatureAggregation

class tecton.FeatureAggregation(column, function, time_windows)

This class describes a single aggregation that is applied in a batch or stream window aggregate feature view.

Parameters
  • column (str) – Name of the column to aggregate.

  • function (Union[str, AggregationFunction]) – One of the built-in aggregation functions.

  • time_windows (Union[str, List[str]]) – Duration to aggregate over in pytimeparse format. Examples: "30days", ["8hours", "30days", "365days"].

function can be one of the predefined numeric aggregation functions: "count", "sum", "mean", "min", or "max". To use one of these, pass its name as a string. Nulls are handled as in the corresponding Spark SQL function applied to the column: SUM, MEAN, MIN, or MAX over all nulls returns null, and COUNT over all nulls returns 0.
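The null-handling rules above can be illustrated with a small pure-Python sketch. It is not Tecton's or Spark's implementation; the aggregate() helper is a hypothetical name introduced only to show the expected result for a single window, with None standing in for null.

```python
from typing import List, Optional


def aggregate(values: List[Optional[float]], function: str) -> Optional[float]:
    """Hypothetical helper: mimic Spark SQL null handling for one window."""
    # Nulls never contribute to any of the numeric aggregations.
    non_null = [v for v in values if v is not None]

    # COUNT counts only non-null values, so all nulls gives 0.
    if function == "count":
        return len(non_null)

    # Every other aggregation over zero non-null values yields null.
    if not non_null:
        return None

    if function == "sum":
        return sum(non_null)
    if function == "mean":
        return sum(non_null) / len(non_null)
    if function == "min":
        return min(non_null)
    if function == "max":
        return max(non_null)
    raise ValueError(f"unsupported function: {function}")
```

For example, aggregate([None, None], "sum") returns None, while aggregate([None, None], "count") returns 0.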

In addition to numeric aggregations, FeatureAggregation supports “last-n” aggregations, which compute the last N distinct values of the column, ordered by timestamp. Currently only string columns are supported as inputs to this aggregation, so the resulting feature value is a list of strings. Nulls are not included in the aggregated list.

You can use a last-n aggregation via the last_distinct() helper function like this:

from tecton import FeatureAggregation
from tecton.aggregation_functions import last_distinct

my_fv = BatchWindowAggregateFeatureView(
    ...
    aggregations=[
        FeatureAggregation(
            column='my_column',
            function=last_distinct(15),  # keep the last 15 distinct values
            time_windows=['7days'],
        )
    ],
    ...
)
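
To make the last-n semantics concrete, here is a minimal pure-Python sketch of what such an aggregation computes for a single window. It is an illustration under stated assumptions, not Tecton's implementation: last_n_distinct() is a hypothetical helper, timestamps are plain integers, and the most-recent-first order of the result is an assumption, since the description above does not specify the list order.

```python
from typing import List, Optional, Tuple


def last_n_distinct(rows: List[Tuple[int, Optional[str]]], n: int) -> List[str]:
    """Hypothetical helper: last n distinct non-null values by timestamp.

    rows holds (timestamp, value) pairs. The result is ordered most recent
    first, which is an assumption made for this sketch.
    """
    if n <= 0:
        return []

    seen = set()
    result: List[str] = []
    # Walk the rows from newest to oldest.
    for _, value in sorted(rows, key=lambda row: row[0], reverse=True):
        # Nulls are excluded, and each value is kept only once.
        if value is None or value in seen:
            continue
        seen.add(value)
        result.append(value)
        if len(result) == n:
            break
    return result
```

Given rows [(1, "a"), (2, "b"), (3, None), (4, "a"), (5, "c")] and n = 2, the sketch returns ["c", "a"]: the null at timestamp 3 is skipped, and "a" is counted once at its most recent occurrence.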

Methods

__init__

Initialize self.

__init__(column, function, time_windows)

Initialize self. See help(type(self)) for accurate signature.