Version: 1.1

Using the Aggregation Engine

Tecton's Feature Platform comes with an Aggregation Engine, which optimizes both the feature engineering experience and the production performance of aggregation features.

Aggregation features are derived by calculating metrics over a window of time, such as averages, counts, max/min values, etc.

Using Tecton's Aggregation Engine is optional. If you prefer to write your own custom ETL pipelines, see Custom batch aggregation features.

Quick Example

from datetime import datetime, timedelta
from tecton import StreamFeatureView, Entity, Aggregation, StreamSource, PushConfig
from tecton.types import String, Timestamp, Float64, Field
from tecton import TimeWindow


# Define the entity for this feature
user = Entity(name="User", join_keys=[Field("user_id", String)])

# Define a Streaming Push Source
transactions_source = StreamSource(
    name="transactions_source",
    stream_config=PushConfig(log_offline=True),
    schema=input_schema,
)

# Time window aggregation feature
transaction_aggregations = StreamFeatureView(
    name="transaction_aggregations",
    source=transactions_source,
    entities=[user],
    online=True,
    offline=True,
    feature_start_time=datetime(2018, 1, 1),
    timestamp_field="timestamp",
    features=[
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=30)),
        ),
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=180)),
        ),
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="count",
            time_window=TimeWindow(window_size=timedelta(days=180)),
        ),
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="last",
            time_window=TimeWindow(window_size=timedelta(days=180)),
        ),
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="max",
            time_window=TimeWindow(window_size=timedelta(days=365 * 4)),
        ),
        Aggregate(
            input_column=Field("transaction_amount", Float64),
            function="min",
            time_window=TimeWindow(window_size=timedelta(days=365 * 4)),
        ),
    ],
)

This simple streaming example calculates:

The trailing 30d transaction amount sum
The trailing 180d transaction amount sum
The trailing 180d transaction count
The last transaction amount over the past 180 days
The max transaction amount in the past 4 years
The min transaction amount in the past 4 years

Benefits of the Aggregation Engine

If you define an aggregation feature using Tecton's Aggregate feature type, you get the following benefits:

Simplicity

Simple UX: As the example above shows, defining these online/offline-consistent, highly-performant and fresh features requires only a few lines of code.
Preprocessing Flexibility: You can aggregate raw data directly (see example above), or aggregate raw data that you filter / transform using standard Python or SQL.
Backfill Support: Streaming and Batch Features can be backfilled from batch data sources.
Data Source Agnostic: Aggregations are seamlessly supported across Batch, Stream or Push Data Sources.

Performance & Efficiency

Lifetime Window Support: You can aggregate a limited time window, or the entire life-time of data.
Low Latency Serving: Tecton optimizes the stored data to provide ultra-low latency serving, even for large time windows.
Streaming Memory Efficiency: Streaming aggregation features manage their state in Tecton's online store. As a result, they have a very limited memory footprint - even for arbitrarily large time windows. As a result, OOMs that you commonly run into with industry-standard streaming processors are a thing of the past.
Online Storage Efficiency: Tecton's aggregation engine optimizes the data stored in the online store, significantly reducing the infrastructure cost.
Minimal Backfills: Backfills are done intelligently - Tecton only writes relevant data of the recent past to the online store.

Accuracy

High Freshness: Time Window aggregations can be anchored to the "present" time, at which the feature data is requested. This allows you to fetch aggregation features that are fresh as of a few milliseconds.
Time Travel Compatible: When generating training data, Tecton's time travel capability ensures that you avoid online/offline skew.
Correctness Guarantees: Writing time window aggregations that can be executed online and offline consistently is hard and getting it wrong is easy. If you use Tecton aggregations, online/offline consistency and accuracy are guaranteed.

Quick Example​

Benefits of the Aggregation Engine​

Was this page helpful?

Quick Example

Benefits of the Aggregation Engine