Version: Beta 🚧

Online Compaction: Usage Guide

Private Preview

This feature is currently in Private Preview.

This feature has the following limitations:
  • Must be enabled by Tecton Support.
  • Available for Spark-based Feature Views -- coming to Rift in a future release.
  • See additional limitations & requirements below.
If you would like to participate in the preview, please file a feature request.

Please see the Online Compaction: Overview for a conceptual overview of Online Compaction.

Enable Online Compaction for a Batch Feature View

  1. Set compaction_enabled=True on your Batch Feature View. This lets Tecton schedule compaction jobs that compact offline/batch features into compacted tiles at a scheduled interval and materialize them to the online store.

NOTE: Your tecton_materialization_runtime must be 0.8.2 or higher.

from tecton import batch_feature_view, Attribute
from tecton.types import Int64
from datetime import timedelta, datetime


# `transactions` (a batch data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[transactions],
    mode="spark_sql",
    entities=[user],
    feature_start_time=datetime(2022, 5, 1),
    batch_schedule=timedelta(days=1),
    online=True,
    offline=True,
    compaction_enabled=True,
    tecton_materialization_runtime="1.0.0",
    timestamp_field="timestamp",
    features=[Attribute(name="amount", dtype=Int64)],
)
def user_average_transaction_amount(transactions):
    return f"SELECT user_id, timestamp, amount FROM {transactions}"

Enable Online Compaction for a Stream Feature View

  1. Set compaction_enabled=True on your Stream Feature View. This lets Tecton schedule compaction jobs that compact offline/batch features into compacted tiles at a scheduled interval and materialize them to the online store.
  2. Optionally set stream_tiling_enabled (defaults to False). See the Stream Tiling section for the implications.

NOTE: Stream compacted feature views must use tecton_materialization_runtime=1.0.0 or higher.

from tecton import stream_feature_view, FilteredSource, Aggregate, LifetimeWindow, TimeWindow
from tecton.types import Field, Bool, Int64
from datetime import timedelta, datetime


# `stream` (a stream data source) and `user` (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=FilteredSource(stream),
    entities=[user],
    mode="pyspark",
    online=True,
    offline=True,
    timestamp_field="timestamp",
    features=[
        Aggregate(input_column=Field("clicked", Bool), function="count", time_window=LifetimeWindow()),
        Aggregate(
            # A numeric type is assumed for the summed column.
            input_column=Field("amount", Int64), function="sum", time_window=TimeWindow(window_size=timedelta(days=7))
        ),
    ],
    feature_start_time=datetime(2024, 3, 1),
    lifetime_start_time=datetime(2024, 2, 1),
    batch_schedule=timedelta(days=1),
    compaction_enabled=True,
    tecton_materialization_runtime="1.0.0",
)
def user_click_counts(ad_impressions):
    # Select every column referenced by the aggregations above.
    return ad_impressions.select(
        ad_impressions["user_uuid"].alias("user_id"), "clicked", "amount", "timestamp"
    )
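The runtime requirements noted in the two sections above (0.8.2 or higher for Batch Feature Views, 1.0.0 or higher for compacted Stream Feature Views) can be checked before applying a change. The sketch below is a hypothetical helper, not part of the Tecton SDK, and it assumes the runtime string is a plain numeric X.Y.Z version.

```python
# Hypothetical pre-flight check; none of these names come from the Tecton SDK.
# Minimum tecton_materialization_runtime versions stated on this page.
MIN_RUNTIME = {
    "batch": (0, 8, 2),
    "stream": (1, 0, 0),
}


def parse_version(version: str) -> tuple:
    """Turn "1.0.0" into (1, 0, 0). Assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def runtime_supports_compaction(runtime: str, feature_view_kind: str) -> bool:
    """Return True if the runtime meets the minimum for "batch" or "stream"."""
    return parse_version(runtime) >= MIN_RUNTIME[feature_view_kind]
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "0.10.0" as older than "0.8.2".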

Stream Tiling

Stream Tiling can be enabled on Stream Feature Views by setting the stream_tiling_enabled parameter to True.

Stream tiling is recommended for use cases with hot keys, i.e. keys that may receive thousands of events per day. Stream tiling can substantially reduce online write, read, and storage costs for these use cases. However, stream tiling will slightly reduce data freshness due to micro-batching, so it is not recommended for use cases that would not benefit from streaming compaction.

Stream Tile Size

Tecton automatically determines the size of the stream tile interval based on the smallest aggregation window across all columns in the Feature View.

| Smallest Aggregation Window | Stream Tile Size |
|---|---|
| (0, 1h) | 1m |
| [1h, 10h) | 5m |
| [10h, Lifetime) | 1h |

For example, if a Stream Feature View has a 30-minute aggregation of column foo and a 12-hour aggregation of column bar, then the Stream Feature View will use 1-minute stream tiles for both foo and bar.
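The lookup described above can be sketched in a few lines of Python. This is only an illustration of the published table, not Tecton's implementation: `stream_tile_size` is a hypothetical helper, and it assumes all windows are finite (lifetime windows are left out because the table does not say how a lifetime-only Feature View is handled).

```python
from datetime import timedelta
from typing import Iterable


def stream_tile_size(windows: Iterable[timedelta]) -> timedelta:
    """Map the smallest aggregation window to a stream tile size.

    Hypothetical helper that mirrors the table above; finite windows only.
    """
    windows = list(windows)
    if not windows:
        raise ValueError("at least one aggregation window is required")
    smallest = min(windows)
    if smallest <= timedelta(0):
        raise ValueError("aggregation windows must be positive")
    if smallest < timedelta(hours=1):   # (0, 1h)
        return timedelta(minutes=1)
    if smallest < timedelta(hours=10):  # [1h, 10h)
        return timedelta(minutes=5)
    return timedelta(hours=1)           # [10h, Lifetime)
```

For the example above, `stream_tile_size([timedelta(minutes=30), timedelta(hours=12)])` falls in the first bracket because only the smallest window matters.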

Stream Feature View: Sawtooth Window Fuzziness

Online compaction uses Sawtooth Windows to achieve excellent performance and freshness for Stream Feature Views at the cost of some window "fuzziness".

Tecton determines the window fuzziness based on the intervals below. Window fuzziness is always less than or equal to 10% of the window size.

Sawtooth Window Fuzziness by Window Size

| Aggregation Window Size | Stream Tiling Enabled | Fuzziness |
|---|---|---|
| (0, 2d) | True | Stream Tile Size |
| (0, 2d) | False | None |
| [2d, 10d] | True or False | 1h |
| (10d, Lifetime) | True or False | 1d |
| Lifetime | True or False | None |

For example, if you have a Stream Feature View with stream tiling disabled with 1-day, 7-day, and 30-day window aggregations, then the 1-day aggregation will not have any fuzziness, the 7-day aggregation window will vary between 7d and 7d+1h, and the 30-day aggregation window will vary between 30d and 31d depending on how far the stream has progressed.

If stream tiling were enabled for that Feature View, the stream tile size would be 1h (the smallest window, 1 day, falls in the [10h, Lifetime) bracket of the Stream Tile Size table), and the 1-day window would vary between 1d and 1d+1h. The fuzziness of the larger windows would not be affected by stream tiling.
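The fuzziness table can also be written as a small lookup. The function below is a hypothetical sketch of the table, not Tecton's implementation; it assumes `None` stands for a lifetime window and that the caller supplies the stream tile size.

```python
from datetime import timedelta
from typing import Optional


def sawtooth_fuzziness(
    window: Optional[timedelta],
    stream_tiling_enabled: bool,
    tile_size: timedelta,
) -> timedelta:
    """Return the window fuzziness from the table above.

    Hypothetical helper. `window=None` represents a lifetime window;
    `tile_size` is only used when stream tiling is enabled.
    """
    if window is None:                   # Lifetime: no fuzziness
        return timedelta(0)
    if window < timedelta(days=2):       # (0, 2d)
        return tile_size if stream_tiling_enabled else timedelta(0)
    if window <= timedelta(days=10):     # [2d, 10d]
        return timedelta(hours=1)
    return timedelta(days=1)             # (10d, Lifetime)


# Effective range of a 7-day window with stream tiling disabled.
window = timedelta(days=7)
fuzz = sawtooth_fuzziness(window, stream_tiling_enabled=False, tile_size=timedelta(hours=1))
print(window, "to", window + fuzz)
```

The effective window always lies between the nominal size and the nominal size plus the fuzziness, which matches the 7d to 7d+1h and 30d to 31d ranges in the example above.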

Performance Benefits of Compaction

More detailed benchmarking is still in progress. Preliminary results are shown below.

This is a basic benchmark testing low QPS load on a DynamoDB-backed Stream Feature View with Sum aggregations of 2 different window sizes.

| Aggregation Window Size | Latency Reduction | Read Size Reduction |
|---|---|---|
| 100d | ~80% | 99%+ |
| 300d | ~85% | 99%+ |

Enabling Online Compaction for Existing Feature Views

See Upgrading Existing Feature Views.

Limitations

  1. Only available for Feature Views using DynamoDB.
  2. Compaction for Rift and Ingest API is coming soon.
  3. Approximate count distinct and approximate percentile are not yet supported for Stream Feature Views with time window aggregates; support is coming soon.
  4. Support for TimeWindowSeries is coming soon for Batch and Stream Feature Views.
  5. Offset Windows are supported for Batch Feature Views; support for Stream Feature Views is coming soon.
