Version: 1.1

Feature View ttl (time-to-live)

ttl is a parameter on Batch Feature Views, Stream Feature Views, and Feature Tables that specifies the effective "time-to-live" for non-aggregate (i.e. attribute-type) features. This is how long a value should be considered valid relative to its timestamp. It is also implicitly calculated for (non-lifetime) aggregate feature views to prevent the accumulation of expired data in the online store. Understanding and tuning ttl behavior can be an important cost and performance optimization in applications where serving old values is not necessary.

Example Feature Views with and without ttl

To better understand how the ttl parameter works, first see the following Feature View that does not use a TTL.

from datetime import datetime

from tecton import batch_feature_view, Attribute
from tecton.types import String, Timestamp

@batch_feature_view(
    sources=[user_sign_up_events],
    entities=[user],
    features=[
        Attribute("user_zip_code", String),
        Attribute("user_dob", Timestamp),
    ],
    ttl=None,
    timestamp_field="sign_up_date",
    feature_start_time=datetime(2020, 1, 1),
)
def user_sign_up_metadata_features(user_sign_up_events):
    ...

This Feature View will ingest sign-up events as far back as Jan 1, 2020, and when queried will return the zip code and date of birth from the most recent sign-up event for a specified user. In this case, since ttl=None, sign-up events are considered valid forever, or until a later event is ingested for that user.

Using an "infinite" TTL (i.e. ttl=None) makes sense for this use case because a user may only sign up once and that data should be considered valid indefinitely.

Next, consider a Feature View that uses ttl.

from datetime import datetime, timedelta

from tecton import stream_feature_view, Attribute
from tecton.types import Int64, Timestamp

@stream_feature_view(
    sources=[ad_impression_events],
    entities=[user],
    features=[
        Attribute("last_seen_ad_id", Int64),
        Attribute("last_seen_ad_impression_ts", Timestamp),
    ],
    ttl=timedelta(days=2),
    timestamp_field="event_ts",
    feature_start_time=datetime(2024, 1, 1),
)
def last_seen_ad_features(ad_impression_events):
    ...

This Feature View ingests ad impression events and tracks the last seen ad and impression timestamp for a given user. It might be used by an ad-targeting system to avoid showing too many ads, or to avoid showing the same ad twice in a row. In this case, ttl=timedelta(days=2), so the "last ad impression" is only considered valid for two days. Two days after the most recent impression event for a user, this Feature View will begin returning null for these features. Setting a short TTL like this has performance and cost benefits for both online and offline retrieval.
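The serving semantics of both examples can be sketched with a small helper. This is an illustration only; serve_latest is a hypothetical function, not part of the Tecton SDK:

```python
from datetime import datetime, timedelta

def serve_latest(events, now, ttl=None):
    """Return the most recent value for a key, or None if it has expired.
    events is a list of (timestamp, value) pairs. Hypothetical helper
    illustrating ttl semantics; not a Tecton API."""
    if not events:
        return None
    ts, value = max(events, key=lambda e: e[0])
    # ttl=None means the latest value is valid indefinitely.
    if ttl is not None and now - ts > ttl:
        return None
    return value

impressions = [(datetime(2024, 5, 1, 12, 0), 42)]

# Within the 2-day ttl, the last ad id is served:
assert serve_latest(impressions, datetime(2024, 5, 2), timedelta(days=2)) == 42
# More than 2 days after the impression, the feature is null:
assert serve_latest(impressions, datetime(2024, 5, 4), timedelta(days=2)) is None
# With ttl=None (the sign-up example), the value never expires:
assert serve_latest(impressions, datetime(2030, 1, 1)) == 42
```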

Aggregate Feature Views

The expiration time of a feature value in an aggregate Feature View is calculated as: timestamp of the feature value + aggregation_interval + longest time_window + 7 days (grace period). Consider the following aggregate Feature View:

from datetime import datetime, timedelta

from tecton import batch_feature_view, Aggregate, TimeWindow
from tecton.types import Field, Float64

@batch_feature_view(
    sources=[transactions],
    entities=[merchant],
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("transaction_amt", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=30)),
        ),
        Aggregate(
            input_column=Field("transaction_amt", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=45)),
        ),
    ],
    timestamp_field="transaction_time",
    feature_start_time=datetime(2024, 1, 1),
)
def merchant_transaction_features(transactions):
    ...

In this case, a tile of data representing feature values from 2024/1/1 would have the following values:

  • timestamp of the feature value = 2024/1/1
  • aggregation_interval = 1 day
  • longest time_window = 45 days

So this piece of data would be considered expired at 2024/1/1 + 1 day + 45 days + 7 days = 2024/2/23.
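The arithmetic above can be checked directly with datetime:

```python
from datetime import datetime, timedelta

# Expiry calculation for the aggregate example above.
feature_timestamp = datetime(2024, 1, 1)
aggregation_interval = timedelta(days=1)
longest_time_window = timedelta(days=45)
grace_period = timedelta(days=7)

expires_at = (feature_timestamp + aggregation_interval
              + longest_time_window + grace_period)
print(expires_at.date())  # 2024-02-23
```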

Note that ttl can't be set for aggregate Feature Views because the time windows indicate the valid time ranges that we need to keep data around for in the online store.

Performance and cost benefits of using ttl

1. Write less data to the online store

Tecton will only materialize data to the online store that may be needed for online retrieval. In the last_seen_ad_features example above, this means that the Feature View would only backfill the most recent two days' worth of data to the online store. The behavior differs depending on whether you're using a Feature View with scheduled or manual triggers, or a Feature Table:

  • Feature Views with BatchTriggerType.SCHEDULED: Tecton manages the materialization process and will only write the data required for online retrieval to the online store.
  • Feature Tables or Feature Views with BatchTriggerType.MANUAL: In these cases, the user is responsible for managing ingestion. When online materialization is enabled and a ttl is set, all ingested data is written to the online store and later removed based on the ttl configuration. This distinction is important for managing storage costs and optimizing retrieval performance.

2. Retrieve less data during offline retrieval

When executing offline queries (e.g. when generating training data), Tecton will attempt to minimize the amount of data read from the offline store or raw batch sources.

For example, when generating training data for the last_seen_ad_features Feature View above, Tecton will only query for events that occurred up to two days before the training data events. So if all of the training events occurred on Jan 30, then Tecton would only retrieve offline data in the range [Jan 28, Jan 30]. This query optimization is particularly impactful when querying against the Tecton offline store, which is partitioned by the feature view's timestamp.
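The time-range pruning described above can be sketched as follows; offline_query_range is a hypothetical helper, not a Tecton API:

```python
from datetime import datetime, timedelta

def offline_query_range(event_times, ttl):
    """Range of feature data needed to serve point-in-time-correct
    values for the given training events. Sketch of the optimization
    described above; not a Tecton API."""
    return min(event_times) - ttl, max(event_times)

# All training events on Jan 30, with a 2-day ttl:
training_events = [datetime(2024, 1, 30, 9), datetime(2024, 1, 30, 17)]
start, end = offline_query_range(training_events, timedelta(days=2))
print(start, end)  # 2024-01-28 09:00:00 2024-01-30 17:00:00
```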

3. Expire data out of the online store

Configuring the ttl for a Feature View allows Tecton to delete that data from the online store, which reduces storage costs and can be important for data compliance. Since storage costs are usually the primary driver for Redis, using ttl can substantially reduce costs when using Redis as your online store.

A feature value is deleted from the online store when both of the following conditions are met:

  • The online store supports expiration-based deletion:
      • Redis: the Feature View was created after August 3, 2022.
      • DynamoDB: the Feature View is newly created and uses Tecton materialization runtime version 0.9.15 or higher.
  • The feature value has passed its expiration plus a 7-day grace period:
      • Non-aggregate feature value: current time - feature row timestamp > ttl + 7 days.
      • Aggregate feature value: current time - timestamp of the feature value > aggregation_interval + longest time_window + 7 days.
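The age-based condition can be expressed as a small check. is_expired is a hypothetical helper that mirrors the formulas above, with the 7-day grace period taken from the deletion conditions:

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)

def is_expired(now, feature_ts, ttl=None, aggregation_interval=None,
               longest_time_window=None):
    """Age-based deletion condition. Pass ttl for non-aggregate features,
    or aggregation_interval and longest_time_window for aggregate
    features. Hypothetical helper; not a Tecton API."""
    age = now - feature_ts
    if ttl is not None:
        return age > ttl + GRACE_PERIOD
    return age > aggregation_interval + longest_time_window + GRACE_PERIOD

now = datetime(2024, 3, 1)
# Non-aggregate row with a 2-day ttl, written 10 days ago -> deletable:
assert is_expired(now, now - timedelta(days=10), ttl=timedelta(days=2))
# Written 5 days ago -> still within ttl + grace period:
assert not is_expired(now, now - timedelta(days=5), ttl=timedelta(days=2))
```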
note

Lower ttl values will reduce feature data storage costs.

If there is more than a 7-day gap between the current time and the last time a Feature View's values were written to the online store, some of the Feature View's values that have not yet exceeded the TTL period may be automatically deleted from the online store. In this case, these values will be served as null. If you encounter this situation, contact Tecton Support for assistance.

note

The ttl parameter has no effect on the deletion of feature values from the offline store. To remove values from the offline store, consider the following options:

  1. .delete_keys(): Feature Views and Feature Tables have a .delete_keys() method to delete entries matching specified join key(s) from the online and offline store.

  2. S3 Lifecycle Management: Set up an S3 Lifecycle configuration to automatically delete S3 objects after expiration.

Details of ttl for Batch Feature Views

Batch Feature Views do not use wall-clock time to determine when to stop serving a feature value. Instead, batch feature values are expired only when the next incremental batch materialization job completes. This is in order to prevent expected or unexpected delays in the batch pipeline from degrading online serving.

For example, if a Batch Feature View has batch_schedule=timedelta(days=1), feature values are materialized on a daily cadence: events from Jan 2 are materialized on the next daily run on Jan 3. If that Batch Feature View also has ttl=timedelta(days=1), Tecton will continue to serve those feature values until the next daily run on Jan 4 has completed. This prevents scheduling delays, long job execution times, or batch outages from degrading online serving.
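This materialization-driven expiry can be sketched as follows; is_served_batch is a hypothetical helper that models the rule that a stale value survives until a later run completes:

```python
from datetime import datetime, timedelta

def is_served_batch(feature_ts, ttl, completed_runs, now):
    """Sketch of batch expiry: a value past its nominal ttl keeps being
    served until a materialization run scheduled after its expiry has
    completed. Hypothetical helper; not a Tecton API."""
    if now - feature_ts <= ttl:
        return True  # still within ttl
    # Past ttl: expired only once a later run has replaced it.
    return not any(run > feature_ts + ttl for run in completed_runs)

jan2_events_ts = datetime(2024, 1, 2)
ttl = timedelta(days=1)
runs = [datetime(2024, 1, 3)]  # daily runs completed so far

# Nominal expiry was Jan 3, but no later run has finished yet:
assert is_served_batch(jan2_events_ts, ttl, runs, now=datetime(2024, 1, 3, 18))
# Once the Jan 4 run completes, the value stops being served:
runs.append(datetime(2024, 1, 4))
assert not is_served_batch(jan2_events_ts, ttl, runs, now=datetime(2024, 1, 4, 2))
```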
