Version: 1.0

The ttl (time-to-live) Parameter

ttl is a parameter on Batch Feature Views, Stream Feature Views, and Feature Tables that specifies the effective "time-to-live" for non-aggregate (i.e. attribute-type) features. This is how long a value should be considered valid relative to its timestamp. The ttl parameter can be an important cost and performance optimization in applications where serving very old values is not necessary.

Example Feature Views with and without ttl

To better understand how the ttl parameter works, first see the following Feature View that does not use a TTL.

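# Imports assume the Tecton 1.0 SDK; user_sign_up_events and user are defined elsewhere.
from datetime import datetime

from tecton import Attribute, batch_feature_view
from tecton.types import String, Timestamp


@batch_feature_view(
    sources=[user_sign_up_events],
    entities=[user],
    features=[
        Attribute("user_zip_code", String),
        Attribute("user_dob", Timestamp),
    ],
    ttl=None,  # values never expire
    timestamp_field="sign_up_date",
    feature_start_time=datetime(2020, 1, 1),
)
def user_sign_up_metadata_features(user_sign_up_events):
    ...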

This Feature View ingests sign-up events as far back as Jan 1, 2020. When queried, it returns the zip code and date of birth from the most recent sign-up event for the specified user. Because ttl=None, a sign-up event is considered valid forever, or until a later event is ingested for that user.

Using an "infinite" TTL (i.e. ttl=None) makes sense for this use case because a user may only sign up once and that data should be considered valid indefinitely.

Next, consider a Feature View that uses ttl.

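# Imports assume the Tecton 1.0 SDK; ad_impression_events and user are defined elsewhere.
from datetime import datetime, timedelta

from tecton import Attribute, stream_feature_view
from tecton.types import Int64, Timestamp


@stream_feature_view(
    sources=[ad_impression_events],
    entities=[user],
    features=[
        Attribute("last_seen_ad_id", Int64),
        Attribute("last_seen_ad_impression_ts", Timestamp),
    ],
    ttl=timedelta(days=2),  # values expire two days after their timestamp
    timestamp_field="event_ts",
    feature_start_time=datetime(2024, 1, 1),
)
def last_seen_ad_features(ad_impression_events):
    ...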

This Feature View ingests ad impression events and tracks the last seen ad and its impression timestamp for a given user. An ad-targeting system might use it to avoid showing too many ads or showing the same ad twice in a row. Here, ttl=timedelta(days=2), so the "last ad impression" is only considered valid for two days. Two days after the most recent impression event for a user, this Feature View begins returning null for these features. Setting a short TTL like this has performance and cost benefits for both online and offline retrieval.
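The validity rule described above can be sketched in plain Python. This is only an illustration of the semantics, not Tecton's implementation: the function name `is_value_valid` is hypothetical, and treating the exact two-day boundary as still valid is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_value_valid(value_ts: datetime, now: datetime, ttl: Optional[timedelta]) -> bool:
    """Return True if a value stamped `value_ts` is still within its ttl at `now`."""
    if ttl is None:
        # ttl=None means the value never expires.
        return True
    # Assumption: a value exactly `ttl` old is still treated as valid.
    return now - value_ts <= ttl


last_impression = datetime(2024, 3, 1, 12, 0)
ttl = timedelta(days=2)

# One day after the impression the value is still served.
print(is_value_valid(last_impression, datetime(2024, 3, 2, 12, 0), ttl))
# Three days after the impression the feature would be null.
print(is_value_valid(last_impression, datetime(2024, 3, 4, 12, 0), ttl))
```

With ttl=None the same check always passes, which matches the sign-up example earlier in this page.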

Performance and cost benefits of using ttl

1. Write less data to the online store

Tecton only materializes data to the online store that may be needed for online retrieval. For the last_seen_ad_features example above, this means the Feature View only backfills the most recent two days' worth of data to the online store.

2. Retrieve less data during offline retrieval

When executing offline queries (e.g. when generating training data), Tecton will attempt to minimize the amount of data read from the offline store or raw batch sources.

For example, when generating training data for the last_seen_ad_features Feature View above, Tecton will only query for events that occurred up to two days before the training data events. So if all of the training events occurred on Jan 30, then Tecton would only retrieve offline data in the range [Jan 28, Jan 30]. This query optimization is particularly impactful when querying against the Tecton offline store, which is partitioned by the feature view's timestamp.
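The time range in this example can be computed with a short sketch. The helper `offline_read_range` is a hypothetical name used for illustration; it only models the arithmetic described above, not how Tecton plans its queries.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def offline_read_range(
    event_times: Iterable[datetime], ttl: Optional[timedelta]
) -> Tuple[Optional[datetime], datetime]:
    """Return the (start, end) range of feature data needed for the given training events.

    A start of None means the range is unbounded in the past (ttl=None).
    """
    times = list(event_times)
    latest = max(times)
    if ttl is None:
        return None, latest
    # No value older than `ttl` before the earliest training event can be valid.
    return min(times) - ttl, latest


# All training events fall on Jan 30, as in the example above.
training_events = [datetime(2024, 1, 30, 0, 0), datetime(2024, 1, 30, 18, 0)]
start, end = offline_read_range(training_events, timedelta(days=2))
print(start, end)
```

Here the start of the range is Jan 28, two days before the earliest training event, so older partitions never need to be read.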

3. Expire data out of the online store

Configuring the ttl for a Feature View allows Tecton to delete expired data from the online store, which reduces storage costs and can be important for data compliance. Since storage is usually the primary cost driver for Redis, using ttl can substantially reduce costs when Redis is your online store.

A feature value is deleted from the online store when all of the following conditions are met:

  • The feature value has expired from the online store (because the feature value's timestamp is earlier than the current time minus the TTL).
  • Either the online store is running on Redis and the Feature View was created after August 3, 2022, or the online store is running on Dynamo and the Feature View is a newly created Feature View using Tecton materialization runtime version 0.9.15 or higher.
  • The value's age exceeds the deletion threshold for its feature type:
    for a non-aggregate feature value, current time - feature row timestamp > ttl + 7 days;
    for an aggregate feature value, current time - timestamp of the feature value > aggregation_interval + longest time_window + 7 days.
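The two age thresholds above, for non-aggregate and aggregate feature values, can be written out as a small sketch. The function names are hypothetical, and the sketch covers only the age arithmetic; the Redis and Dynamo requirements are not modeled.

```python
from datetime import datetime, timedelta
from typing import Sequence

# The fixed 7-day margin that appears in both thresholds.
DELETION_BUFFER = timedelta(days=7)


def non_aggregate_past_threshold(now: datetime, row_ts: datetime, ttl: timedelta) -> bool:
    """current time - feature row timestamp > ttl + 7 days"""
    return now - row_ts > ttl + DELETION_BUFFER


def aggregate_past_threshold(
    now: datetime,
    value_ts: datetime,
    aggregation_interval: timedelta,
    time_windows: Sequence[timedelta],
) -> bool:
    """current time - value timestamp > aggregation_interval + longest time_window + 7 days"""
    return now - value_ts > aggregation_interval + max(time_windows) + DELETION_BUFFER


row_ts = datetime(2024, 1, 1)
# With ttl of 2 days the threshold is 9 days: 8 days old is kept, 10 days old is past it.
print(non_aggregate_past_threshold(datetime(2024, 1, 9), row_ts, timedelta(days=2)))
print(non_aggregate_past_threshold(datetime(2024, 1, 11), row_ts, timedelta(days=2)))
# With a 1-day interval and a longest window of 30 days the threshold is 38 days.
windows = [timedelta(days=7), timedelta(days=30)]
print(aggregate_past_threshold(datetime(2024, 2, 10), row_ts, timedelta(days=1), windows))
```

Note that a non-aggregate value can stay in the online store for up to seven days after it stops being served, since serving stops at ttl while deletion waits for ttl + 7 days.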
note

Lower ttl values will reduce feature data storage costs.

If more than 7 days pass between the current time and the last time a Feature View's values were written to the online store, some of the Feature View's values that have not yet exceeded the TTL period may be automatically deleted from the online store. In this case, these values will be null. For help with this situation, contact Tecton Support.

note

The ttl parameter has no effect on the deletion of feature values from the offline store. To remove values from the offline store, consider the following options:

  1. .delete_keys(): Feature Views and Feature Tables have a .delete_keys() method to delete entries matching specified join key(s) from the online and offline store.

  2. S3 Lifecycle Management: Set up an S3 Lifecycle configuration to automatically delete S3 objects after expiration.

Details of ttl for Batch Feature Views

Batch Feature Views do not use wall clock time to determine when to stop serving a feature value. Instead, batch feature values are expired only when the next incremental batch materialization job completes. This prevents expected or unexpected delays in the batch pipeline from degrading online serving.

For example, when a Batch Feature View has a batch_schedule=timedelta(days=1) then feature values are materialized on a daily cadence, e.g. events from Jan 2 are actually materialized on the next daily run on Jan 3. If that Batch Feature View has a ttl=timedelta(days=1), then Tecton will serve those feature values until the next daily run on Jan 4 has completed. This prevents scheduling delays, job execution times, or batch outages from degrading online serving.
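One way to picture this behavior is to judge expiry against the end of the last completed materialization window instead of the wall clock. The sketch below is a simplified model under that assumption, not Tecton's actual logic; `served_by_batch_view` and `last_materialized_end` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional


def served_by_batch_view(
    value_ts: datetime, last_materialized_end: datetime, ttl: Optional[timedelta]
) -> bool:
    """Return True if the value is still served.

    Assumption: expiry is measured against the end of the data window covered by
    the last completed materialization job, not against the current wall clock time.
    """
    if ttl is None:
        return True
    return last_materialized_end - value_ts <= ttl


event_ts = datetime(2024, 1, 2, 12, 0)  # an event from Jan 2
ttl = timedelta(days=1)

# The Jan 3 run has completed and covers data up to Jan 3 00:00: the value is served.
print(served_by_batch_view(event_ts, datetime(2024, 1, 3), ttl))
# If the Jan 4 run is delayed, the last completed window still ends at Jan 3,
# so the value keeps being served regardless of the wall clock.
print(served_by_batch_view(event_ts, datetime(2024, 1, 3), ttl))
# Once the Jan 4 run completes (data up to Jan 4 00:00), the value expires.
print(served_by_batch_view(event_ts, datetime(2024, 1, 4), ttl))
```

In this model a delayed or failed job simply leaves `last_materialized_end` unchanged, which is why the previously served values remain available.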
