The ttl (time-to-live) Parameter
ttl is a parameter on Batch Feature Views, Stream Feature Views, and Feature
Tables that specifies the effective "time-to-live" for non-aggregate (i.e.
attribute-type) features. This is how long a value should be considered valid
relative to its timestamp. The ttl parameter can be an important cost and
performance optimization in applications where serving very old values is not
necessary.
Example Feature Views with and without ttl
To better understand how the ttl parameter works, first see the following
Feature View that does not use a TTL.
@batch_feature_view(
    sources=[user_sign_up_events],
    entities=[user],
    features=[
        Attribute("user_zip_code", String),
        Attribute("user_dob", Timestamp),
    ],
    ttl=None,
    timestamp_field="sign_up_date",
    feature_start_time=datetime(2020, 1, 1),
)
def user_sign_up_metadata_features(user_sign_up_events):
    ...
This Feature View will ingest sign-up events as far back as Jan 1, 2020. When
queried, it returns the zip code and date of birth from the most recent sign-up
event for the specified user. In this case, since ttl=None, sign-up events will
be considered valid forever, or until a later event is ingested for that user.
Using an "infinite" TTL (i.e. ttl=None) makes sense for this use case because
a user may only sign up once and that data should be considered valid
indefinitely.
Next, consider a Feature View that uses ttl.
@stream_feature_view(
    sources=[ad_impression_events],
    entities=[user],
    features=[
        Attribute("last_seen_ad_id", Int64),
        Attribute("last_seen_ad_impression_ts", Timestamp),
    ],
    ttl=timedelta(days=2),
    timestamp_field="event_ts",
    feature_start_time=datetime(2024, 1, 1),
)
def last_seen_ad_features(ad_impression_events):
    ...
This Feature View ingests ad impression events and tracks the last seen ad and
impression timestamp for a given user. It may be used by an ad-targeting system
to avoid showing too many ads or to avoid showing the same ad twice in a row.
In this case, ttl=timedelta(days=2), so the "last ad impression" is only
considered valid for two days. Two days after the most recent impression event
for a user, this Feature View will begin returning null for these features.
Setting a short TTL like this has performance and cost benefits for both online
and offline retrieval.
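The validity rule described above can be sketched in plain Python. This is only an illustration of the semantics, not Tecton's implementation: the function name is_value_valid is hypothetical, and treating the boundary (an age of exactly ttl) as still valid is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_value_valid(
    feature_ts: datetime,
    request_ts: datetime,
    ttl: Optional[timedelta],
) -> bool:
    """Return True if a value timestamped feature_ts is still valid at request_ts."""
    if feature_ts > request_ts:
        # A value from after the request time is never visible to the request.
        return False
    if ttl is None:
        # No TTL: the value stays valid until a later event replaces it.
        return True
    # Assumption: a value whose age equals ttl exactly is still valid.
    return request_ts - feature_ts <= ttl


impression_ts = datetime(2024, 3, 1, 12, 0)
two_days = timedelta(days=2)

# One day after the impression: still valid.
print(is_value_valid(impression_ts, datetime(2024, 3, 2, 12, 0), two_days))
# Three days after the impression: expired, so the feature would be null.
print(is_value_valid(impression_ts, datetime(2024, 3, 4, 12, 0), two_days))
# With ttl=None the same value never expires.
print(is_value_valid(impression_ts, datetime(2024, 3, 4, 12, 0), None))
```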
Performance and cost benefits of using ttl
1. Write less data to the online store
Tecton will only materialize data to the online store that may be needed for
online retrieval. In the last_seen_ad_features example above, this means that
the Feature View would only backfill the most recent two days' worth of data to
the online store.
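As a rough sketch of why the backfill shrinks, the earliest timestamp worth writing to the online store can be thought of as the later of feature_start_time and "now minus ttl". The helper online_backfill_start below is hypothetical and only models the behavior described above; how Tecton actually plans its backfill jobs may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def online_backfill_start(
    feature_start_time: datetime,
    now: datetime,
    ttl: Optional[timedelta],
) -> datetime:
    """Earliest event timestamp that could still be served online."""
    if ttl is None:
        # Without a TTL, every value since feature_start_time may be needed.
        return feature_start_time
    # Older values would already be expired, so they need not be written.
    return max(feature_start_time, now - ttl)


start = datetime(2024, 1, 1)
now = datetime(2024, 6, 1)

print(online_backfill_start(start, now, timedelta(days=2)))  # 2024-05-30 00:00:00
print(online_backfill_start(start, now, None))               # 2024-01-01 00:00:00
```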
2. Retrieve less data during offline retrieval
When executing offline queries (e.g. when generating training data), Tecton
will attempt to minimize the amount of data read from the offline store or raw
batch sources.
For example, when generating training data for the last_seen_ad_features
Feature View above, Tecton will only query for events that occurred up to two
days before the training data events. So if all of the training events occurred
on Jan 30, then Tecton would only retrieve offline data in the range [Jan 28,
Jan 30]. This query optimization is particularly impactful when querying against
the Tecton offline store, which is partitioned by the feature view's timestamp.
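The date arithmetic in this example can be sketched as follows. The function offline_read_range is a hypothetical helper that only reproduces the range calculation described above, and the year 2024 is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def offline_read_range(
    event_timestamps: Sequence[datetime],
    ttl: Optional[timedelta],
) -> Tuple[Optional[datetime], datetime]:
    """Return (start, end) of the feature data needed for the given events.

    A start of None means there is no lower bound (ttl=None).
    """
    earliest, latest = min(event_timestamps), max(event_timestamps)
    # Values older than ttl before the earliest event can never be joined.
    start = None if ttl is None else earliest - ttl
    return start, latest


training_events = [datetime(2024, 1, 30, 9, 0), datetime(2024, 1, 30, 17, 0)]

start, end = offline_read_range(training_events, timedelta(days=2))
print(start)  # 2024-01-28 09:00:00
print(end)    # 2024-01-30 17:00:00
```

With ttl=None the lower bound disappears, so the query would have to scan all history back to the start of the data.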
3. Expire data out of the online store
Configuring the ttl for a Feature View allows Tecton to delete expired data
from the online store, which reduces storage costs and can be important for
data compliance. Since storage is usually the primary cost driver for Redis,
using ttl can substantially reduce costs when using Redis as your online store.
A feature value is deleted from the online store when all of the following conditions are met:
- The feature value has expired from the online store (because the feature value's timestamp is earlier than the current time minus the TTL).
- The online store is running on Redis and the Feature View was created after August 3, 2022.
- The online store is running on Dynamo and the Feature View is a newly created Feature View using Tecton materialization runtime version 0.9.15 or higher.
  - For a non-aggregate feature value: current time - feature row timestamp > ttl + 7 days.
  - For an aggregate feature value: current time - timestamp of the feature value > aggregation_interval + longest time_window + 7 days.
Lower ttl values will reduce feature data storage costs.
If there is more than a 7 day gap between the current time and the last time a Feature View's values were written to the online store, some of the Feature View's values not exceeding the TTL period may be automatically deleted from the online store. In this case, these values will be null. For assistance with this situation, contact Tecton Support.
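The two time thresholds listed above can be written out as a small sketch. It covers only the time arithmetic, not the online store or runtime version conditions. The function names and the constant DELETION_BUFFER are hypothetical labels for the "7 days" term in the formulas.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Hypothetical name for the fixed "+ 7 days" term in both formulas.
DELETION_BUFFER = timedelta(days=7)


def non_aggregate_past_threshold(
    now: datetime, row_ts: datetime, ttl: timedelta
) -> bool:
    """current time - feature row timestamp > ttl + 7 days"""
    return now - row_ts > ttl + DELETION_BUFFER


def aggregate_past_threshold(
    now: datetime,
    value_ts: datetime,
    aggregation_interval: timedelta,
    time_windows: Sequence[timedelta],
) -> bool:
    """current time - value timestamp > aggregation_interval + longest time_window + 7 days"""
    return now - value_ts > aggregation_interval + max(time_windows) + DELETION_BUFFER


row_ts = datetime(2024, 1, 1)

# ttl of 2 days gives a threshold of 9 days.
print(non_aggregate_past_threshold(datetime(2024, 1, 9), row_ts, timedelta(days=2)))   # False (8 days old)
print(non_aggregate_past_threshold(datetime(2024, 1, 11), row_ts, timedelta(days=2)))  # True (10 days old)

# 1-day interval with a longest window of 30 days gives a threshold of 38 days.
windows = [timedelta(days=1), timedelta(days=7), timedelta(days=30)]
print(aggregate_past_threshold(datetime(2024, 2, 1), row_ts, timedelta(days=1), windows))   # False (31 days old)
print(aggregate_past_threshold(datetime(2024, 2, 15), row_ts, timedelta(days=1), windows))  # True (45 days old)
```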
The ttl parameter has no effect on the deletion of feature values from the
offline store. To remove values from the offline store, consider the following
options:
- .delete_keys(): Feature Views and Feature Tables have a .delete_keys() method to delete entries matching specified join key(s) from the online and offline store.
- S3 Lifecycle Management: Set up an S3 Lifecycle configuration to automatically delete S3 objects after expiration.
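For the S3 Lifecycle option, a minimal sketch using boto3 might look like the following. The bucket name, prefix, and retention period are placeholders: the source does not say where offline feature data lives in your bucket, so confirm the correct prefix before applying anything. Note that put_bucket_lifecycle_configuration replaces any lifecycle configuration already set on the bucket.

```python
import json


def build_expiration_config(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under prefix."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: dict) -> None:
    """Apply the configuration; this overwrites the bucket's existing rules."""
    import boto3  # imported lazily so the builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


# Placeholder prefix and retention period, for illustration only.
config = build_expiration_config("example-offline-store-prefix/", 365)
print(json.dumps(config, indent=2))

# Requires AWS credentials and a real bucket name:
# apply_lifecycle_config("example-bucket", config)
```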
Details of ttl for Batch Feature Views
Batch Feature Views do not use the wall clock time to determine when to stop serving a feature value. Instead, batch feature values are expired only when the next incremental batch materialization job completes. This prevents expected or unexpected delays in the batch pipeline from degrading online serving.
For example, when a Batch Feature View has batch_schedule=timedelta(days=1),
feature values are materialized on a daily cadence, e.g. events from Jan 2
are actually materialized on the next daily run on Jan 3. If that Batch Feature
View has ttl=timedelta(days=1), then Tecton will serve those feature values
until the next daily run on Jan 4 has completed. This prevents scheduling
delays, job execution times, or batch outages from degrading online serving.
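One way to picture this behavior is to measure a value's age against the end of the most recently completed materialization period rather than against the wall clock. The sketch below is an illustrative model that reproduces the Jan 2 to Jan 4 example, not a description of Tecton's internals. The function names are hypothetical and the year 2024 is arbitrary.

```python
from datetime import datetime, timedelta


def served_by_wall_clock(feature_ts: datetime, now: datetime, ttl: timedelta) -> bool:
    """Naive expiry: compare the value's age against the current time."""
    return now - feature_ts <= ttl


def served_by_batch_progress(
    feature_ts: datetime,
    latest_completed_period_end: datetime,
    ttl: timedelta,
) -> bool:
    """Modeled batch expiry: compare against the last completed job's period end."""
    return latest_completed_period_end - feature_ts <= ttl


ttl = timedelta(days=1)
event_ts = datetime(2024, 1, 2, 12, 0)  # an event from Jan 2

# The Jan 3 run (covering data up to Jan 3 00:00) has completed, but the
# Jan 4 run is delayed. The wall clock already reads Jan 4 06:00.
now = datetime(2024, 1, 4, 6, 0)
print(served_by_wall_clock(event_ts, now, ttl))                       # False
print(served_by_batch_progress(event_ts, datetime(2024, 1, 3), ttl))  # True

# Once the Jan 4 run completes, the value stops being served.
print(served_by_batch_progress(event_ts, datetime(2024, 1, 4), ttl))  # False
```

The contrast between the first two results is the point of the design: a delayed job leaves the old value in place instead of letting it silently turn into null.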