Feature View ttl (time-to-live)
ttl is a parameter on Batch Feature Views, Stream Feature Views, and Feature
Tables that specifies the effective "time-to-live" for non-aggregate (i.e.
attribute-type) features. This is how long a value should be considered valid
relative to its timestamp. It is also implicitly calculated for (non-lifetime)
aggregate feature views to prevent the accumulation of expired data in the
online store. Understanding and tuning ttl behavior can be an important cost
and performance optimization in applications where serving old values is not
necessary.
Example Feature Views with and without ttl​
To better understand how the ttl parameter works, first see the following
Feature View that does not use a TTL.
@batch_feature_view(
    sources=[user_sign_up_events],
    entities=[user],
    features=[
        Attribute("user_zip_code", String),
        Attribute("user_dob", Timestamp),
    ],
    ttl=None,
    timestamp_field="sign_up_date",
    feature_start_time=datetime(2020, 1, 1),
)
def user_sign_up_metadata_features(user_sign_up_events):
    ...
This Feature View will ingest sign-up events as far back as Jan 1, 2020, and
when queried will return the zip code and date of birth from the most recent
sign-up event for the specified user. In this case, since ttl=None, sign-up
events are considered valid forever, or until a later event is ingested for
that user.
Using an "infinite" TTL (i.e. ttl=None) makes sense for this use case because
a user may only sign up once and that data should be considered valid
indefinitely.
Next, consider a Feature View that uses ttl.
@stream_feature_view(
    sources=[ad_impression_events],
    entities=[user],
    features=[
        Attribute("last_seen_ad_id", Int64),
        Attribute("last_seen_ad_impression_ts", Timestamp),
    ],
    ttl=timedelta(days=2),
    timestamp_field="event_ts",
    feature_start_time=datetime(2024, 1, 1),
)
def last_seen_ad_features(ad_impression_events):
    ...
This Feature View ingests ad impression events and tracks the last seen ad and
impression timestamp for a given user. This Feature View may be used by an
ad-targeting system to avoid showing too many ads or avoid showing the same ad
twice in a row. In this case, ttl=timedelta(days=2), so the "last ad
impression" is only considered valid for two days. Two days after a user's most
recent impression event, this Feature View will begin returning null for these
features. Setting a short TTL like this has performance and cost benefits for
both online and offline retrieval.
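The validity semantics of both examples can be sketched as follows. This is an illustrative check, not Tecton's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

def is_value_valid(feature_ts: datetime, now: datetime,
                   ttl: Optional[timedelta]) -> bool:
    """ttl=None means the value never expires; otherwise it is
    valid while (now - feature_ts) <= ttl."""
    if ttl is None:
        return True
    return now - feature_ts <= ttl

# A 2020 sign-up event is still valid years later with ttl=None,
# while an ad impression older than two days is not with ttl=timedelta(days=2).
```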
Aggregate Feature Views​
The expiration time of an aggregate Feature View's data is calculated as
timestamp of the feature value + aggregation_interval + longest time_window + 7 days (grace period).
Consider an aggregate Feature View.
@batch_feature_view(
    sources=[transactions],
    entities=[merchant],
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("transaction_amt", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=30)),
        ),
        Aggregate(
            input_column=Field("transaction_amt", Float64),
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=45)),
        ),
    ],
    timestamp_field="transaction_time",
    feature_start_time=datetime(2024, 1, 1),
)
def merchant_transaction_features(transactions):
    ...
In this case, a tile of data representing feature values from 2024/1/1 would have the following values:
- timestamp of the feature value = 2024/1/1
- aggregation_interval = 1 day
- longest time_window = 45 days
So this piece of data would be considered expired at
2024/1/1 + 1 day + 45 days + 7 days = 2024/2/23.
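This expiration arithmetic can be reproduced directly with standard datetime math. The helper below is illustrative only; the 7-day grace period is the fixed value from the formula above:

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)  # fixed grace period from the formula

def aggregate_expiration(feature_ts: datetime,
                         aggregation_interval: timedelta,
                         longest_time_window: timedelta) -> datetime:
    """Expiration time of an aggregate tile: timestamp + interval
    + longest window + grace period."""
    return feature_ts + aggregation_interval + longest_time_window + GRACE_PERIOD

# A 2024/1/1 tile with a 1-day interval and 45-day longest window
# expires on 2024/2/23, matching the worked example above.
```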
Note that ttl can't be set explicitly for aggregate Feature Views, because the
aggregation time windows already determine how long data must be kept in the
online store.
Performance and cost benefits of using ttl​
1. Write less data to the online store​
Tecton will only materialize data to the online store that may be needed for
online retrieval. In the last_seen_ad_features example above, this means that
the Feature View would only backfill the most recent two days' worth of data to
the online store. The behavior differs depending on whether you're using a
Feature View with scheduled or manual triggers, or a Feature Table:
- Feature Views with BatchTriggerType.SCHEDULED: Tecton manages the materialization process and will only write the data required for online retrieval to the online store.
- Feature Tables or Feature Views with BatchTriggerType.MANUAL: The user is responsible for managing ingestion. When both online materialization and a TTL are enabled, all ingested data is written to the online store and later removed based on the TTL configuration.

This distinction is important for managing storage costs and optimizing retrieval performance.
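To sketch why less data is written: with a ttl, only values newer than the current time minus the ttl can still be valid at serving time, so the online backfill can start there rather than at feature_start_time. The helper below is hypothetical, not a Tecton API:

```python
from datetime import datetime, timedelta

def online_backfill_start(feature_start_time: datetime,
                          now: datetime,
                          ttl: timedelta) -> datetime:
    """Hypothetical: the earliest event timestamp worth writing to the
    online store, since anything older than (now - ttl) is already expired."""
    return max(feature_start_time, now - ttl)

# With ttl=timedelta(days=2), only the most recent two days of data
# are backfilled online, regardless of how far back feature_start_time is.
```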
2. Retrieve less data during offline retrieval​
When executing offline queries (e.g. when generating training data), Tecton will attempt to minimize the amount of data read from the offline store or raw batch sources.
For example, when generating training data for the last_seen_ad_features
Feature View above, Tecton will only query for events that occurred up to two
days before the training data events. So if all of the training events occurred
on Jan 30, then Tecton would only retrieve offline data in the range [Jan 28,
Jan 30]. This query optimization is particularly impactful when querying against
the Tecton offline store, which is partitioned by the feature view's timestamp.
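The range computation described above can be sketched as follows (a hypothetical helper, not part of the Tecton SDK):

```python
from datetime import datetime, timedelta

def offline_query_range(spine_timestamps: list, ttl: timedelta):
    """Window of feature data needed to serve a set of training-event
    (spine) timestamps: only events within ttl before each spine
    timestamp can contribute a valid value."""
    return (min(spine_timestamps) - ttl, max(spine_timestamps))

# All training events on Jan 30 with a 2-day ttl only need [Jan 28, Jan 30].
```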
3. Expire data out of the online store​
Configuring the ttl for a Feature View allows Tecton to delete that data from
the online store, which reduces storage costs and can be important for data
compliance. Since storage is usually the primary cost driver for Redis, using
ttl can substantially reduce costs when Redis is your online store.
A feature value is deleted from the online store when all of the following conditions are met:
- The feature value has expired from the online store (because the feature value's timestamp is earlier than the current time minus the TTL).
- The online store is running on Redis and the Feature View was created after August 3, 2022, or the online store is running on DynamoDB and the Feature View is a newly created Feature View using Tecton materialization runtime version 0.9.15 or higher.
- For a non-aggregate feature value: current time - feature row timestamp > ttl + 7 days.
- For an aggregate feature value: current time - timestamp of the feature value > aggregation_interval + longest time_window + 7 days.
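The two deletion inequalities can be expressed directly. This is an illustrative sketch of the conditions above, not Tecton's implementation:

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=7)  # fixed grace period from the conditions above

def non_aggregate_deletable(now: datetime, row_ts: datetime,
                            ttl: timedelta) -> bool:
    """Deletable when: current time - feature row timestamp > ttl + 7 days."""
    return now - row_ts > ttl + GRACE

def aggregate_deletable(now: datetime, value_ts: datetime,
                        aggregation_interval: timedelta,
                        longest_window: timedelta) -> bool:
    """Deletable when: current time - value timestamp
    > aggregation_interval + longest time_window + 7 days."""
    return now - value_ts > aggregation_interval + longest_window + GRACE
```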
Lower ttl values will reduce feature data storage costs.
If there is more than a 7-day gap between the current time and the last time a Feature View's values were written to the online store, some of the Feature View's values that have not yet exceeded the TTL period may be automatically deleted from the online store. In this case, these values will be null. If you encounter this situation, contact Tecton Support.
The ttl parameter has no effect on the deletion of feature values from the
offline store. To remove values from the offline store, consider the following
options:
- .delete_keys(): Feature Views and Feature Tables have a .delete_keys() method to delete entries matching specified join key(s) from the online and offline store.
- S3 Lifecycle Management: Set up an S3 Lifecycle configuration to automatically delete S3 objects after expiration.
Details of ttl for Batch Feature Views​
Batch Feature Views do not use wall-clock time to determine when to stop serving a feature value. Instead, batch feature values are expired only when the next incremental batch materialization job completes. This prevents expected or unexpected delays in the batch pipeline from degrading online serving.
For example, when a Batch Feature View has batch_schedule=timedelta(days=1),
feature values are materialized on a daily cadence, e.g. events from Jan 2 are
actually materialized on the next daily run on Jan 3. If that Batch Feature
View has ttl=timedelta(days=1), then Tecton will serve those feature values
until the next daily run on Jan 4 has completed. This prevents scheduling
delays, long job execution times, or batch outages from degrading online serving.
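Under these semantics, the effective serving expiry of a batch value is the later of its wall-clock expiry and the completion of the next materialization run. A minimal sketch, where next_run_completion is a hypothetical input rather than something Tecton exposes:

```python
from datetime import datetime, timedelta

def batch_serving_expiry(feature_ts: datetime, ttl: timedelta,
                         next_run_completion: datetime) -> datetime:
    """A batch value is served until the next incremental run completes,
    even if its wall-clock expiry (feature_ts + ttl) has already passed."""
    return max(feature_ts + ttl, next_run_completion)

# Jan 2 events with ttl=1 day would naively expire Jan 3, but are served
# until the Jan 4 daily run completes.
```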