Version: 1.1

Cache Features for Real-time Inference

Private Preview

This feature is currently in Private Preview.

This feature has the following limitations:

Must be enabled by Tecton support.

If you would like to participate in the preview, please file a support ticket.

Tecton Feature Serving Cache reduces both the cost and the latency of real-time inference for high-scale use cases. The cache is managed by Tecton and can be configured by users.

Which features should use the Cache?

High Traffic, Low Cardinality Key Reads: Ideal for high-traffic use cases where the same keys are read repeatedly. Serving those repeated reads from the cache reduces the cost and response time of each request.
Large Aggregation Intervals: Suitable for features with large aggregation intervals that require extended computation times.

Which features should NOT use the Cache?

Low Duplication Traffic: The cache should not be used if the inbound feature server traffic does not have high duplication.
Low Tolerance for Staleness: The cache should not be used if the requested features cannot tolerate staleness of at least 60 seconds, the minimum max age. One example is Stream Feature Views that use continuous mode streaming.
Realtime Feature Views: Caching is not supported for Realtime Feature Views.
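To judge whether your traffic has enough duplication to benefit, you could sample the join-key values from recent requests and measure how often a key repeats. The helper below is a hypothetical sketch, not part of the Tecton SDK. The ratio it returns is only an upper bound on the achievable hit rate, because it assumes an unbounded cache and ignores expiry after max_age_seconds.

```python
from collections import Counter
from typing import Hashable, Iterable


def key_duplication_ratio(request_keys: Iterable[Hashable]) -> float:
    """Fraction of requests whose entity key was already requested earlier.

    Upper bound on the hit rate of an unbounded cache with no expiry.
    """
    counts = Counter(request_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct key misses once (its first request); every repeat could hit.
    repeats = total - len(counts)
    return repeats / total


# Hypothetical sample of join-key values taken from request logs.
sample = ["user_1", "user_2", "user_1", "user_1", "user_3", "user_2"]
print(f"{key_duplication_ratio(sample):.2f}")  # 0.50
```

A ratio near zero suggests low-duplication traffic, where the cache adds little.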

Cache Data Model

The cache is managed by Tecton. No cached data is persisted, and no data is cached for more than 24 hours.

Tecton caches data at the entity key level, so multiple Feature Views that share the same entity keys are cached under the same cache key. Cached values are isolated per workspace. Keying the cache by entity provides the following advantages:

Fewer primary keys in the cache, which improves performance.
Retrieving different Feature Views with the same entity join keys will be more performant.
Cached values from Feature Views are shared across different Feature Services with the same entity keys and will fetch/store the same values.

Note: Customers should try to reduce the ratio of join key combinations to Feature Views to maximize performance.
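The data model above can be pictured with a small sketch. It assumes a hypothetical key made of the workspace name plus the sorted entity join keys, with one entry holding values for every Feature View that shares those keys. Tecton's real key format is internal and is not shown here.

```python
from typing import Mapping, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def cache_key(workspace: str, join_keys: Mapping[str, str]) -> CacheKey:
    """Build a hypothetical cache key from a workspace and entity join keys.

    Join keys are sorted so the key does not depend on argument order.
    """
    return (workspace, tuple(sorted(join_keys.items())))


# Hypothetical cache: one entry per (workspace, entity), holding values
# for every Feature View that shares those join keys.
cache: dict = {}

key = cache_key("prod", {"user_id": "u123"})
cache.setdefault(key, {})["user_spend_fv"] = {"spend_7d": 42.0}
cache.setdefault(key, {})["user_login_fv"] = {"logins_1d": 3}

# Both Feature Views land in the same entry; another workspace is isolated.
print(len(cache))  # 1
print(cache_key("staging", {"user_id": "u123"}) in cache)  # False
```

Because entries are keyed by entity, fewer distinct join key combinations mean fewer entries to store and look up.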

Using the Cache

To enable caching on a Feature Service, pass a cache_config to each Feature View or Feature Table you want cached, and set enable_online_caching=True on the Feature Service, as shown in the snippet below:

Note: These options only take effect if you have enabled caching for your account by talking to Tecton support.

# `...` marks arguments elided from this example.
# max_age_seconds is a number of seconds between 60 (1 minute) and 86400 (1 day), inclusive.
from tecton import (
    CacheConfig,
    batch_feature_view,
    FeatureService,
    FeatureTable,
    stream_feature_view,
)

cache_config = CacheConfig(max_age_seconds=3600)


@batch_feature_view(
    ...,
    cache_config=cache_config,
    ...,
)
def my_cached_batch_feature_view():
    return


@stream_feature_view(
    ...,
    cache_config=cache_config,
    ...,
)
def my_cached_stream_feature_view():
    return


my_cached_feature_table = FeatureTable(
    ...,
    cache_config=cache_config,
    ...,
)

fs = FeatureService(
    feature_views=[
        my_cached_batch_feature_view,
        my_cached_stream_feature_view,
        my_cached_feature_table,
        ...,
    ],
    name="cached_feature_service",
    online_serving_enabled=True,
    enable_online_caching=True,
)

The max_age_seconds parameter of CacheConfig sets the maximum number of seconds a feature value is cached before it is considered stale. It must be between 60 seconds and 1 day (86,400 seconds), inclusive.
- Increasing the max age increases the overall cache hit rate, but it also means cached values can be up to that many seconds old when they are served.
The enable_online_caching parameter determines whether the Feature Service attempts to retrieve cached values for its cached Feature Views. If a Feature View with a cache_config is part of a Feature Service with caching disabled, that Feature View will not retrieve cached values.
You can verify that a value is being served from the cache by adding the include_serving_status=true metadata option to your request to the feature server. See metadata options for details.
The response metadata then includes a status field for each feature that indicates whether the value was retrieved from the cache.

// First request: value read from the online store
{
  "metadata": {
    "features": [
      {
        "name": "my_feature_view.feature",
        "status": "PRESENT"
      },
      ...
    ]
  }
}

// Second request: value served from the cache
{
  "metadata": {
    "features": [
      {
        "name": "my_feature_view.feature",
        "status": "CACHED_PRESENT"
      },
      ...
    ]
  }
}
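To check cache usage programmatically, a client could scan the response metadata for statuses that start with CACHED_. This is a hypothetical helper. It assumes metadata.features is a list of objects with name and status fields, so verify that shape against the responses your feature server actually returns.

```python
import json
from typing import Dict


def served_from_cache(response: dict) -> Dict[str, bool]:
    """Map each feature name to whether its value came from the cache.

    Assumes response["metadata"]["features"] is a list of objects
    with "name" and "status" fields (an assumption about the shape).
    """
    result = {}
    for feature in response["metadata"]["features"]:
        result[feature["name"]] = feature["status"].startswith("CACHED_")
    return result


# Hypothetical second response body, abbreviated to a single feature.
body = '{"metadata": {"features": [{"name": "my_feature_view.feature", "status": "CACHED_PRESENT"}]}}'
print(served_from_cache(json.loads(body)))  # {'my_feature_view.feature': True}
```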

Skipping the cache

You can use request options to bypass the cache. Two options in the feature request control whether the feature server reads from the cache, writes to it, or neither:

$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspace_name": "prod",
    "feature_service_name": "cache_service",
    ...
    "requestOptions": {
      "readFromCache": false,
      "writeToCache": false
    }
  }
}'
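If you call the feature server from Python instead of curl, the same body can be built with the standard library. This is a sketch: <your_cluster> is left as a placeholder, the join keys elided in the curl example are still omitted, and the Content-Type header is an added assumption. Python's False is serialized as JSON false by json.dumps, which avoids the invalid JSON that a literal False would produce.

```python
import json
import os
import urllib.request

# Placeholder cluster URL; replace <your_cluster> with your cluster name.
URL = "https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features"

payload = {
    "params": {
        "workspace_name": "prod",
        "feature_service_name": "cache_service",
        # Join keys and other request fields are elided, as in the curl example.
        "requestOptions": {
            "readFromCache": False,  # serialized as JSON false
            "writeToCache": False,
        },
    }
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Tecton-key {os.environ.get('TECTON_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
print(request.data.decode("utf-8"))
```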

Options

readFromCache: Defaults to true. If set to false, the feature server does not read from the cache and instead recomputes the feature value. This does not affect whether the feature server writes to the cache.
writeToCache: Defaults to true. If set to false, the feature server does not write to the cache after computing a feature value. This does not affect whether the feature server reads an already cached value for the request.
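The two options are independent. The toy function below illustrates that independence with a plain dict standing in for the cache. It is a hypothetical model of the semantics described above, not the feature server's implementation.

```python
from typing import Callable, Dict, Tuple


def get_feature(
    key: str,
    cache: Dict[str, float],
    fetch: Callable[[str], float],
    read_from_cache: bool = True,
    write_to_cache: bool = True,
) -> Tuple[float, bool]:
    """Return (value, served_from_cache) from a toy cache that honors both flags."""
    # readFromCache=false: skip the lookup, but still allow a write below.
    if read_from_cache and key in cache:
        return cache[key], True
    value = fetch(key)
    # writeToCache=false: the value is returned but never stored.
    if write_to_cache:
        cache[key] = value
    return value, False


def fake_online_store(key: str) -> float:
    """Hypothetical stand-in for an online store lookup."""
    return float(len(key))


cache: Dict[str, float] = {}
print(get_feature("user_1", cache, fake_online_store, write_to_cache=False))  # (6.0, False)
print(cache)  # {}
print(get_feature("user_1", cache, fake_online_store))  # (6.0, False), now stored
print(get_feature("user_1", cache, fake_online_store))  # (6.0, True)
print(get_feature("user_1", cache, fake_online_store, read_from_cache=False))  # (6.0, False)
```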

How does it work?

When the feature server receives a request, it first checks the cache for the requested features.
If any of the requested features are not in the cache, the feature server will query the underlying online store for the missing features.
The feature server will then store the retrieved features in the cache and return the requested features to the client.
Subsequent requests will then attempt to retrieve the cached value.
All features of a Feature View are cached together, so if one feature returns a CACHED_<*> status, all other features from that Feature View should also return a CACHED_<*> status.
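The read-through flow above can be sketched as a small class. This is a hypothetical model that assumes one entry per (entity key, Feature View), with all of that Feature View's values stored together and expiring after max_age_seconds. It reuses the PRESENT, MISSING_DATA, and CACHED_* status names from this page, but it is not Tecton's implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Values = Optional[dict]


class ToyFeatureCache:
    """Toy read-through cache keyed by (entity key, Feature View) with a max age."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        # (entity_key, feature_view) -> (time cached, values or None if missing)
        self._entries: Dict[Tuple[str, str], Tuple[float, Values]] = {}

    def get(self, entity_key: str, feature_view: str,
            online_store: Callable[[str, str], Values]) -> Tuple[Values, str]:
        now = self.clock()
        entry = self._entries.get((entity_key, feature_view))
        if entry is not None and now - entry[0] < self.max_age_seconds:
            values = entry[1]
            return values, "CACHED_PRESENT" if values is not None else "CACHED_MISSING_DATA"
        # Miss or expired entry: query the online store, then cache the result.
        values = online_store(entity_key, feature_view)
        self._entries[(entity_key, feature_view)] = (now, values)
        return values, "PRESENT" if values is not None else "MISSING_DATA"


fake_now = [0.0]
cache = ToyFeatureCache(max_age_seconds=60, clock=lambda: fake_now[0])
store = {("user_1", "user_spend_fv"): {"spend_7d": 42.0}}


def lookup(entity_key: str, feature_view: str) -> Values:
    """Hypothetical online store lookup."""
    return store.get((entity_key, feature_view))


print(cache.get("user_1", "user_spend_fv", lookup)[1])  # PRESENT
print(cache.get("user_1", "user_spend_fv", lookup)[1])  # CACHED_PRESENT
fake_now[0] = 61.0
print(cache.get("user_1", "user_spend_fv", lookup)[1])  # PRESENT (entry expired)
print(cache.get("user_2", "user_spend_fv", lookup)[1])  # MISSING_DATA
```

The injectable clock is only there so expiry can be demonstrated without waiting.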

Cached Statuses

CACHED_PRESENT: The feature is retrieved from the cache and the feature that was cached had a PRESENT status.
CACHED_UNKNOWN: The feature is retrieved from the cache but there was an error in caching the status of the feature.
CACHED_MISSING_DATA: The feature is retrieved from the cache and the feature that was cached had a MISSING_DATA status.
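Because a Feature View's features are cached together, you could sanity-check a response by grouping statuses per Feature View. The sketch below is hypothetical. It assumes feature names look like "<feature_view>.<feature>" and that each metadata entry has name and status fields.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cache_status_by_feature_view(features: List[dict]) -> Dict[str, Set[bool]]:
    """Group features by Feature View and record whether each came from the cache."""
    groups: Dict[str, Set[bool]] = defaultdict(set)
    for feature in features:
        feature_view = feature["name"].rsplit(".", 1)[0]
        groups[feature_view].add(feature["status"].startswith("CACHED_"))
    return dict(groups)


features = [
    {"name": "user_spend_fv.spend_7d", "status": "CACHED_PRESENT"},
    {"name": "user_spend_fv.spend_30d", "status": "CACHED_MISSING_DATA"},
    {"name": "user_login_fv.logins_1d", "status": "PRESENT"},
]
groups = cache_status_by_feature_view(features)
# A Feature View whose set holds both True and False would contradict the
# expectation that all of its features are cached together.
inconsistent = [fv for fv, flags in groups.items() if len(flags) > 1]
print(groups)        # {'user_spend_fv': {True}, 'user_login_fv': {False}}
print(inconsistent)  # []
```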

Limitations

The maximum size of cached data is 100 GB per Tecton account, with a cap of 100,000 QPS.
- Contact Tecton support for guidance on cache sizings and workload requirements.
The online_serving_index parameter is not supported for cached Feature Views.
The results of a Realtime Feature View are not cached, but the Feature Views it depends on can be. As a result, Realtime Feature View features currently always return a PRESENT status.
- The metadata status of a Realtime Feature View does not reflect whether its underlying Feature View data was cached.
- Instead, you can verify that the dependent Feature Views are cached by retrieving SLO metadata and checking storeResponseSizeBytes. If an underlying Feature View is served from the cache, storeResponseSizeBytes should decrease.
- This functionality may be extended at a later time.
Effective times are omitted from the response when a feature is served from the cache.
Different subsets of a Feature View are cached separately. For example, if Feature View fv contains features f1, f2, and f3, and one Feature Service uses fv[f1, f2] while another uses fv[f2, f3], each subset is cached separately.
A Realtime Feature View that takes a Feature View as input always implicitly uses all of that Feature View's features, even if only a subset of those features is selected as the input.
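One way to picture the subset behavior is an entry key that includes the exact set of selected features. The function below is hypothetical and only illustrates why fv[f1, f2] and fv[f2, f3] occupy separate entries, and why a Realtime Feature View that implicitly uses every feature of fv would map to a third entry.

```python
from typing import FrozenSet, Iterable, Tuple


def subset_entry_key(feature_view: str, features: Iterable[str]) -> Tuple[str, FrozenSet[str]]:
    """Hypothetical entry key: the Feature View name plus the exact feature subset."""
    return (feature_view, frozenset(features))


service_a = subset_entry_key("fv", ["f1", "f2"])
service_b = subset_entry_key("fv", ["f2", "f3"])
realtime_input = subset_entry_key("fv", ["f1", "f2", "f3"])  # all features of fv

print(service_a == service_b)  # False: two separate cache entries
print(realtime_input in {service_a, service_b})  # False: a third entry
print(subset_entry_key("fv", ["f2", "f1"]) == service_a)  # True: order does not matter
```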

Running in Production

See Caching in Production for guidelines on running in production and ballpark numbers on cost savings.
