Version: 0.9

How Tecton Minimizes Online Store Costs

Private Preview

This feature is currently in Private Preview.

This feature has the following limitations:
  • Must be enabled by Tecton Support and requires additional DynamoDB permissions.
If you would like to participate in the preview, please file a feature request.

The process of backfilling feature values to the Online Store is important for operational machine learning applications because it ensures that the most relevant and accurate data is available for feature serving. However, when developing new features, the number of feature values that need to be computed and backfilled to the Online Store can be prohibitively large.

Bulk Load Backfills to the Online Store

Tecton uses a bulk load capability for Online Store backfills that is optimized for compute and storage, and can cost up to 100x less than Online Store backfills in other feature stores.

Tecton optimizes Online Store backfills in the following ways:

1. Consolidation of Feature Rows

Online Store backfills typically involve computing all historical feature values needed to serve the latest feature values and writing them to the Online Store row by row. This can lead to a large number of redundant feature values being written to the Online Store.

Tecton optimizes this process by first spinning up parallel jobs that 1) compute features for intervals across the entire backfill time range and 2) stage these values to Tecton's offline store. Tecton then writes only the latest feature value for each entity across the full backfill time range to the Online Store in a single pass.

This optimization can be especially impactful when each entity is associated with many records. Without bulk load, each online backfill job writes every record to the Online Store row by row. With bulk load, Tecton first stages all records and then writes only the most recent record for each entity. For example, if each entity typically corresponds to 100 feature records, this optimization leads to 100x fewer writes.
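The consolidation step can be illustrated with a minimal sketch. It assumes staged records are simple `(entity_id, event_time, value)` tuples; the record layout, the sample data, and the `consolidate_latest` helper are hypothetical and do not represent Tecton's internal implementation.

```python
from datetime import datetime

# Hypothetical staged feature records: (entity_id, event_time, feature_value).
staged_records = [
    ("user_1", datetime(2024, 1, 1), 0.10),
    ("user_1", datetime(2024, 1, 2), 0.25),
    ("user_2", datetime(2024, 1, 1), 0.70),
    ("user_1", datetime(2024, 1, 3), 0.40),
    ("user_2", datetime(2024, 1, 4), 0.90),
]


def consolidate_latest(records):
    """Keep only the most recent (event_time, value) pair per entity."""
    latest = {}
    for entity_id, event_time, value in records:
        current = latest.get(entity_id)
        if current is None or event_time > current[0]:
            latest[entity_id] = (event_time, value)
    return latest


online_rows = consolidate_latest(staged_records)

# Row-by-row backfill writes every staged record; bulk load writes one row per entity.
row_by_row_writes = len(staged_records)
bulk_load_writes = len(online_rows)
print(f"row-by-row writes: {row_by_row_writes}, bulk load writes: {bulk_load_writes}")
```

In this toy data set, five staged records collapse to two Online Store rows, one per entity. The reduction grows with the number of historical records per entity.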

2. DynamoDB Import from S3

Bulk load offers additional cost optimizations when using DynamoDB as an Online Store.

Instead of writing records individually, Tecton first stages backfill data in S3 and then imports all records in bulk to a new table in DynamoDB. The S3 bulk import functionality is designed for large-scale data ingestion and is significantly cheaper than writing rows one by one.
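For readers unfamiliar with DynamoDB's import-from-S3 feature, the sketch below builds the arguments for the AWS `ImportTable` operation. It is an illustration only and not Tecton's implementation; the `build_import_request` helper, the bucket, prefix, table, and key names, and the choice of GZIP-compressed DynamoDB JSON are all assumptions.

```python
def build_import_request(table_name, bucket, key_prefix, partition_key):
    """Build keyword arguments for DynamoDB's ImportTable operation.

    The import always creates a new table, so the table definition is
    supplied as part of the request.
    """
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": key_prefix},
        "InputFormat": "DYNAMODB_JSON",
        "InputCompressionType": "GZIP",
        "TableCreationParameters": {
            "TableName": table_name,
            "AttributeDefinitions": [
                {"AttributeName": partition_key, "AttributeType": "S"}
            ],
            "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


# Hypothetical names, chosen only for illustration.
request = build_import_request(
    table_name="example_feature_table",
    bucket="example-staging-bucket",
    key_prefix="backfill/example_fv/",
    partition_key="entity_id",
)
print(request["TableCreationParameters"]["TableName"])
```

With AWS credentials and the required permissions in place, a request like this could be submitted with `boto3.client("dynamodb").import_table(**request)`. When bulk load is enabled, Tecton orchestrates the staging and import itself, so this call is not something users need to make.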

For example, when writing 1B records of 100 bytes each to DynamoDB:

  • Cost without Bulk Load: ~$1,250
  • Cost with Bulk Load: ~$14
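The figures above can be approximately reproduced with a short calculation. The prices used here are assumptions that are not stated in this document (1.25 USD per million on-demand write request units and 0.15 USD per GiB imported from S3); actual AWS pricing varies by region and over time, and the function names are hypothetical.

```python
# Assumed illustrative prices; check current AWS pricing for real estimates.
ON_DEMAND_WRITE_USD_PER_MILLION = 1.25  # per million write request units
IMPORT_FROM_S3_USD_PER_GIB = 0.15       # per GiB of imported data


def row_by_row_cost(num_records, record_bytes):
    """Cost of writing each record individually with on-demand writes."""
    # One write request unit covers an item of up to 1 KB; round up for larger items.
    units_per_record = max(1, -(-record_bytes // 1024))
    total_units = num_records * units_per_record
    return total_units / 1_000_000 * ON_DEMAND_WRITE_USD_PER_MILLION


def bulk_import_cost(num_records, record_bytes):
    """Cost of importing the same data from S3, billed by data volume."""
    total_gib = num_records * record_bytes / 2**30
    return total_gib * IMPORT_FROM_S3_USD_PER_GIB


without_bulk = row_by_row_cost(1_000_000_000, 100)
with_bulk = bulk_import_cost(1_000_000_000, 100)
print(f"without bulk load: ${without_bulk:,.0f}")
print(f"with bulk load:    ${with_bulk:,.0f}")
```

Under these assumptions, the row-by-row estimate comes to about $1,250 and the import estimate to about $14. Row-by-row cost scales with the number of write requests, while import cost scales with data volume, which is why small records benefit the most.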

These savings compound when developing and deploying models that leverage multiple features based on large-scale historical datasets. This also increases feature development velocity by making it much less cost-prohibitive to iterate on and materialize features.

Configure Bulk Load Backfills

Tecton Support can enable the bulk load backfill functionality by default for all new Feature Views. Tecton recommends that customers first test this capability on individual Feature Views before setting it as the default behavior.

To test bulk load on an individual Feature View, set its options as follows:

from tecton import batch_feature_view

# This Feature View will use the new bulk load behavior
@batch_feature_view(..., options={"ONLINE_BACKFILL_LOAD_TYPE": "BULK"})
def fv():
    return ...

Caveats

  • This requires additional DynamoDB permissions. Please reach out to Support for more information.
  • This is currently supported only for Spark-based Feature Views. Support for Rift is planned for a future release.
  • This requires offline materialization to be enabled (offline=True).
  • This does not yet support Feature Tables.
  • A bulk load backfill can only be completed once and cannot be retried after succeeding.
  • To take advantage of bulk load backfills when using manually-triggered materialization, set manual_trigger_backfill_end_time. When this parameter is set, Tecton orchestrates the scheduling of backfill jobs.
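The caveats above can be expressed as a simple pre-flight checklist. The sketch below is purely illustrative: `FeatureViewSettings` and `bulk_load_blockers` are hypothetical names and are not part of the Tecton SDK. The check only encodes the conditions listed above and cannot verify DynamoDB permissions.

```python
from dataclasses import dataclass


@dataclass
class FeatureViewSettings:
    """Hypothetical summary of a Feature View's settings (not a Tecton class)."""
    name: str
    compute: str  # "spark" or "rift"
    offline: bool
    is_feature_table: bool = False
    manual_trigger: bool = False
    backfill_end_time_set: bool = False  # manual_trigger_backfill_end_time provided


def bulk_load_blockers(fv):
    """Return the listed caveats that would prevent a bulk load backfill."""
    problems = []
    if fv.is_feature_table:
        problems.append("Feature Tables are not yet supported")
    if fv.compute != "spark":
        problems.append("only Spark-based Feature Views are supported")
    if not fv.offline:
        problems.append("offline materialization must be enabled (offline=True)")
    if fv.manual_trigger and not fv.backfill_end_time_set:
        problems.append("set manual_trigger_backfill_end_time for manual triggers")
    return problems


ready = FeatureViewSettings(name="user_spend", compute="spark", offline=True)
blocked = FeatureViewSettings(name="user_clicks", compute="rift", offline=False)
print(bulk_load_blockers(ready))
print(bulk_load_blockers(blocked))
```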
