
Deleting Keys from a Feature View

Tecton makes it simple to delete individual keys from a materialized Feature View. This capability can be helpful for cleaning up erroneous data or handling user data deletion requests.

Key deletion is available in Tecton >= 0.3. See the Notebook SDK upgrade guide for details on how to get the latest version.

Requirements and Limitations

Feature View Requirements

To be eligible for key deletion, a Feature View must meet the following requirements:

  • The Feature View must materialize data with either online=True or offline=True; otherwise there is no data to delete. OnDemandFeatureView does not materialize data to Tecton, so it likewise has no data to delete.
  • The offline store must be configured to use Delta format. To do so, set offline_store=DeltaConfig().
  • The online store must be DynamoDB, which is the default for Feature Views.
  • The Feature View cannot have online_serving_index configured.

Permissions Requirements

The Spark Policy configured for Tecton needs permission for the DynamoDB BatchWriteItem action. This permission was added to our Terraform sample on Jan 24, 2022, so you may need to add it if your Tecton deployment was created before that date.

Deletion Request Limitations

When constructing your dataframe of IDs to delete:

  • A maximum of 10,000 keys can be deleted per request.
  • If a Feature View has multiple entities, the full set of join keys must be specified. For example, if the Feature View has entities [user_id, merchant_id], then both IDs must be present for each row in the deletion request.

Finally, note that Tecton does not prevent data for these IDs from being materialized in the future, for example by late-arriving data or by concurrently running materialization jobs.
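The two request limits above can be checked before any deletion is submitted. The following is a minimal sketch, not part of the Tecton SDK: the helper name batch_join_keys and the list-of-dicts input shape are assumptions made for illustration. It rejects rows that lack any join key and splits the remaining keys into batches of at most 10,000.

```python
MAX_KEYS_PER_REQUEST = 10_000  # per-request limit stated above


def batch_join_keys(rows, join_keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Validate key rows and split them into request-sized batches.

    rows: list of dicts, one per entity to delete (hypothetical input shape).
    join_keys: every join key of the Feature View, e.g. ["user_id", "merchant_id"].
    """
    seen = set()
    unique_rows = []
    for row in rows:
        # Every join key must be present on every row.
        missing = [k for k in join_keys if row.get(k) is None]
        if missing:
            raise ValueError(f"row {row!r} is missing join keys: {missing}")
        key = tuple(row[k] for k in join_keys)
        if key not in seen:  # skip duplicates so they do not count toward the limit
            seen.add(key)
            unique_rows.append({k: row[k] for k in join_keys})
    return [unique_rows[i:i + batch_size] for i in range(0, len(unique_rows), batch_size)]


rows = [{"user_id": f"U{i}", "merchant_id": "M1"} for i in range(25_000)]
batches = batch_join_keys(rows, ["user_id", "merchant_id"])
print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Each batch can then be turned into a DataFrame with pandas.DataFrame(batch) and submitted as its own deletion request.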

Using the delete_keys method

The delete_keys() SDK method is available for BatchFeatureView, StreamFeatureView, and FeatureTable. This method needs to be run from your Databricks or EMR environment.

See the SDK reference for the full method signature.

First, construct your Spark or Pandas Dataframe with the set of keys to be deleted. For example:

import pandas

# One row per key to delete; include every join key column of the Feature View.
join_keys_df = pandas.DataFrame({
  'user_id': ['A100000000', 'C200000000']
})

Then call the delete_keys() method on the Feature View or Feature Table:

import tecton

fv = tecton.get_feature_view('my_feature_view')
fv.delete_keys(join_keys_df)

This method will trigger asynchronous jobs and return. To view the status of these jobs, look for jobs with DELETION type in FeatureView.deletion_status(). These jobs typically take 10 to 45 minutes to complete, depending on the size of your data.

[Image: Deletion Materialization Status Command]

You can also view the status under the materialization tab for this Feature View in your Tecton web console. Note that the deletion jobs appear at the end of the list, after all the materialization jobs.

[Image: Deletion Materialization Tab Screenshot]

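Finally, you may want to verify that the data was deleted as intended by querying both stores for a deleted key:

# Online store: look up a deleted key
keys = {'user_id': 'A100000000'}
fv.get_online_features(join_keys=keys).to_dict()

# Offline store: query historical features for the same key
keys_df = pandas.DataFrame({
  'user_id': ['A100000000']
})
fv.get_historical_features(entities=keys_df).to_pandas()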
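When checking many keys, the online result can be inspected programmatically. The sketch below assumes that a deleted key comes back from to_dict() with None for every feature value; that behavior is an assumption to confirm against your Tecton version. The helper name looks_deleted and the sample feature names are hypothetical.

```python
def looks_deleted(feature_dict, join_keys=()):
    """Return True if no feature value remains for the key.

    feature_dict: the dict returned by .to_dict() on an online lookup.
    join_keys: names to ignore in case the dict also echoes the join keys.
    Assumes a deleted key yields None for every feature (not confirmed by the docs).
    """
    feature_values = [v for name, v in feature_dict.items() if name not in join_keys]
    return all(v is None for v in feature_values)


# Hypothetical lookup results, hard-coded for illustration only.
print(looks_deleted({"my_feature_view.amount_sum_7d": None}))  # True
print(looks_deleted({"my_feature_view.amount_sum_7d": 42.0}))  # False
```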

How it works

When you run FeatureView.delete_keys(join_keys_dataframe), Tecton will initiate jobs to delete entries from both the online and offline store that match the specified join keys.

Deletion requires the offline store to use Delta format. Tecton uses Delta table APIs to delete all historical values for the specified join keys. Because a Delta delete only removes data from the latest version of the table, Tecton additionally runs a vacuum command to fully remove any data that was deleted at least 7 days ago. As a result, all data should be fully deleted within 7 days, as long as deletion jobs continue to run.
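To make the two-step behavior concrete, here is a rough sketch of a logical delete followed by a vacuum using the open-source delta-spark Python API. It is not Tecton's internal implementation: the function names, table path, and single-column key filter are assumptions for illustration, and Tecton runs these steps for you. The date arithmetic at the top runs on its own and shows when files for deleted rows first become eligible for physical removal.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)                       # retention window described above
RETENTION_HOURS = RETENTION.total_seconds() / 3600  # 168.0


def earliest_physical_removal(deleted_at):
    """Earliest time a vacuum can physically remove files for rows deleted at `deleted_at`."""
    return deleted_at + RETENTION


def delete_and_vacuum(spark, table_path, key_column, key_values):
    """Illustrative only: logical delete followed by a vacuum on a Delta table."""
    # Imported lazily so the date arithmetic above runs without Spark installed.
    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    table = DeltaTable.forPath(spark, table_path)
    # Removes matching rows from the latest table version; older versions still hold them.
    table.delete(F.col(key_column).isin(list(key_values)))
    # Physically removes files that are no longer referenced and older than the retention window.
    table.vacuum(RETENTION_HOURS)


print(earliest_physical_removal(datetime(2022, 1, 24)))  # 2022-01-31 00:00:00
```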

Additionally, Tecton will delete the associated feature values from the online store. Currently, DynamoDB is the only online store available with Tecton that supports key deletion. Note that these DynamoDB operations incur costs.