Manually Triggering Materialization
By default, feature materialization runs automatically, on a schedule managed by Tecton. For example, you can configure your Feature View materialization to run every hour or every day.
By contrast, manual materialization does not run automatically. Instead, a manual materialization job is triggered using an API call.
Batch Feature Views and Stream Feature Views support manual materialization.
Manual materialization can be triggered through the following methods:
- The Tecton SDK
- The Tecton Airflow provider
This page explains how to use the Tecton SDK to manually trigger materialization.
To use the Tecton Airflow provider instead, see the readme file in the provider repo.
Disabling scheduled batch jobs for a feature view
When batch_trigger=BatchTriggerType.MANUAL is set in the Feature View, Tecton
will not schedule any batch materialization jobs for that Feature View. Batch
materialization is then only possible by manually triggering jobs with the
Tecton SDK or the Tecton Airflow provider.
aggregation_interval must still be defined for
materialization, as Tecton partitions data based on this interval.
For a Stream Feature View, only batch materialization job scheduling is affected by the
batch_trigger setting. Streaming materialization job scheduling is still
managed by Tecton.
If a Data Source input to the Feature View has
data_delay set, that delay
is still factored into the construction of training data sets, but it does not affect
when a job can be triggered with the materialization API.
Example of a feature view configured for manual materialization
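from tecton import batch_feature_view, FilteredSource, Aggregation, BatchTriggerType
from fraud.entities import user
from fraud.data_sources.transactions import transactions_batch
from datetime import datetime, timedelta

@batch_feature_view(
    sources=[FilteredSource(transactions_batch)],
    entities=[user],
    mode="spark_sql",  # assumed; the mode is not shown in the source
    aggregation_interval=timedelta(days=1),  # assumed daily, per the description
    aggregations=[
        Aggregation(column="transaction", function="count", time_window=timedelta(days=1)),
        Aggregation(column="transaction", function="count", time_window=timedelta(days=30)),
        Aggregation(column="transaction", function="count", time_window=timedelta(days=90)),
    ],
    online=True, offline=True,  # assumed materialization targets
    feature_start_time=datetime(2022, 5, 1),
    description="User transaction totals over a series of time windows, updated daily.",
    batch_trigger=BatchTriggerType.MANUAL,  # Use manual triggers
)
def user_transaction_counts(transactions):
    return f"SELECT user_id, 1 as transaction, timestamp FROM {transactions}"  # user_id, timestamp are assumed column names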
Tecton SDK methods for triggering, monitoring and canceling materialization
In the Tecton SDK, the Feature View interactive classes have methods for triggering, monitoring and canceling materialization jobs. See the BatchFeatureView and StreamFeatureView interactive SDK reference for method details.
You can still use these methods for feature views with scheduled materialization enabled.
Triggering a new materialization job
The trigger_materialization_job() method initiates a job to
materialize feature values for the specified time range. It returns a
job identifier that is referenced in later steps.
To backfill a newly created feature, you can call this method as a one-off to materialize data from the feature start time to the current time. Consider breaking particularly large backfills into multiple jobs.
During regular operations, you will likely want to set up an automated process that materializes the most recent time period once the upstream data for that period is available.
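One way to break a large backfill into smaller jobs is to split the overall range into consecutive windows and trigger one job per window. The helper below is a minimal sketch in plain Python: split_backfill and its parameters are hypothetical names, not part of the Tecton SDK, and it assumes you pick a chunk size that is a whole multiple of the Feature View's partition interval.

```python
from datetime import datetime, timedelta


def split_backfill(start, end, chunk):
    """Split [start, end) into consecutive windows of at most `chunk` length.

    Hypothetical helper for illustration; not part of the Tecton SDK.
    """
    if chunk <= timedelta(0):
        raise ValueError("chunk must be positive")
    windows = []
    cursor = start
    while cursor < end:
        # The final window is shortened so it never runs past `end`.
        window_end = min(cursor + chunk, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


windows = split_backfill(datetime(2022, 5, 1), datetime(2022, 5, 11), timedelta(days=4))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Each (start, end) pair could then be passed as start_time and end_time to trigger_materialization_job().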
The materialization window between start_time and
end_time must be evenly
divisible by the Feature View's partition interval (as noted above, Tecton
partitions data based on aggregation_interval). In addition,
start_time must align with the partition
interval. That is,
start_time % <partition interval> must equal 0.
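The divisibility and alignment rules above can be checked locally before a job is triggered. The following is a sketch in plain Python: check_window is a hypothetical helper, and measuring alignment from the Unix epoch in UTC is an assumption about how partitions are anchored, not something this page states.

```python
from datetime import datetime, timedelta, timezone

# Assumption: partitions are anchored at the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def check_window(start_time, end_time, partition_interval):
    """Return a list of problems with the window; an empty list means it passes."""
    problems = []
    if end_time <= start_time:
        problems.append("end_time must be after start_time")
    # The window length must be a whole number of partitions.
    if (end_time - start_time) % partition_interval != timedelta(0):
        problems.append("window is not a whole number of partitions")
    # start_time must fall exactly on a partition boundary.
    if (start_time - EPOCH) % partition_interval != timedelta(0):
        problems.append("start_time is not aligned to the partition interval")
    return problems


day = timedelta(days=1)
utc = timezone.utc
ok = check_window(datetime(2022, 5, 1, tzinfo=utc), datetime(2022, 5, 2, tzinfo=utc), day)
bad = check_window(datetime(2022, 5, 1, 6, tzinfo=utc), datetime(2022, 5, 2, tzinfo=utc), day)
print(ok)
print(bad)
```

A check like this only mirrors the rules described here; the materialization API remains the authority on whether a window is accepted.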
Here's an example of using the materialization API to trigger a batch job:
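import tecton
from datetime import datetime

fv = tecton.get_workspace("dev").get_feature_view("user_transaction_counts")
job_id = fv.trigger_materialization_job(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    online=True,  # illustrative value: materialize to the online store
    offline=False,  # illustrative value: skip the offline store
)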
Waiting for job completion
After triggering a new job, you may want to monitor the job status to start a downstream process once complete.
To block your process until the job completes, use the
wait_for_materialization_job() method. Materialization jobs can take anywhere
from minutes to hours depending on the amount of data processed.
Alternatively, you can poll for completion status using the
get_materialization_job() method. This returns an object
with details about the job, including the job run state that indicates whether
the job has completed successfully.
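A polling loop built around get_materialization_job() might look like the sketch below. To keep it self-contained, the SDK call is hidden behind a fetch_state callable, and the state names used in the demo ("RUNNING", "SUCCESS", "ERROR") are placeholders assumed for illustration; consult the SDK reference for the values the job object actually reports. poll_until_done is a hypothetical helper, not a Tecton function.

```python
import time


def poll_until_done(fetch_state, success_states, failure_states,
                    interval_s=60.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll fetch_state() until it returns a terminal state.

    Returns True on a success state and False on a failure state. Raises
    TimeoutError once roughly timeout_s seconds of waiting have accumulated.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state in success_states:
            return True
        if state in failure_states:
            return False
        if waited >= timeout_s:
            raise TimeoutError(f"job still in state {state!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Stand-in for a real lookup such as fv.get_materialization_job(job_id);
# the state strings here are placeholders, not confirmed SDK values.
fake_states = iter(["RUNNING", "RUNNING", "SUCCESS"])
finished_ok = poll_until_done(
    lambda: next(fake_states),
    success_states={"SUCCESS"},
    failure_states={"ERROR"},
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(finished_ok)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without waiting; in real use the default time.sleep applies.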
Canceling a materialization job
You can cancel a running materialization job using the
cancel_materialization_job(job_id) method. Once cancelled, a job will not be
retried further. Cancellation is asynchronous: the job run state first
indicates that cancellation was requested, and it may take up to several
minutes for the job to reach its final cancelled state.
Re-running previously materialized periods
By default, the
trigger_materialization_job() method returns an error if
the specified time period overlaps with the time period of a previously
successful materialization job.
If you use the
overwrite=True option, Tecton allows the new job to
run and overwrite previously materialized data.
When using the
overwrite=True option, it’s possible to produce incorrect
results in the Online store if you have previously materialized data. Please
consult with Tecton Support before proceeding.
This operation is generally safe if:
- Your previous job completed and did not output any feature data.
- Your Feature View is only materialized offline. (The Feature View is