Materialization
Overview
Materialization processes are run by Tecton to keep production features up-to-date. Monitoring these materialization processes helps ensure that feature pipelines are continuously delivering data to your models.
Tecton offers various tools to facilitate materialization monitoring, including dashboards in the Web UI, email alerts, and the Metrics API.
Where to View Job Statuses
To view the status of materialization jobs:
- Navigate to the Feature View details page in the Tecton Web UI
- Select the "Materialization" tab to see all jobs for that Feature View
- For currently running jobs, follow the details link from the jobs table to see detailed job information from your compute provider
Job States
| State | Description |
|---|---|
| RUNNING | The task is running. New attempts may be created if errors are encountered. |
| DRAINING | The task is draining. Any running attempt will be cancelled. No new attempts will be scheduled. |
| MANUAL_RETRY | The terminated task was manually requested to be re-executed. Retry policy is reset. |
| MANUAL_CANCELLATION_REQUESTED | Task cancellation is requested by a user. No new attempts will be scheduled. |
| FAILURE | The task failed permanently. No new attempts will be made. |
| MANUALLY_CANCELLED | Task is cancelled. Scheduler will not attempt to fill materialization gap. |
| DRAINED | Temporary managed job state where Tecton will automatically decide next steps. No action required. |
| SUCCESS | The task completed successfully. Only applicable to batch materialization tasks. |
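A minimal sketch of how the states in the table above might be grouped in a custom alerting script. The state names come from the table; the grouping into "terminal" and "needs attention" is our own assumption, not a Tecton API.

```python
# States from the job-states table. Grouping is an illustrative assumption:
# SUCCESS/FAILURE/MANUALLY_CANCELLED are end states per their descriptions.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "MANUALLY_CANCELLED"}


def needs_attention(state: str) -> bool:
    """Flag states where the scheduler will not recover on its own.

    FAILURE: no new attempts will be made.
    MANUALLY_CANCELLED: the scheduler will not fill the materialization gap.
    """
    return state in {"FAILURE", "MANUALLY_CANCELLED"}
```

A monitoring script polling job statuses could page on `needs_attention` states while ignoring transient ones like RUNNING or DRAINED.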
Batch Materialization Jobs
Batch materialization jobs run on the cadence defined in the Batch or Stream Feature View. On each run, they process batch data sources, compute feature values, and write them to the online/offline stores according to the defined `batch_schedule`. They handle both initial backfills and ongoing updates. You can learn more about materialization job scheduling behavior in the materialization documentation.
How Batch Jobs are Triggered and Retried
Batch jobs are automatically triggered based on the schedule defined in your Feature View. When a job fails, Tecton will automatically retry it according to the retry policy.
The Online Store Write Rate chart under the Feature View Monitoring tab shows how many records are written to the Online Store per second. If you have an idea of the total number of records your job needs to output, viewing the writes per second can give you an idea of how long the job will take to complete.
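The estimate described above is simple arithmetic; a hedged sketch (the function name and signature are our own, not part of any Tecton API):

```python
def estimated_seconds_remaining(total_records: int,
                                records_written: int,
                                writes_per_second: float) -> float:
    """Rough time-to-completion estimate for a materialization job,
    assuming a steady write rate as read from the Online Store Write
    Rate chart."""
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    remaining = max(total_records - records_written, 0)
    return remaining / writes_per_second
```

For example, a backfill of 10M records with 4M already written at 5,000 writes/sec has roughly 6,000,000 / 5,000 = 1,200 seconds (20 minutes) remaining.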
Streaming Materialization Jobs
For Stream Feature Views with Spark, Tecton orchestrates Spark Structured Streaming jobs to continuously update feature values when new data arrives. These jobs continuously process incoming stream data (from Kafka, Kinesis, or Push API), compute feature values, write them to the online/offline stores, and maintain fresh feature values with sub-second latency.
Metrics to Monitor Stream Health
Even if a stream job is running, it may be failing to produce up-to-date features. The Stream Feature View Monitoring tab contains several metrics to help assess the progress of your Stream Feature View.
These metrics are also available through the Metrics API, allowing you to create custom dashboards and alerts in your Application Performance Monitoring system.
Processed Event Age
Processed Event Age is the key metric for understanding how up-to-date your features are. It measures the difference between the time the write to the online store completes and the timestamp of the event. This metric includes both upstream processing time and the time taken by Tecton to transform and persist the event.
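The metric's definition can be expressed directly; a minimal sketch (the function is illustrative, not a Tecton API):

```python
from datetime import datetime, timezone


def processed_event_age(event_ts: datetime, write_completed_ts: datetime) -> float:
    """Seconds between an event's timestamp and the completed online-store
    write. This spans both upstream processing time and the time Tecton
    takes to transform and persist the event."""
    return (write_completed_ts - event_ts).total_seconds()
```

An event stamped 12:00:00 whose online-store write completes at 12:00:05 has a processed event age of 5 seconds.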
Input Rate
Input Rate shows the rate of messages read from the stream. A change in this rate helps identify changes in the volume of records emitted by the upstream data source.
Online Store Write Rate
Online Store Write Rate is the number of records being written to the Online Store as the output of the stream feature pipeline. This may be lower than the Input Rate due to:
- Filtering logic in the Data Source post-processor or Feature View transformation logic
- Multiple records for the same entity ID arriving in the same microbatch, causing events to be aggregated before write
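The second cause above can be illustrated with a toy micro-batch: events sharing an entity ID are aggregated into a single write, so the write count equals the number of distinct entities (the function is a sketch, not Tecton's implementation):

```python
from collections import defaultdict


def online_writes_for_microbatch(events):
    """Count online-store writes for one micro-batch: events that share an
    entity ID are aggregated into a single record before being persisted,
    which is why Write Rate can be lower than Input Rate."""
    per_entity = defaultdict(list)
    for entity_id, value in events:
        per_entity[entity_id].append(value)
    return len(per_entity)
```

Here five input events for three distinct entity IDs produce only three writes.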
Average Serving Delay
Average serving delay measures the difference between the time Tecton received the get-features request and the event timestamp of the feature retrieved (for an aggregation, the most recent event/tile).
Micro-batch Processing Latency
Micro-batch processing latency shows the time between completed micro-batches. By default, this number should remain below 30 seconds, since Stream Feature View micro-batches are 30 seconds long. A latency above 30 seconds indicates that the stream processing job is under-resourced and will fall behind.
If you are using continuous processing, micro-batch latency should be close to 0.
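The threshold check described above can be sketched as a simple predicate (names are our own; the 30-second default comes from the text):

```python
DEFAULT_MICROBATCH_SECONDS = 30.0  # default micro-batch length per the text


def is_under_resourced(batch_to_batch_latency_s: float,
                       microbatch_seconds: float = DEFAULT_MICROBATCH_SECONDS) -> bool:
    """If completed micro-batches arrive further apart than the micro-batch
    length itself, the job cannot keep up and will fall behind."""
    return batch_to_batch_latency_s > microbatch_seconds
```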
Interpreting Stream Lag and Write Rate
If your Processed Event Age suddenly increases, one of two things is likely happening:
- The stream processing is falling behind (look for increased microbatch processing latency)
- Your upstream data source is outputting stale records (check input rate changes)
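The triage described above can be sketched as a small decision function. The 30-second micro-batch length comes from the text; the 50% input-rate-drop threshold is an illustrative assumption, not a Tecton default:

```python
def diagnose_event_age_spike(microbatch_latency_s: float,
                             input_rate_drop_pct: float,
                             microbatch_seconds: float = 30.0) -> str:
    """Triage a sudden Processed Event Age increase using the two signals
    above: micro-batch latency and a drop in Input Rate."""
    if microbatch_latency_s > microbatch_seconds:
        # Batches are taking longer than the batch interval itself.
        return "stream processing is falling behind"
    if input_rate_drop_pct > 50.0:  # illustrative threshold (assumption)
        return "upstream source may be emitting stale or fewer records"
    return "inconclusive: inspect upstream event timestamps"
```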
Feature Freshness measures how up-to-date the stream feature data is. If no new data is coming in on the stream, or the stream feature pipeline is falling behind, then the freshness measurement will increase.
Specifically, Online Serving Feature Freshness is derived from the most recent timestamp written to the Online Store. Because this metric is polled periodically, the reported value may be higher than the true value.
Other Job Types
Deletion
- Removes feature data from online/offline stores when features or data are deleted
- Cleans up obsolete feature values after TTL expiration
Delta Maintenance
- Performs periodic maintenance tasks on Delta tables in the offline store
- Runs OPTIMIZE and VACUUM operations to manage file compaction and cleanup
- Typically runs on a 7-day schedule
Ingest
- Processes data pushed through the Stream Ingest API
- Validates incoming data against schema
- Writes records to online/offline stores
Feature Publish
- Publishes materialized feature data to data warehouses for analysis
- Makes historical feature data available for exploration and feature selection
- Runs after successful materialization jobs
For more details, see the feature publish jobs documentation.
Dataset Generation
- Creates training datasets by joining features with provided training examples
- Ensures point-in-time correctness when retrieving historical feature values
- Supports both offline batch and streaming features
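Point-in-time correctness means each training example only sees feature values whose timestamps are at or before the example's timestamp. A toy sketch of that lookup rule (not Tecton's implementation):

```python
def point_in_time_value(feature_history, request_ts):
    """Return the most recent feature value at or before request_ts,
    guaranteeing no future data leaks into a training example.

    feature_history: iterable of (timestamp, value) pairs.
    """
    eligible = [(ts, value) for ts, value in feature_history if ts <= request_ts]
    if not eligible:
        return None  # no feature value existed yet at request_ts
    return max(eligible)[1]
```

A training example timestamped at t=6 joined against history `[(1, "a"), (5, "b"), (9, "c")]` picks "b", never the future value "c".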
Integration Test
Stream
- Validates streaming feature pipelines end-to-end
- Tests stream processing, materialization and feature freshness
- Runs as part of CI/CD
Batch
- Validates batch feature pipelines end-to-end
- Tests materialization, retrieval and correctness of batch features
- Runs as part of CI/CD
For more information about integration testing, see the integration test documentation.
Compaction
- Optimizes storage of aggregation features in the online store
- Combines partial aggregates into fewer, more efficient tiles
- Reduces storage costs and improves query performance
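Combining partial aggregates is mergeable arithmetic; a toy sketch with COUNT and SUM tiles (the tile shape is an illustrative assumption, not Tecton's storage format):

```python
def compact_tiles(tiles):
    """Merge partial aggregates (here COUNT and SUM) from many small tiles
    into a single tile. Queries then read one tile instead of many for the
    same aggregate result, which is what reduces cost and latency."""
    return {
        "count": sum(t["count"] for t in tiles),
        "sum": sum(t["sum"] for t in tiles),
    }
```

Two partial tiles `{count: 2, sum: 10}` and `{count: 3, sum: 5}` compact into one tile `{count: 5, sum: 15}`, from which an average (15 / 5 = 3) is still recoverable.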