Job Types
Materialization
Stream
- Continuously processes incoming stream data (from Kafka, Kinesis, or Push API) to compute feature values and write them to the online/offline stores
- Maintains fresh feature values with sub-second latency
Batch
- Runs on a schedule to process batch data sources and compute feature values
- Writes computed features to online/offline stores according to the defined `batch_schedule` (see the sketch after this list)
- Handles both initial backfills and ongoing updates
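For example, a batch feature view's job cadence comes directly from its definition. A minimal sketch, assuming a Spark-based Tecton feature repo; the source, entity, and transformation are illustrative, and exact decorator arguments vary by SDK version:

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

# `transactions` (a BatchSource) and `user` (an Entity) are assumed to be
# defined elsewhere in the feature repo.
from features.data_sources import transactions  # hypothetical module
from features.entities import user               # hypothetical module

@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    online=True,                               # write to the online store
    offline=True,                              # write to the offline store
    batch_schedule=timedelta(days=1),          # one materialization job per day
    feature_start_time=datetime(2024, 1, 1),   # start of the backfill window
    ttl=timedelta(days=30),
    timestamp_field="timestamp",
)
def user_daily_spend(transactions):
    return f"""
        SELECT user_id, timestamp, SUM(amount) AS daily_spend
        FROM {transactions}
        GROUP BY user_id, timestamp
    """
```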
Materialization Job States
RUNNING: The task is running, meaning a materialization task attempt is executing or scheduled to execute. New attempts may be created if errors are encountered.
DRAINING: The task is draining. Any running attempt will be cancelled. No new MaterializationTaskAttempts will be scheduled from this task.
MANUAL_RETRY: A terminated task (SUCCESS / FAILURE) was manually requested to be re-executed. The retry policy is reset for the manually retried task, as if it had no prior attempts.
MANUAL_CANCELLATION_REQUESTED: Task cancellation has been requested by a user. Similar to DRAINING: no new MaterializationTaskAttempts will be scheduled from this task.
FAILURE: The task failed permanently. No new MaterializationTaskAttempts will be made from this task.
MANUALLY_CANCELLED: The task is cancelled. Similar to DRAINED, but the scheduler will not attempt to fill the resulting materialization gap.
DRAINED: A temporary, Tecton-managed state in which Tecton automatically decides the job's next steps. No action is required on the customer side.
SUCCESS: The task completed successfully. No new MaterializationTaskAttempts will be made from this task. Only applicable to batch materialization tasks.
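When scripting against these states, it helps to distinguish states in which the scheduler may still create attempts from terminal ones. The enum below simply mirrors the list above; it is plain Python, not a Tecton API:

```python
from enum import Enum

class MaterializationTaskState(Enum):
    RUNNING = "RUNNING"
    DRAINING = "DRAINING"
    MANUAL_RETRY = "MANUAL_RETRY"
    MANUAL_CANCELLATION_REQUESTED = "MANUAL_CANCELLATION_REQUESTED"
    FAILURE = "FAILURE"
    MANUALLY_CANCELLED = "MANUALLY_CANCELLED"
    DRAINED = "DRAINED"
    SUCCESS = "SUCCESS"

# Terminal states: no new MaterializationTaskAttempts will ever be made.
TERMINAL_STATES = {
    MaterializationTaskState.SUCCESS,
    MaterializationTaskState.FAILURE,
    MaterializationTaskState.MANUALLY_CANCELLED,
}

def may_create_attempts(state: MaterializationTaskState) -> bool:
    """True only for states in which new attempts can still be scheduled."""
    return state in (MaterializationTaskState.RUNNING,
                     MaterializationTaskState.MANUAL_RETRY)
```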
Deletion
- Removes feature data from online/offline stores when features or data are deleted
- Cleans up obsolete feature values after TTL expiration
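Conceptually, TTL cleanup drops any row whose feature timestamp has aged out of the serving window. A self-contained pandas sketch (column names are illustrative):

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

def expire_rows(features: pd.DataFrame, ttl: timedelta) -> pd.DataFrame:
    """Keep only rows still inside the TTL window, mirroring what a
    deletion job removes from the online/offline stores."""
    cutoff = datetime.now(timezone.utc) - ttl
    return features[features["timestamp"] >= cutoff]

df = pd.DataFrame({
    "user_id": ["a", "b"],
    "daily_spend": [12.5, 40.0],
    "timestamp": [
        datetime.now(timezone.utc) - timedelta(days=2),   # still fresh
        datetime.now(timezone.utc) - timedelta(days=45),  # past a 30-day TTL
    ],
})
fresh = expire_rows(df, ttl=timedelta(days=30))  # keeps only user "a"
```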
Delta Maintenance
- Performs periodic maintenance tasks on Delta tables in the offline store
- Runs OPTIMIZE and VACUUM operations to manage file compaction and cleanup
- Typically runs on a 7-day schedule
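OPTIMIZE and VACUUM are standard Delta Lake commands, so the equivalent manual operation looks like this in PySpark (the table path is illustrative; VACUUM's retention here matches the 7-day cadence):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is configured

table_path = "s3://my-bucket/offline-store/user_daily_spend"  # illustrative

# Compact many small files into fewer, larger ones.
spark.sql(f"OPTIMIZE delta.`{table_path}`")

# Delete files no longer referenced by the table, retaining 7 days of
# history (168 hours) for time travel and concurrent readers.
spark.sql(f"VACUUM delta.`{table_path}` RETAIN 168 HOURS")
```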
Ingest
- Processes data pushed through the Stream Ingest API
- Validates incoming records against the expected schema
- Writes records to online/offline stores
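Pushes are plain JSON over HTTPS. A hedged sketch using `requests`; the endpoint path, payload shape, workspace, and push source names are assumptions to be adapted from the Stream Ingest API reference:

```python
import requests

resp = requests.post(
    "https://<your-cluster>.tecton.ai/ingest",  # illustrative endpoint
    headers={"Authorization": "Tecton-key <API_KEY>"},
    json={
        "workspace_name": "prod",                  # hypothetical workspace
        "dry_run": False,
        "records": {
            "user_clicks_push_source": [           # hypothetical push source
                {
                    "record": {
                        "user_id": "a",
                        "clicks": 3,
                        "timestamp": "2024-01-01T00:00:00Z",
                    }
                }
            ]
        },
    },
)
resp.raise_for_status()  # raises on error responses, e.g. schema validation failures
```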
Feature Publish
- Publishes materialized feature data to data warehouses for analysis
- Makes historical feature data available for exploration and feature selection
- Runs after successful materialization jobs
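Once published, the data can be explored with plain SQL in the warehouse. An illustrative query using an in-memory SQLite stand-in for the warehouse connection (table and column names are assumptions):

```python
import sqlite3

import pandas as pd

# Stand-in for a real warehouse connection (Snowflake, BigQuery, Redshift, ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_daily_spend (user_id TEXT, daily_spend REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO user_daily_spend VALUES (?, ?, ?)",
    [("a", 12.5, "2024-01-01"), ("b", 40.0, "2024-01-01")],
)

# Explore the published feature data, e.g. while assessing a candidate feature.
df = pd.read_sql("SELECT user_id, daily_spend, ts FROM user_daily_spend", conn)
print(df.describe(include="all"))
```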
Dataset Generation
- Creates training datasets by joining features with provided training examples
- Ensures point-in-time correctness when retrieving historical feature values
- Supports both offline batch and streaming features
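Point-in-time correctness means each training event sees only the most recent feature value at or before its own timestamp, never a later one. A pandas sketch of that as-of join (the data and column names are illustrative):

```python
import pandas as pd

# Historical feature values, as materialized to the offline store.
features = pd.DataFrame({
    "user_id": ["a", "b", "a"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
    "daily_spend": [10.0, 7.5, 25.0],
})

# Training events (the "spine") with labels.
events = pd.DataFrame({
    "user_id": ["b", "a"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03"]),
    "label": [0, 1],
})

# For each event, take the latest feature row at or before the event time,
# so feature values from the future never leak into the training set.
training = pd.merge_asof(
    events.sort_values("timestamp"),
    features.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
    direction="backward",
)
# User "b"'s event predates its only feature value, so it gets NaN --
# exactly the value that would have been served at that moment.
```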
Integration Test
Stream
- Validates streaming feature pipelines end-to-end
- Tests stream processing, materialization and feature freshness
- Runs as part of CI/CD
Batch
- Validates batch feature pipelines end-to-end
- Tests materialization, retrieval and correctness of batch features
- Runs as part of CI/CD
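In CI these checks typically reduce to materializing a fixed window with known inputs and asserting on the output. A pytest-style sketch; `materialize_window` is a hypothetical helper standing in for however your project executes a feature view against test data:

```python
from datetime import datetime

import pandas as pd

from my_project.testing import materialize_window  # hypothetical helper

def test_user_daily_spend_output():
    # Run the pipeline over a fixed historical window with known inputs.
    out: pd.DataFrame = materialize_window(
        feature_view="user_daily_spend",
        start=datetime(2024, 1, 1),
        end=datetime(2024, 1, 2),
    )
    # Schema, correctness, and uniqueness checks.
    assert {"user_id", "daily_spend", "timestamp"} <= set(out.columns)
    assert (out["daily_spend"] >= 0).all()
    assert not out.duplicated(subset=["user_id", "timestamp"]).any()
```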
Compaction
- Optimizes storage of aggregation features in the online store
- Combines partial aggregates into fewer, more efficient tiles
- Reduces storage costs and improves query performance
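Compaction works because partial aggregates such as count and sum combine associatively: many small tiles can be folded into one wider tile without recomputing from raw events. A plain-Python sketch of the idea:

```python
from dataclasses import dataclass

@dataclass
class Tile:
    """A partial aggregate over one time slice (illustrative structure)."""
    start: int    # slice start, e.g. epoch seconds
    end: int      # slice end
    count: int    # partial COUNT
    total: float  # partial SUM

def compact(tiles: list[Tile]) -> Tile:
    """Fold contiguous partial aggregates into a single wider tile.
    Fewer tiles means fewer reads per online aggregation query."""
    return Tile(
        start=min(t.start for t in tiles),
        end=max(t.end for t in tiles),
        count=sum(t.count for t in tiles),
        total=sum(t.total for t in tiles),
    )

hourly = [Tile(0, 3600, 4, 10.0), Tile(3600, 7200, 2, 5.5)]
daily = compact(hourly)  # one tile for both hours: count=6, total=15.5
```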