Version: 1.0

Job Types

Materialization

Stream

  • Continuously processes incoming stream data (from Kafka, Kinesis, or Push API) to compute feature values and write them to the online/offline stores
  • Maintains fresh feature values with sub-second latency; a sketch of a streaming feature definition follows below
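
For illustration, a minimal streaming feature definition might look like the following. This is a sketch only: transactions_stream (a stream source), user (an entity), and the exact decorator parameters are assumptions that vary by SDK version; consult the SDK reference for the precise signature.

```python
from datetime import datetime, timedelta

from tecton import stream_feature_view

# Assumed to be defined elsewhere in the feature repository:
# `transactions_stream` (a StreamSource) and `user` (an Entity).
@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="pyspark",
    online=True,             # write feature values to the online store
    offline=True,            # write feature values to the offline store
    feature_start_time=datetime(2024, 1, 1),
    batch_schedule=timedelta(days=1),  # cadence of the backing batch/backfill jobs
    ttl=timedelta(days=30),
)
def user_transaction_amount(transactions):
    # Pass through the columns needed to serve the feature.
    return transactions.select("user_id", "amount", "timestamp")
```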

Batch

  • Runs on a schedule to process batch data sources and compute feature values
  • Writes computed features to online/offline stores according to the defined batch_schedule, as sketched below
  • Handles both initial backfills and ongoing updates
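
A batch counterpart is sketched below; batch_schedule controls how often the scheduled materialization job runs. Again, transactions_batch and user are assumed names, and the decorator parameters depend on the SDK version.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

# Assumed to be defined elsewhere: `transactions_batch` (a BatchSource)
# and `user` (an Entity).
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2024, 1, 1),  # backfill begins here
    batch_schedule=timedelta(days=1),         # one materialization job per day
    ttl=timedelta(days=90),
)
def user_daily_spend(transactions_batch):
    # In spark_sql mode the function returns a SQL string; the source
    # parameter interpolates as a table name.
    return f"""
        SELECT user_id, amount, timestamp
        FROM {transactions_batch}
    """
```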

Materialization Job States

RUNNING: The task is running: a materialization task attempt is executing or will be scheduled. New attempts may be created if errors are encountered.

DRAINING: The task is draining. Any running attempt will be cancelled, and no new MaterializationTaskAttempts will be scheduled from this task.

MANUAL_RETRY: A terminated task (SUCCESS / FAILURE) was manually requested to be re-executed. The retry policy is reset for the manually retried task, which behaves as if no attempts had been executed before.

MANUAL_CANCELLATION_REQUESTED: A user has requested cancellation of the task. Similar to DRAINING: no new MaterializationTaskAttempts will be scheduled from this task.

FAILURE: The task failed permanently. No new MaterializationTaskAttempts will be made from this task.

MANUALLY_CANCELLED: The task is cancelled. Similar to DRAINED, but the scheduler will not attempt to fill the materialization gap.

DRAINED: The task is drained. A temporary, Tecton-managed state in which Tecton automatically decides the next steps for the job; no action is required on the customer side.

SUCCESS: The task completed successfully. No new MaterializationTaskAttempts will be made from this task. Only applicable to batch materialization tasks.
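
Taken together, these descriptions split the lifecycle into states that may still schedule new attempts and states that never will. The plain-Python sketch below encodes that split purely as an illustration of the lifecycle; it is not a Tecton API.

```python
from enum import Enum, auto

class JobState(Enum):
    RUNNING = auto()
    DRAINING = auto()
    MANUAL_RETRY = auto()
    MANUAL_CANCELLATION_REQUESTED = auto()
    FAILURE = auto()
    MANUALLY_CANCELLED = auto()
    DRAINED = auto()
    SUCCESS = auto()

# States from which no new MaterializationTaskAttempts are scheduled.
NO_NEW_ATTEMPTS = {
    JobState.DRAINING,
    JobState.MANUAL_CANCELLATION_REQUESTED,
    JobState.FAILURE,
    JobState.MANUALLY_CANCELLED,
    JobState.SUCCESS,
}

# Terminal states a user may manually retry (the retry policy then resets).
MANUALLY_RETRYABLE = {JobState.SUCCESS, JobState.FAILURE}

def schedules_new_attempts(state: JobState) -> bool:
    """True if the scheduler may still create attempts for this task."""
    return state not in NO_NEW_ATTEMPTS
```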

Deletion

  • Removes feature data from online/offline stores when features or data are deleted
  • Cleans up obsolete feature values after their TTL expires; a deletion-request sketch follows below
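
As a sketch, key-level deletion can be requested through the SDK. The delete_keys call below follows Tecton's published deletion API, but treat the workspace name, feature view name, and exact keyword arguments as assumptions that may differ across SDK versions.

```python
import pandas as pd
import tecton

ws = tecton.get_workspace("prod")             # assumed workspace name
fv = ws.get_feature_view("user_daily_spend")  # assumed feature view name

# Join keys to delete; the column name must match the entity join key.
keys = pd.DataFrame({"user_id": ["user_123", "user_456"]})

# Kicks off a deletion job that removes these keys from both stores.
fv.delete_keys(keys, online=True, offline=True)
```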

Delta Maintenance

  • Performs periodic maintenance tasks on Delta tables in the offline store
  • Runs OPTIMIZE and VACUUM operations to manage file compaction and cleanup (illustrated below)
  • Typically runs on a 7-day schedule
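
The managed job is roughly equivalent to standard Delta Lake maintenance. The Spark snippet below shows the underlying operations against a hypothetical offline-store table; you would not normally run these against Tecton-managed tables yourself.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

table = "offline_store.user_daily_spend"  # hypothetical Delta table name

# Compact many small files into fewer, larger ones.
spark.sql(f"OPTIMIZE {table}")

# Remove files no longer referenced by the table and older than the
# retention window (168 hours = 7 days, Delta's default).
spark.sql(f"VACUUM {table} RETAIN 168 HOURS")
```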

Ingest

  • Processes data pushed through the Stream Ingest API
  • Validates incoming data against schema
  • Writes records to online/offline stores; an example push request is sketched below
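
A push request is sketched below using Python's requests library. The endpoint path, payload shape, and names (cluster URL, workspace, push source) are assumptions for illustration; check your deployment's Stream Ingest API reference for the exact contract.

```python
import os

import requests

# Assumed endpoint and names, for illustration only.
url = "https://preview.my-cluster.tecton.ai/ingest"
payload = {
    "workspace_name": "prod",
    "dml_operation": "upsert",
    "records": {
        "transactions_push_source": [
            {
                "record": {
                    "user_id": "user_123",
                    "amount": 42.0,
                    "timestamp": "2024-01-01T00:00:00Z",
                }
            }
        ]
    },
}

resp = requests.post(
    url,
    json=payload,
    headers={"Authorization": f"Tecton-key {os.environ['TECTON_API_KEY']}"},
)
resp.raise_for_status()  # schema validation failures surface as error responses
```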

Feature Publish

  • Publishes materialized feature data to data warehouses for analysis
  • Makes historical feature data available for exploration and feature selection, as in the query sketched below
  • Runs after successful materialization jobs
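
Once published, the data can be explored with ordinary SQL in the warehouse. The Spark SQL sketch below uses a hypothetical table name; how published tables are named depends on your configuration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `published_features.user_daily_spend` is a hypothetical published table.
spark.sql("""
    SELECT user_id, daily_spend, timestamp
    FROM published_features.user_daily_spend
    WHERE timestamp >= '2024-01-01'
    LIMIT 100
""").show()
```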

Dataset Generation

  • Creates training datasets by joining features with provided training examples
  • Ensures point-in-time correctness when retrieving historical feature values (see the sketch below)
  • Supports both offline batch and streaming features
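
A training dataset is produced by joining a set of labeled events (a "spine") against a feature service with point-in-time correctness. The sketch below uses get_features_for_events; older SDK versions expose the same idea as get_historical_features, so treat the method name, workspace, and feature service names as version-dependent assumptions.

```python
import pandas as pd
import tecton

ws = tecton.get_workspace("prod")               # assumed workspace name
fs = ws.get_feature_service("fraud_detection")  # assumed feature service name

# Training events: join keys, an event timestamp, and the label.
events = pd.DataFrame({
    "user_id": ["user_123", "user_456"],
    "timestamp": pd.to_datetime(["2024-02-01", "2024-02-02"]),
    "is_fraud": [0, 1],
})

# For each event, feature values are joined as of the event timestamp,
# never using values that arrived later (point-in-time correctness).
training_df = fs.get_features_for_events(events).to_pandas()
```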

Integration Test

Stream

  • Validates streaming feature pipelines end-to-end
  • Tests stream processing, materialization and feature freshness
  • Runs as part of CI/CD

Batch

  • Validates batch feature pipelines end-to-end
  • Tests materialization, retrieval and correctness of batch features
  • Runs as part of CI/CD; an illustrative pytest sketch follows below
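
As an illustration of what such a check can assert, the pytest sketch below exercises the retrieval path against a hypothetical dev workspace. It is not Tecton's managed integration-test mechanism, just an example of an end-to-end assertion a CI pipeline might add alongside it.

```python
import pandas as pd
import pytest
import tecton

@pytest.fixture(scope="module")
def feature_service():
    # Hypothetical dev workspace and feature service used in CI.
    return tecton.get_workspace("dev").get_feature_service("fraud_detection")

def test_batch_features_retrievable(feature_service):
    events = pd.DataFrame({
        "user_id": ["user_123"],
        "timestamp": pd.to_datetime(["2024-02-01"]),
    })
    df = feature_service.get_features_for_events(events).to_pandas()

    # One output row per event, with populated feature columns.
    assert len(df) == 1
    feature_cols = [c for c in df.columns if c.startswith("user_daily_spend")]
    assert feature_cols, "expected feature columns from user_daily_spend"
    assert df[feature_cols].notna().all().all()
```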

Compaction

  • Optimizes storage of aggregation features in the online store
  • Combines partial aggregates into fewer, more efficient tiles, as illustrated below
  • Reduces storage costs and improves query performance
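
Conceptually, compaction merges many small partial-aggregate tiles into fewer large ones that are cheaper to store and scan. The plain-Python sketch below illustrates the idea for a SUM/COUNT aggregation; it is not Tecton's implementation.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    """A partial aggregate over [start, end): enough state to merge losslessly."""
    start: int    # window start (e.g., epoch hours)
    end: int      # window end
    total: float  # partial SUM
    count: int    # partial COUNT

def compact(tiles: list[Tile], width: int) -> list[Tile]:
    """Merge tiles into wider ones, each covering `width` units."""
    merged: dict[int, Tile] = {}
    for t in tiles:
        bucket = t.start // width * width
        if bucket not in merged:
            merged[bucket] = Tile(bucket, bucket + width, 0.0, 0)
        merged[bucket].total += t.total
        merged[bucket].count += t.count
    return sorted(merged.values(), key=lambda t: t.start)

# Example: 24 hourly tiles become one daily tile. SUM and COUNT (and hence
# AVG) are preserved, but far fewer tiles must be read at query time.
hourly = [Tile(h, h + 1, total=float(h), count=1) for h in range(24)]
daily = compact(hourly, width=24)
assert daily[0].total == sum(range(24)) and daily[0].count == 24
```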
