Monitoring Materialization

Overview

If feature processing jobs begin to fail, Tecton can begin to serve stale or inaccurate data. To ensure that feature processing jobs stay healthy, Tecton offers monitoring, alerting and debugging tools.

For a practical example of debugging a materialization alert, see Example: Debugging Materialization Alerts.

Setting Up Alerts

Tecton can automatically generate materialization health alerts that are sent to a specified email address. See Types of Alerts for more details.

Note

It is highly recommended that an alert email be set for each FeaturePackage that is consumed in production.

To configure alerts, specify monitoring when defining a FeaturePackage in your Feature Repository. MonitoringConfig objects configure alert thresholds and feature freshness expectations.

my_feature_package = TemporalFeaturePackage(
    ...
    monitoring = MonitoringConfig(
        monitor_freshness=True,
        expected_feature_freshness="2w",
        alert_email="kanye@tecton.ai"
    )
)
  • monitor_freshness: Set this to False to suppress freshness-related alerts.
  • expected_feature_freshness: Set this to a longer duration to make freshness alerts less sensitive. See Default Expected Feature Freshness for details about the default value used when this field is unspecified.
  • alert_email: Recipient of alerts.

Debugging Tools

Tecton provides tools to monitor and debug production Feature Packages through each of its interfaces: the Web UI, the SDK, and the CLI.

Web UI: Health Overview

The easiest way to check the health of a materialized FeaturePackage is through the Web UI. Navigate to the FeaturePackage in question and switch to the “Materialization” tab to see Feature Package materialization diagnostics at a glance.

SDK: FeaturePackage Specifics

The Tecton SDK provides tools to dive deeper into the details of a Feature Package. For example, the materialization_status() method displays details about failed materialization attempts.

In the SDK and Web UI, Tecton provides a link to the auto-generated job that was used to compute feature values. This job link can be used to view the underlying error that caused a materialization job to fail.

To view this job, click on the Job status in the materialization table in the Web UI. The same link is available through the SDK's materialization_status() method and the CLI's tecton materialization-status command.

[Image: Monitoring Materialization 1]

This link will open a page in your Spark processing engine where you will be able to see the job failure. In the example below, we show a spot failure in Databricks:

[Image: Monitoring Materialization 2]

CLI: Cluster Overview and Status

Tecton provides the ability to view the status of all Feature Packages in a cluster using the tecton freshness CLI command.

$ tecton freshness
           Feature Package               Stale?   Freshness   Expected Freshness     Created At
=================================================================================================
partner_ctr_performance:14d              Y        2wk 1d      2d                   12/02/20 10:52
ad_group_ctr_performance                 N        1h 1m       2h                   11/28/20 19:50
user_ad_impression_counts                N        1m 35s      2h                   10/01/20 2:16
content_keyword_ctr_performance:v2       N        1m 36s      2h                   09/04/20 22:22
content_keyword_ctr_performance          N        1m 37s      2h                   08/26/20 12:52
user_total_ad_frequency_counts           N        1m 38s      2h                   08/26/20 12:52
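
For scripted health checks, this output can be parsed to list the stale Feature Packages. The following is a minimal sketch, assuming the column layout shown above (package name first, then a Y/N stale flag) stays stable. The helper names stale_feature_packages and fetch_freshness_output are hypothetical and not part of Tecton.

```python
import subprocess
from typing import List


def stale_feature_packages(freshness_output: str) -> List[str]:
    """Return the names of Feature Packages whose 'Stale?' column is 'Y'."""
    stale = []
    for line in freshness_output.splitlines():
        parts = line.split()
        # Data rows start with the package name followed by a Y/N flag.
        # The header, separator, and blank lines do not match and are skipped.
        if len(parts) >= 2 and parts[1] in ("Y", "N"):
            if parts[1] == "Y":
                stale.append(parts[0])
    return stale


def fetch_freshness_output() -> str:
    """Run `tecton freshness` and return its stdout (requires the Tecton CLI)."""
    result = subprocess.run(
        ["tecton", "freshness"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling stale_feature_packages(fetch_freshness_output()) from a scheduled job is one way to surface stale packages outside of email alerts. Because the parsing relies on whitespace-separated columns, it may need adjusting if the CLI output format changes.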

You can also use the tecton materialization-status $FP_NAME command to see the materialization status of a specific FeaturePackage.

$ tecton materialization-status my_feature_package
All the displayed times are in UTC time zone
TYPE     WINDOW_START_TIME      WINDOW_END_TIME     STATUS    ATTEMPT_NUMBER     JOB_CREATED_AT      JOB_LOGS
================================================================================================================
BATCH   2020-12-15 00:00:00   2020-12-22 00:00:00   SUCCESS         1          2020-12-22 00:00:27   https://...
BATCH   2020-12-14 00:00:00   2020-12-21 00:00:00   SUCCESS         1          2020-12-21 00:00:14   https://...
BATCH   2020-12-13 00:00:00   2020-12-20 00:00:00   SUCCESS         1          2020-12-20 00:00:13   https://...
BATCH   2020-12-12 00:00:00   2020-12-19 00:00:00   SUCCESS         1          2020-12-19 00:00:10   https://...
BATCH   2020-12-11 00:00:00   2020-12-18 00:00:00   SUCCESS         1          2020-12-18 00:00:06   https://...

Default Expected Feature Freshness

By default, a FeaturePackage's freshness is expected to be less than twice the materialization schedule. Alerts will fire once this threshold, plus a small grace period, is crossed. The grace period's duration depends on the FeaturePackage's materialization schedule:

Schedule          Grace Period
<= 10 minutes     30 minutes
<= 30 minutes     90 minutes
<= 1 hour         2 hours
<= 4 hours        4 hours
<= 24 hours       12 hours
> 24 hours        24 hours

The table below has examples of materialization schedules mapped to default alert thresholds:

Schedule       Default Alert Threshold
5 minutes      40 minutes
30 minutes     2.5 hours
1 hour         4 hours
4 hours        12 hours
24 hours       60 hours
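
The default alert threshold described in this section is twice the materialization schedule plus the grace period. A minimal sketch of that arithmetic follows, using the grace period table above. The function names are illustrative and not part of the Tecton SDK, and the code only mirrors the rule as stated here, not Tecton's internal implementation.

```python
from datetime import timedelta

# (upper bound on the schedule, grace period), from the grace period table.
GRACE_PERIODS = [
    (timedelta(minutes=10), timedelta(minutes=30)),
    (timedelta(minutes=30), timedelta(minutes=90)),
    (timedelta(hours=1), timedelta(hours=2)),
    (timedelta(hours=4), timedelta(hours=4)),
    (timedelta(hours=24), timedelta(hours=12)),
]
# Grace period for schedules longer than 24 hours.
MAX_GRACE_PERIOD = timedelta(hours=24)


def grace_period(schedule: timedelta) -> timedelta:
    """Look up the grace period for a materialization schedule."""
    for upper_bound, grace in GRACE_PERIODS:
        if schedule <= upper_bound:
            return grace
    return MAX_GRACE_PERIOD


def default_alert_threshold(schedule: timedelta) -> timedelta:
    """Twice the schedule (expected freshness) plus the grace period."""
    return 2 * schedule + grace_period(schedule)


# Spot checks against the example table.
assert default_alert_threshold(timedelta(minutes=5)) == timedelta(minutes=40)
assert default_alert_threshold(timedelta(hours=1)) == timedelta(hours=4)
assert default_alert_threshold(timedelta(hours=24)) == timedelta(hours=60)
```

Keeping the grace periods in an ordered list of upper bounds makes the lookup a direct transcription of the table, so it is easy to compare against the documentation if the values change.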