
Debugging FeaturePackage Alerts

Overview

After setting up FeaturePackage monitoring, you will begin to receive email alerts for errors such as batch materialization failures.

Example

This example details a possible triage and debugging process once an alert email has been sent for FeaturePackageBatchMaterializationFailures.

The procedure has three parts:

  1. Navigate to the Web UI to examine recent materialization attempts.
  2. Dive into further details using the CLI.
  3. Examine cluster-level status information using the CLI.

Alert Email

Assuming you have already defined an alert_email in your FeaturePackage's MonitoringConfig, you will receive an email alert when an error occurs. In this case, the error is FeaturePackageBatchMaterializationFailures, which indicates that a batch materialization job failed. For more information on failure types, see the monitoring documentation.

Example Email Alert

View Materialization Information in the Web UI

In the email alert you received, click on the link to view the alerting FeaturePackage in the Tecton Web UI.

In the Materialization tab, you can find materialization configurations and information about recent jobs.

If the current expected_feature_freshness is too low and is causing noisy freshness alerts, raising the value in the MonitoringConfig defined in your feature repository will often help. If the actual freshness exceeds the expected freshness, the FeaturePackage is serving stale data.
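The staleness rule above can be sketched in a few lines of Python. This is only an illustration, not part of Tecton: the helper names parse_duration and is_stale are hypothetical, and the duration strings are assumed to use the d/h/m/s format that tecton freshness prints later in this guide. The second example value ("3h 5m") is made up for demonstration.

```python
import re
from datetime import timedelta

# Unit suffixes as they appear in the freshness columns (assumed: d, h, m, s).
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '14h 40m' or '2d' into a timedelta."""
    parts = re.findall(r"(\d+)\s*([dhms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return timedelta(**{_UNITS[unit]: int(value) for value, unit in parts})


def is_stale(actual: str, expected: str) -> bool:
    """Return True when the observed freshness exceeds the expected freshness."""
    return parse_duration(actual) > parse_duration(expected)


print(is_stale("14h 40m", "2d"))  # False: 14h 40m is within the 2d expectation
print(is_stale("3h 5m", "2h"))    # True: 3h 5m exceeds the 2h expectation
```

A check like this can help decide whether a noisy alert reflects data that is actually stale or an expected freshness that is simply set too tight.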

Monitoring and Materialization Config Screenshot

Scroll down to the "Materialization Status" and "Materialization Jobs" sections, which help locate the source of the error.

Materialization Status & Jobs Screenshot

The table of materialization jobs and their attempts is often useful for locating individual batch errors like this one. Click the failing job row to view the specific Spark error message. You can also view previously failed retries, when available.

Seeking More Materialization Job Information

If the Materialization tab in the Web UI and its linked Spark jobs do not provide enough information to debug the error, use the Tecton CLI or SDK to find more information.

From the Tecton CLI, run tecton materialization-status [FEATURE-PACKAGE-NAME]. To display the available flags, run tecton materialization-status -h.

$ tecton materialization-status ad_ground_truth_ctr_performance_7_days --limit=5
All the displayed times are in UTC time zone
TYPE     WINDOW_START_TIME      WINDOW_END_TIME     STATUS    ATTEMPT_NUMBER     JOB_CREATED_AT                                            JOB_LOGS
=========================================================================================================================================================================================
BATCH   2020-12-14 00:00:00   2020-12-21 00:00:00   SUCCESS         1          2020-12-21 00:00:14   https://...cloud.databricks.com/?o=3650800870221207#job/1772891/run/1
BATCH   2020-12-13 00:00:00   2020-12-20 00:00:00   SUCCESS         1          2020-12-20 00:00:13   https://...cloud.databricks.com/?o=3650800870221207#job/1772743/run/1
BATCH   2020-12-12 00:00:00   2020-12-19 00:00:00   SUCCESS         1          2020-12-19 00:00:10   https://...cloud.databricks.com/?o=3650800870221207#job/1772598/run/1
BATCH   2020-12-11 00:00:00   2020-12-18 00:00:00   SUCCESS         1          2020-12-18 00:00:06   https://...cloud.databricks.com/?o=3650800870221207#job/1772447/run/1
BATCH   2020-12-10 00:00:00   2020-12-17 00:00:00   SUCCESS         1          2020-12-17 00:00:13   https://...cloud.databricks.com/?o=3650800870221207#job/1772294/run/1
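When the list of attempts is long, it can be convenient to filter the output for attempts that did not succeed. The following is a minimal sketch that assumes the column layout shown above (seven columns, with each timestamp printed as a date and a time). The function names are hypothetical, and the only status value known from the sample output is SUCCESS, so everything else is treated as worth a look.

```python
import subprocess
from typing import Dict, List


def fetch_status(name: str, limit: int = 5) -> str:
    """Run the CLI command shown above and return its raw text output."""
    result = subprocess.run(
        ["tecton", "materialization-status", name, f"--limit={limit}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_status(output: str) -> List[Dict[str, str]]:
    """Parse data rows; the notice, header, and separator lines are skipped."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # A data row splits into 10 tokens: type, three "date time" pairs
        # (2 tokens each), status, attempt number, and the job log URL.
        if len(tokens) != 10:
            continue
        rows.append({
            "type": tokens[0],
            "window_start": " ".join(tokens[1:3]),
            "window_end": " ".join(tokens[3:5]),
            "status": tokens[5],
            "attempt": tokens[6],
            "job_created_at": " ".join(tokens[7:9]),
            "job_logs": tokens[9],
        })
    return rows


def unsuccessful(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the attempts whose status is anything other than SUCCESS."""
    return [row for row in rows if row["status"] != "SUCCESS"]
```

For example, unsuccessful(parse_status(fetch_status("ad_ground_truth_ctr_performance_7_days"))) would return an empty list for the output above, since every attempt succeeded. If the CLI output format differs in your version, the token count in parse_status will need adjusting.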

You can also view this information through the Tecton SDK by using:

import tecton
fp = tecton.get_feature_package("feature_package_name")
fp.materialization_status()

Cluster-Level Freshness Information

If multiple FeaturePackages in your cluster are stale, the cause is often a common data source that has no new data. To get an overview of freshness across the cluster, use tecton freshness.

$ tecton freshness
           Feature Package               Stale?   Freshness   Expected Freshness     Created At
=================================================================================================
ad_ground_truth_ctr_performance_7_days   N        14h 40m     2d                   10/01/20 2:25
user_ad_impression_counts                N        40m 24s     2h                   10/01/20 2:16
content_keyword_ctr_performance:v2       N        40m 25s     2h                   09/04/20 22:22
ad_group_ctr_performance                 N        40m 26s     2h                   08/26/20 12:52
ad_is_displayed_as_banner                -        -           -                    07/24/20 13:51
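To scan a long freshness listing quickly, the table can be parsed and filtered. The sketch below is illustrative only: the function names are hypothetical, and it assumes columns are separated by at least two spaces, as in the output above. Because the sample shows only N and - in the Stale? column, the filter flags any other value rather than assuming a specific marker for stale packages.

```python
import re
from typing import Dict, List


def parse_freshness(output: str) -> List[Dict[str, str]]:
    """Parse rows of the freshness table into dictionaries."""
    rows = []
    for line in output.splitlines():
        # Columns are assumed to be separated by two or more spaces, so
        # values such as '14h 40m' or '10/01/20 2:25' stay in one field.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 5 or fields[0] == "Feature Package":
            continue  # skips the header, the separator, and blank lines
        name, stale, freshness, expected, created_at = fields
        rows.append({
            "name": name,
            "stale": stale,
            "freshness": freshness,
            "expected_freshness": expected,
            "created_at": created_at,
        })
    return rows


def needs_attention(rows: List[Dict[str, str]]) -> List[str]:
    """Names of packages whose Stale? column is neither 'N' nor '-'."""
    return [row["name"] for row in rows if row["stale"] not in ("N", "-")]
```

Applied to the output above, needs_attention returns an empty list, since no FeaturePackage is marked stale. When several names come back at once, checking whether they share a data source is a reasonable next step.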