After setting up monitoring, you will begin to receive email alerts for errors such as batch materialization failures.
This example walks through a possible triage and debugging process after an alert email has been sent for a batch materialization failure.
The procedure has three parts:
- Navigate to the Web UI to examine recent Materialization Attempts.
- Dive into further details using the CLI.
- Examine cluster-level status information using the CLI.
Assuming you have already defined an alert_email in your FeatureView's MonitoringConfig, you will receive an email alert when an error occurs. In this case, the error is FeatureViewBatchMaterializationFailures, which refers to a failure in a batch materialization job. For more information on failure types, see the monitoring documentation.
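A minimal sketch of such a monitoring configuration is shown below. The alert address and freshness value are illustrative placeholders; only the `MonitoringConfig`, `alert_email`, and `expected_feature_freshness` names come from this guide, so check the Tecton SDK reference for the exact constructor signature in your version.

```python
from tecton import MonitoringConfig

# Hypothetical values: alert on failures and freshness violations for this
# FeatureView, emailing the on-call address when something goes wrong.
monitoring = MonitoringConfig(
    expected_feature_freshness="2d",          # duration string, illustrative
    alert_email="ml-oncall@example.com",      # hypothetical address
)
```

This object is then passed to the FeatureView's `monitoring` argument in your feature repository.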
View Materialization Information in the Web UI
In the email alert you received, click on the link to view the alerting FeatureView in the Tecton Web UI.
In the Materialization tab, you can find materialization configurations and information about recent jobs.
If the current expected_feature_freshness is set too low, resulting in noisy freshness alerts, specifying a higher value in the MonitoringConfig defined in your feature repository will often help. If the expected freshness is less than the actual freshness, the FeatureView is serving stale data.
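The staleness rule above reduces to a simple comparison. The helper below is a pure-Python illustration of that logic, not part of the Tecton SDK:

```python
from datetime import timedelta

def is_stale(actual_freshness: timedelta, expected_freshness: timedelta) -> bool:
    # A FeatureView is stale when its data's actual age exceeds the
    # expected freshness configured in MonitoringConfig.
    return actual_freshness > expected_freshness

# 14h 40m of actual age against a 2-day budget is comfortably fresh.
print(is_stale(timedelta(hours=14, minutes=40), timedelta(days=2)))  # False
# 3h of actual age against a 2h budget means stale data is being served.
print(is_stale(timedelta(hours=3), timedelta(hours=2)))              # True
```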
Scrolling down, use the "Materialization Status" and "Materialization Jobs" sections to help locate the source of the error.
The table showing materialization jobs and their attempts is often useful for locating individual batch errors like ours. Click on the failing job's row to view the specific Spark error message. When available, you can also see previously failed retries.
Finding More Materialization Job Information
If the materialization tab in the Web UI or its linked Spark jobs did not provide enough information to debug the error, use the Tecton CLI or SDK to find more information.
From the Tecton CLI, use tecton materialization-status [FEATURE-VIEW-NAME]. Use tecton materialization-status -h to display available flags.
```
$ tecton materialization-status ad_ground_truth_ctr_performance_7_days --limit=5
All the displayed times are in UTC time zone
TYPE   WINDOW_START_TIME    WINDOW_END_TIME      STATUS   ATTEMPT_NUMBER  JOB_CREATED_AT       JOB_LOGS
=======================================================================================================================================================
BATCH  2020-12-14 00:00:00  2020-12-21 00:00:00  SUCCESS  1               2020-12-21 00:00:14  https://...cloud.databricks.com/?o=3650800870221207#job/1772891/run/1
BATCH  2020-12-13 00:00:00  2020-12-20 00:00:00  SUCCESS  1               2020-12-20 00:00:13  https://...cloud.databricks.com/?o=3650800870221207#job/1772743/run/1
BATCH  2020-12-12 00:00:00  2020-12-19 00:00:00  SUCCESS  1               2020-12-19 00:00:10  https://...cloud.databricks.com/?o=3650800870221207#job/1772598/run/1
BATCH  2020-12-11 00:00:00  2020-12-18 00:00:00  SUCCESS  1               2020-12-18 00:00:06  https://...cloud.databricks.com/?o=3650800870221207#job/1772447/run/1
BATCH  2020-12-10 00:00:00  2020-12-17 00:00:00  SUCCESS  1               2020-12-17 00:00:13  https://...cloud.databricks.com/?o=3650800870221207#job/1772294/run/1
```
You can also view this information through the Tecton SDK:

```python
import tecton

fv = tecton.get_feature_view("feature_view_name")
fv.materialization_status()
```
Cluster-Level Freshness Information
If multiple FeatureViews in your cluster are stale, this is often caused by a common data source having no new data. You can obtain an overview of top-level cluster freshness using
tecton freshness.
```
$ tecton freshness
Feature View                             Stale?  Freshness  Expected Freshness  Created At
=================================================================================================
ad_ground_truth_ctr_performance_7_days   N       14h 40m    2d                  10/01/20 2:25
user_ad_impression_counts                N       40m 24s    2h                  10/01/20 2:16
content_keyword_ctr_performance:v2       N       40m 25s    2h                  09/04/20 22:22
ad_group_ctr_performance                 N       40m 26s    2h                  08/26/20 12:52
ad_is_displayed_as_banner                -       -          -                   07/24/20 13:51
```
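When many FeatureViews are listed, a quick shell filter can pull out only the stale rows. The sketch below inlines a small sample report (with a hypothetical stale entry) standing in for output you would normally capture with `tecton freshness > freshness.txt`; the field positions assume the column layout shown above.

```shell
# Inline a sample freshness report (normally: tecton freshness > freshness.txt).
cat > freshness.txt <<'EOF'
Feature View                     Stale?  Freshness  Expected Freshness  Created At
==================================================================================
user_ad_impression_counts        Y       3h 12m     2h                  10/01/20 2:16
ad_group_ctr_performance         N       40m 26s    2h                  08/26/20 12:52
EOF

# Skip the two header lines; print FeatureView names whose "Stale?" column is Y.
awk 'NR > 2 && $2 == "Y" { print $1 }' freshness.txt
```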