If feature processing jobs begin to fail, Tecton can begin to serve stale or inaccurate data. To ensure that feature processing jobs stay healthy, Tecton offers monitoring, alerting and debugging tools.
For a practical example of debugging a materialization alert, see Example: Debugging Materialization Alerts.
Setting Up Alerts
Tecton can automatically generate materialization health alerts that are sent to a specified email address. See Types of Alerts for more details.
It is highly recommended that an alert email be set for each FeaturePackage that is consumed in production.
To configure alerts, specify monitoring when defining a FeaturePackage in your Feature Repository.
MonitoringConfig objects configure alert thresholds and feature freshness expectations.
    my_feature_package = TemporalFeaturePackage(
        ...
        monitoring=MonitoringConfig(
            monitor_freshness=True,
            expected_feature_freshness="2w",
            alert_email="email@example.com"
        )
    )

monitor_freshness: Set this to False to suppress freshness-related alerts.
expected_feature_freshness: Set this value to decrease the sensitivity of freshness alerts. See Default Expected Feature Freshness for details about the default value if this field is unspecified.
alert_email: Recipient of alerts.
Tecton lets you monitor and debug production Feature Packages from the Web UI, the SDK, and the CLI.
Web UI: Health Overview
The easiest way to check the health of a materialized FeaturePackage is through the Web UI. Navigate to the FeaturePackage in question and switch to the “Materialization” tab to see Feature Package materialization diagnostics at a glance.
SDK: FeaturePackage Specifics
The Tecton SDK provides tools to dive deeper into the details of a Feature Package. For example, the materialization_status() method displays details about failed materialization attempts.
Materialization Job Links
In the SDK and Web UI, Tecton provides a link to the auto-generated job that was used to compute feature values. This job link can be used to view the underlying error that caused a materialization job to fail.
To view this job, click on the Job status in the materialization table in the Web UI. This link is also available from the SDK materialization_status() method and from the tecton materialization-status command in the CLI.
This link will open a page in your Spark processing engine where you will be able to see the job failure. In the example below, we show a spot failure in Databricks:
CLI: Cluster Overview and Status
Tecton provides the ability to view the status of all Feature Packages in a cluster using the tecton freshness CLI command.

    $ tecton freshness
    Feature Package                      Stale?  Freshness  Expected Freshness  Created At
    =================================================================================================
    partner_ctr_performance:14d          Y       2wk 1d     2d                  12/02/20 10:52
    ad_group_ctr_performance             N       1h 1m      2h                  11/28/20 19:50
    user_ad_impression_counts            N       1m 35s     2h                  10/01/20 2:16
    content_keyword_ctr_performance:v2   N       1m 36s     2h                  09/04/20 22:22
    content_keyword_ctr_performance      N       1m 37s     2h                  08/26/20 12:52
    user_total_ad_frequency_counts       N       1m 38s     2h                  08/26/20 12:52
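For scripted checks, the freshness listing can be scanned for stale entries. The following is a minimal sketch, not part of Tecton: the helper name stale_feature_packages is hypothetical, and it assumes the terminal output has one Feature Package per line, that package names contain no spaces, and that the second column is Y or N as in the listing above.

```python
def stale_feature_packages(freshness_output: str) -> list:
    """Return the names of Feature Packages whose 'Stale?' column is 'Y'."""
    stale = []
    for line in freshness_output.splitlines():
        tokens = line.split()
        # Data rows start with a package name followed by Y or N.
        # The header, separator, and blank lines do not match this shape.
        if len(tokens) >= 2 and tokens[1] in ("Y", "N"):
            if tokens[1] == "Y":
                stale.append(tokens[0])
    return stale


# Sample text copied from the listing above (assumed layout).
sample = """\
Feature Package                      Stale?  Freshness  Expected Freshness  Created At
=================================================================================================
partner_ctr_performance:14d          Y       2wk 1d     2d                  12/02/20 10:52
ad_group_ctr_performance             N       1h 1m      2h                  11/28/20 19:50
user_ad_impression_counts            N       1m 35s     2h                  10/01/20 2:16
"""

print(stale_feature_packages(sample))  # ['partner_ctr_performance:14d']
```

In practice the text would come from capturing the output of tecton freshness rather than from a hard-coded string.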
You can also use the tecton materialization-status $FP_NAME command to see the materialization status of a specific FeaturePackage.

    $ tecton materialization-status my_feature_package
    All the displayed times are in UTC time zone
    TYPE   WINDOW_START_TIME    WINDOW_END_TIME      STATUS   ATTEMPT_NUMBER  JOB_CREATED_AT       JOB_LOGS
    ================================================================================================================
    BATCH  2020-12-15 00:00:00  2020-12-22 00:00:00  SUCCESS  1               2020-12-22 00:00:27  https://...
    BATCH  2020-12-14 00:00:00  2020-12-21 00:00:00  SUCCESS  1               2020-12-21 00:00:14  https://...
    BATCH  2020-12-13 00:00:00  2020-12-20 00:00:00  SUCCESS  1               2020-12-20 00:00:13  https://...
    BATCH  2020-12-12 00:00:00  2020-12-19 00:00:00  SUCCESS  1               2020-12-19 00:00:10  https://...
    BATCH  2020-12-11 00:00:00  2020-12-18 00:00:00  SUCCESS  1               2020-12-18 00:00:06  https://...
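The per-attempt rows can also be filtered in a script to surface attempts that did not succeed, along with their job log links. This is a rough sketch under stated assumptions: the names Attempt, parse_materialization_status, and needs_attention are hypothetical, each data row is assumed to split into exactly ten whitespace-separated tokens as in the output above, and any status other than SUCCESS is treated as needing attention (the source shows only SUCCESS, so other status values are not known here).

```python
from typing import List, NamedTuple


class Attempt(NamedTuple):
    job_type: str
    window_start: str
    window_end: str
    status: str
    attempt_number: int
    job_logs: str


def parse_materialization_status(output: str) -> List[Attempt]:
    """Parse data rows of the materialization-status table (assumed layout)."""
    attempts = []
    for line in output.splitlines():
        tokens = line.split()
        # Data row: type, start date+time, end date+time, status,
        # attempt number, created date+time, log link = 10 tokens.
        if len(tokens) != 10 or not tokens[6].isdigit():
            continue
        attempts.append(Attempt(
            job_type=tokens[0],
            window_start=" ".join(tokens[1:3]),
            window_end=" ".join(tokens[3:5]),
            status=tokens[5],
            attempt_number=int(tokens[6]),
            job_logs=tokens[9],
        ))
    return attempts


def needs_attention(attempts: List[Attempt]) -> List[Attempt]:
    """Keep attempts whose status is anything other than SUCCESS."""
    return [a for a in attempts if a.status != "SUCCESS"]


# Two rows copied from the output above.
status_sample = """\
All the displayed times are in UTC time zone
TYPE   WINDOW_START_TIME    WINDOW_END_TIME      STATUS   ATTEMPT_NUMBER  JOB_CREATED_AT       JOB_LOGS
BATCH  2020-12-15 00:00:00  2020-12-22 00:00:00  SUCCESS  1               2020-12-22 00:00:27  https://...
BATCH  2020-12-14 00:00:00  2020-12-21 00:00:00  SUCCESS  1               2020-12-21 00:00:14  https://...
"""

parsed = parse_materialization_status(status_sample)
print(len(parsed))              # 2
print(needs_attention(parsed))  # []
```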
Default Expected Feature Freshness
By default, a FeaturePackage's freshness is expected to be less than twice the materialization schedule. Alerts will fire once this threshold, plus a small grace period, is crossed. The grace period's duration depends on the FeaturePackage's materialization schedule:
|Schedule||Grace Period|
|<= 10 minutes||30 minutes|
|<= 30 minutes||90 minutes|
|<= 1 hour||2 hours|
|<= 4 hours||4 hours|
|<= 24 hours||12 hours|
|> 24 hours||24 hours|
The table below has examples of materialization schedules mapped to default alert thresholds:
|Schedule||Default Alert Threshold|
|5 minutes||40 minutes|
|30 minutes||2 hours 30 minutes|
|1 hour||4 hours|
|4 hours||12 hours|
|24 hours||60 hours|
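The default threshold described above (twice the materialization schedule plus the grace period) can be worked through in a few lines. This is an illustrative sketch of the arithmetic only, not Tecton's implementation. The function names grace_period and default_alert_threshold are hypothetical.

```python
from datetime import timedelta

# (upper bound on the schedule, grace period), in ascending order,
# transcribed from the grace-period table above.
_GRACE_PERIODS = [
    (timedelta(minutes=10), timedelta(minutes=30)),
    (timedelta(minutes=30), timedelta(minutes=90)),
    (timedelta(hours=1), timedelta(hours=2)),
    (timedelta(hours=4), timedelta(hours=4)),
    (timedelta(hours=24), timedelta(hours=12)),
]
_GRACE_OVER_24H = timedelta(hours=24)


def grace_period(schedule: timedelta) -> timedelta:
    """Look up the grace period for a materialization schedule."""
    for upper_bound, grace in _GRACE_PERIODS:
        if schedule <= upper_bound:
            return grace
    return _GRACE_OVER_24H


def default_alert_threshold(schedule: timedelta) -> timedelta:
    """Twice the schedule plus the grace period for that schedule."""
    return 2 * schedule + grace_period(schedule)


print(default_alert_threshold(timedelta(minutes=5)))  # 0:40:00
print(default_alert_threshold(timedelta(hours=24)))   # 2 days, 12:00:00
```

For a 5-minute schedule this gives 10 minutes plus a 30-minute grace period, or 40 minutes. For a 24-hour schedule it gives 48 hours plus 12 hours, or 60 hours.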