Alerts
Configuring Alerts
Tecton can automatically generate materialization health alerts and online store feature freshness alerts that are sent to a specified email address. There are different types of alerts for various materialization issues:
- Freshness Alerts: FeatureViewNotFresh
- Repeated Failures Alerts: FeatureViewBatchMaterializationFailures, FeatureViewStreamingMaterializationFailures
- Too Many Failures Alerts: FeatureViewTooManyFailures
- Feature Table Ingestion Failures Alerts: FeatureTableIngestTaskRecurrentFailures
It is highly recommended that an alert email be set for each Feature View or Feature Table that is consumed in production.
Feature Views
To configure alerts, specify alert_email and monitor_freshness when declaring a Feature View in your Feature Repository.
@batch_feature_view(
monitor_freshness=True, # required to enable alerts
alert_email="demo-user@tecton.ai", # required alert recipient
expected_feature_freshness="2w", # optional override
# ...
)
def my_feature_view(inputs):
pass
- monitor_freshness: Set this to False to suppress online store freshness-related alerts.
- alert_email: Recipient of alerts. Must be an email address.
- expected_feature_freshness: See Expected Feature Freshness for details about the default value if this field is unspecified. This can be set to a longer duration if the default threshold is too low.
Feature Tables
To configure alerts, specify alert_email when declaring a Feature Table in your Feature Repository.
my_feature_table = FeatureTable(
alert_email="demo-user@tecton.ai", # required alert recipient
# ...
)
Freshness Alerts
Feature View data is considered stale when materialization is enabled but new features are not being materialized. Tecton triggers a FeatureViewNotFresh alert when feature data becomes too stale based on the Expected Feature Freshness threshold, which can be overridden using the expected_feature_freshness parameter.
The most common causes of this type of alert are:
- Missing upstream data
- Errors in feature definitions that cause materialization jobs to either fail or produce no new feature values
- Outage or spot instance unavailability causing materialization jobs to fail
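Conceptually, the freshness check compares the age of the most recently materialized feature values against the expected freshness threshold. A minimal sketch of that comparison (an illustration only, not Tecton's internal implementation):

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_materialized_at: datetime, expected_freshness: timedelta) -> bool:
    """Return True when the newest feature values are older than the threshold."""
    return datetime.now(timezone.utc) - last_materialized_at > expected_freshness

# With a 2-week threshold (expected_feature_freshness="2w"), data last
# materialized 3 weeks ago would trigger a FeatureViewNotFresh alert.
three_weeks_ago = datetime.now(timezone.utc) - timedelta(weeks=3)
print(is_stale(three_weeks_ago, timedelta(weeks=2)))  # True
```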
Repeated Failures Alerts
Tecton automatically schedules retries for failing materialization jobs. If these failures happen frequently, Tecton triggers an alert. There are two types of repeated failure alerts:
- FeatureViewBatchMaterializationFailures: Batch materialization jobs have failed 2 or more times.
- FeatureViewStreamingMaterializationFailures: Streaming materialization jobs have failed 2 or more times.
Materialization jobs that are retried due to spot instance unavailability are not counted as failures.
Too Many Failures Alerts
When materialization retries fail too many times, Tecton moves the Feature View to a "Too Many Failures" state and stops retrying materialization. At this point, the FeatureViewTooManyFailures alert fires. This alert is most commonly caused by incorrect Transformation code.
Feature Table Ingestion Failures Alerts
When Feature Table data ingestion has failed 2 or more times in the past 2 hours, Tecton fires a FeatureTableIngestTaskRecurrentFailures alert. This alert most commonly fires when too many concurrent ingestion / materialization jobs are already running or when available compute resources are insufficient.
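The trigger condition described above, two or more failures inside a trailing two-hour window, can be sketched as a simple sliding-window check (an illustration only; the helper and its parameters are hypothetical, not Tecton's internal code):

```python
from datetime import datetime, timedelta, timezone

def should_alert(failure_times, window=timedelta(hours=2), threshold=2, now=None):
    """Illustrative sliding-window trigger: alert when at least `threshold`
    failures fall within the trailing `window`."""
    now = now or datetime.now(timezone.utc)
    recent = [t for t in failure_times if now - t <= window]
    return len(recent) >= threshold

# Two failures in the last 2 hours fire the alert; an older failure does not count.
now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
failures = [now - timedelta(minutes=30), now - timedelta(minutes=90), now - timedelta(hours=3)]
print(should_alert(failures, now=now))  # True
```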
Example: Debugging Alerts
This example details a possible triage and debugging process once an alert email has been sent for FeatureViewBatchMaterializationFailures.
The procedure has three parts:
- Navigate to the Web UI to examine recent materialization attempts.
- Dive into further details using the CLI.
- Examine cluster-level status information using the CLI.
Email alert notification
Assuming you have already defined an alert_email in your Feature View's definition, you will receive an email alert when an error occurs. In this case, the error is FeatureViewBatchMaterializationFailures, which refers to a failure with a batch materialization job.
Materialization Info in the Web UI
Click on the link in the email to view the alerting Feature View in Tecton's Web UI.
Navigate to the Materialization tab to explore materialization configuration and information about recent jobs.
If the Expected Feature Freshness is too low, resulting in noisy freshness alerts, specifying a higher value for expected_feature_freshness might help. If the expected freshness is less than the actual freshness, the Feature View is considered to be serving stale data.
Scrolling down, use the "Materialization Status" and "Materialization Jobs" sections to help locate the source of the error.
The Materialization Jobs table is often useful for locating individual errors. Clicking a failing job's link takes you directly to that job.
Clicking through to job details is supported for Databricks jobs, but not for EMR jobs.
Historical Feature View Materialization Jobs
If the Materialization tab in the Web UI or its linked jobs did not provide enough information to debug the error, use the Tecton CLI or SDK to find more information.
From the Tecton CLI, use tecton materialization-status [FEATURE-VIEW-NAME]. Use tecton materialization-status -h to display available flags.
$ tecton materialization-status ad_ground_truth_ctr_performance_7_days --limit=5
All the displayed times are in UTC time zone
TYPE WINDOW_START_TIME WINDOW_END_TIME STATUS ATTEMPT_NUMBER JOB_CREATED_AT JOB_LOGS
=========================================================================================================================================================================================
BATCH 2020-12-14 00:00:00 2020-12-21 00:00:00 SUCCESS 1 2020-12-21 00:00:14 https://...cloud.databricks.com/?o=3650800870221207#job/1772891/run/1
BATCH 2020-12-13 00:00:00 2020-12-20 00:00:00 SUCCESS 1 2020-12-20 00:00:13 https://...cloud.databricks.com/?o=3650800870221207#job/1772743/run/1
BATCH 2020-12-12 00:00:00 2020-12-19 00:00:00 SUCCESS 1 2020-12-19 00:00:10 https://...cloud.databricks.com/?o=3650800870221207#job/1772598/run/1
BATCH 2020-12-11 00:00:00 2020-12-18 00:00:00 SUCCESS 1 2020-12-18 00:00:06 https://...cloud.databricks.com/?o=3650800870221207#job/1772447/run/1
BATCH 2020-12-10 00:00:00 2020-12-17 00:00:00 SUCCESS 1 2020-12-17 00:00:13 https://...cloud.databricks.com/?o=3650800870221207#job/1772294/run/1
You can also view this information through the Tecton SDK:
import tecton

# Look up the workspace and Feature View, then display materialization job history
ws = tecton.get_workspace("workspace_name")
fv = ws.get_feature_view("feature_view_name")
fv.materialization_status()
Cluster-Level Freshness Information
If multiple Feature Views in your cluster are stale, you can obtain a cluster-wide overview using tecton freshness. Widespread staleness is often caused by a common data source having no new data or by an under-provisioned stream.
$ tecton freshness
Feature View Stale? Freshness Expected Freshness Created At
=================================================================================================
ad_ground_truth_ctr_performance_7_days N 14h 40m 2d 10/01/21 2:25
user_ad_impression_counts N 40m 24s 2h 10/01/21 2:16
content_keyword_ctr_performance:v2 N 40m 25s 2h 09/04/21 22:22
ad_group_ctr_performance N 40m 26s 2h 08/26/21 12:52
ad_is_displayed_as_banner - - - 07/24/21 13:51