Data Quality Validations
This feature is currently in Public Preview.
- Available for Tecton on Databricks and EMR. Coming to Rift in a future release.
Data Quality Validations help detect feature data issues once a Feature View has
been materialized. If validation results indicate that feature data failed to
meet expectations during a materialization interval, an alert email is sent to
the address provided as alert_email in the Feature View declaration.
Terminology
- Data Quality Metrics are statistics that describe feature values output by a Feature View during materialization. See Data Quality Metrics for more information.
- Expectations are verifiable assertions about metrics. For example, “Expect that <100% of values for a given feature are null”.
- Validations are the process of checking whether the set of expectations has been met when materializing a Feature View. Each validation either passes or fails.
- Alerts notify the specified user when validation fails.
This document covers Data Quality Expectations, Validations, and Alerts.
Default Expectations
By default, Tecton defines the following expectations for all Batch and Stream Feature Views.
For Stream Feature Views, Data Quality Metrics and Expectations only apply to offline materialized feature data.
Expectation | Applicable to | Explanation |
---|---|---|
Feature View row count > 0 | Feature Views | Expect feature rows to be produced when a Feature View is materialized. |
A feature has any non-null values | All types of features | Expect a feature to have at least one non-null value when there are feature rows. |
A feature has any non-zero values | Numerical features | Expect a feature to have at least one non-zero value when there are feature rows. |
A feature has any non-empty values | String or Array features | Expect a feature to have at least one non-empty string or array value when there are feature rows. |
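The default expectations in the table can be illustrated with a small, self-contained Python sketch. This is not Tecton's implementation: the function name check_default_expectations, the result keys, and the choice to infer a feature's type from its non-null values are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python; exclude it from "numerical".
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_default_expectations(rows: List[Row], features: List[str]) -> Dict[str, bool]:
    """Hypothetical helper: map each expectation name to True (pass) or False (fail)."""
    results = {"row_count > 0": len(rows) > 0}
    if not rows:
        # Per-feature expectations only apply when there are feature rows.
        return results

    for name in features:
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        results[f"{name}: any non-null"] = len(non_null) > 0

        # Assumption: a feature is numerical if all its non-null values are numbers.
        numbers = [v for v in non_null if _is_number(v)]
        if numbers and len(numbers) == len(non_null):
            results[f"{name}: any non-zero"] = any(v != 0 for v in numbers)

        # Assumption: strings and lists stand in for String and Array features.
        sized = [v for v in non_null if isinstance(v, (str, list))]
        if sized and len(sized) == len(non_null):
            results[f"{name}: any non-empty"] = any(len(v) > 0 for v in sized)
    return results
```

Calling the helper with the rows produced for one materialization interval and the list of feature names returns one pass/fail entry per applicable expectation. Any False entry corresponds to a failed validation in the sense described above.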
Enable Validations and Alert Emails
Validation can be disabled per Feature View by setting
skip_default_expectations=True in the Feature View declaration.
Email alerting is enabled when alert_email is specified in a Batch or Stream
Feature View definition. An alert email is sent at most once every 6 hours per
Feature View. To disable all email alerts for a Feature View, including other
types of materialization alerts, leave this field unset.
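The "at most once in 6 hours per Feature View" behavior can be pictured as a simple per-Feature-View throttle. The sketch below is only a model of that rule, not Tecton's alerting code: the class AlertThrottle, its should_send method, and the treatment of the exact 6-hour boundary (an alert is allowed again once a full 6 hours have elapsed) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict

# From the text: at most one alert email per Feature View in 6 hours.
ALERT_INTERVAL = timedelta(hours=6)


class AlertThrottle:
    """Hypothetical model of per-Feature-View alert rate limiting."""

    def __init__(self, interval: timedelta = ALERT_INTERVAL) -> None:
        self.interval = interval
        self._last_sent: Dict[str, datetime] = {}

    def should_send(self, feature_view: str, now: datetime) -> bool:
        last = self._last_sent.get(feature_view)
        if last is not None and now - last < self.interval:
            # An alert for this Feature View was sent too recently; suppress.
            return False
        # Record the send time so later failures in the window are suppressed.
        self._last_sent[feature_view] = now
        return True
```

Under this model, repeated validation failures for the same Feature View within the window produce a single email, while failures in different Feature Views are throttled independently.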
Viewing Validation Results
You can view the validation results for all Feature Views in a workspace by selecting Data Quality in the left navigation panel of the Tecton web UI.