Unit Testing Feature Views
Overview
Plan Hooks provide a framework for executing unit tests on your Feature View or Transformation logic every time `tecton plan` or `tecton apply` is run. A user will only be able to apply their changes if the tests pass.
Plan Hooks can also be tested directly using `tecton test`.
Plan Hooks are written in Python and are therefore fully customizable. You can use Plan Hooks to execute logic that runs before each apply. Example use cases include enforcing a commit policy or running basic tests against your code.
Enabling Plan Hooks
When `tecton init` is run to configure a feature repository in a new directory, it creates a folder called `.tecton` containing the file `.tecton/hooks/plan.py`. Plan Hooks are enabled by default, but the default configuration is a no-op.
How Plan Hooks Work
Arbitrary logic can be defined in `plan.py` as long as it adheres to the return code contract for `run()`. Each time `tecton plan` or `tecton apply` is run, it executes the `run()` function in `plan.py`. `tecton` expects the following return codes from `run()`:
- `0` if all tests pass
- `None` if no tests were run
- Non-zero integer in the case of test failures
If `run()` returns a non-zero value, the hook's stdout is printed to stderr. If it returns `0` or `None`, all hook output is suppressed.
In summary, plan hooks must meet the following requirements:
- Must be defined in `.tecton/hooks/plan.py`
- Must contain a `run()` function that accepts no arguments. `run()` must return either `0` (tests pass), `None` (no tests run), or a non-zero integer return code (test failures).
To configure multiple plan hooks, it's recommended to define them in separate functions in `plan.py` and call each function from `run()`.
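As a minimal sketch of this pattern, the block below defines two hypothetical hook functions and combines their results in `run()`. The function names, the `HOOKS` list, and the rule for merging return codes (first non-zero code wins, otherwise `0` if any hook ran tests, otherwise `None`) are assumptions for illustration, not part of Tecton's API.

```python
from typing import Callable, List, Optional


def check_always_passes() -> Optional[int]:
    # Hypothetical hook: pretend some tests ran and all of them passed.
    return 0


def check_nothing_to_do() -> Optional[int]:
    # Hypothetical hook: there was nothing to run, so report None.
    return None


def combine_results(results: List[Optional[int]]) -> Optional[int]:
    # Assumed merge rule: the first non-zero code wins; otherwise return 0
    # if at least one hook ran tests; otherwise return None.
    for code in results:
        if code:  # truthy only for non-zero integers
            return code
    if any(code == 0 for code in results):
        return 0
    return None


# Hypothetical registry of hooks; add new hook functions here.
HOOKS: List[Callable[[], Optional[int]]] = [check_always_passes, check_nothing_to_do]


def run() -> Optional[int]:
    # Call every hook, then fold the individual codes into one return code.
    return combine_results([hook() for hook in HOOKS])
```

Because every hook is called before the results are merged, output from all failing hooks is available in a single run, at the cost of not stopping at the first failure.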
Default Plan Hook: plan.py
The default contents of `.tecton/hooks/plan.py` contain a no-op hook that returns `None`.
### plan.py ###
from typing import Optional


def run() -> Optional[int]:
    # No-op plan hook that returns None, indicating no tests were run.
    return None
When you run `tecton plan`, you'll see `✅ Running Tests: No tests found.` in the output. For example:
$ tecton plan
Using workspace "prod"
✅ Imported 4 Python modules from the feature repository
✅ Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
Running unit tests with pytest
First, we'll use `plan.py` to configure a test harness for running pytest. Second, we'll show example unit tests for Pandas and Spark transformations.
Creating your test harness
This example test harness, from Tecton's Sample Repository, runs pytest against all files in the feature repo matching the pattern `*_test.py`, `test_*.py`, or `test.py`. Optionally, it shows how to download a Spark binary for local testing.
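A harness along these lines might look like the following sketch, which assumes pytest is installed and that the hook runs from the feature repository root. The helper name `collect_test_files` and the constant `TEST_PATTERNS` are hypothetical, and the optional Spark download step is omitted.

```python
from pathlib import Path
from typing import List, Optional

# The three file patterns named above.
TEST_PATTERNS = ("*_test.py", "test_*.py", "test.py")


def collect_test_files(root: Path) -> List[str]:
    # Recursively gather files matching any pattern. A set removes duplicates,
    # since a name like test_foo_test.py matches two patterns.
    found = set()
    for pattern in TEST_PATTERNS:
        found.update(str(p.resolve()) for p in root.rglob(pattern))
    return sorted(found)


def run() -> Optional[int]:
    # Imported lazily so that collecting files does not require pytest.
    import pytest

    tests = collect_test_files(Path().resolve())
    if not tests:
        return None  # nothing to run
    exit_code = int(pytest.main(tests))
    # pytest reports exit code 5 when no tests were collected.
    return None if exit_code == 5 else exit_code
```

Mapping pytest's "no tests collected" exit code to `None` keeps the hook consistent with the return code contract, so a repository with empty test files is reported as having no tests rather than as a failure.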
Writing a Pandas Unit Test
When your Feature View or Transformation uses `mode=pandas`, you can write simple Python-based unit tests.
This example validates the feature logic from our sample `transaction_amount_is_high` feature, which checks if the transaction amount is over $10,000.
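A minimal sketch of such a test follows. The function here is a plain-pandas stand-in for the feature logic: the input column name `amount`, the output column name, and the function signature are assumptions, and in a real repository the test would import the feature's function from the feature repo rather than define it alongside the test.

```python
import pandas as pd


def transaction_amount_is_high(transaction_request: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the feature logic: 1 if amount > 10,000, else 0.
    is_high = (transaction_request["amount"] > 10000).astype("int64")
    return pd.DataFrame({"transaction_amount_is_high": is_high})


def test_transaction_amount_is_high():
    # One amount below the threshold, one just above it, one far above it.
    transaction_request = pd.DataFrame({"amount": [124.0, 10001.0, 34235432.0]})
    actual = transaction_amount_is_high(transaction_request)
    assert actual["transaction_amount_is_high"].tolist() == [0, 1, 1]
```

Saving the test in a file whose name matches one of the harness patterns, such as `transaction_amount_is_high_test.py`, lets the pytest harness pick it up.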
If tests fail, you'll see `⛔ Running Tests: Tests failed :(` along with test failure messages.
Spark Transformation Unit Test
Testing a PySpark or Spark SQL transformation is similar to the above example, except that we also need to provide a `SparkSession` test fixture.
For example, suppose we have a transformation that calculates the number of impressions an ad had per calendar month.
### ad_impression_count_monthly.py ###
from tecton import transformation


@transformation(mode="pyspark")
def ad_impression_count_monthly_transformer(ad_impressions_batch):
    import pyspark.sql.functions as F

    # Truncate each timestamp to the start of its calendar month.
    truncated_date_view = ad_impressions_batch.withColumn('timestamp', F.date_trunc('month', F.col('timestamp')))
    # Count impressions per ad per month.
    return truncated_date_view.groupBy('ad_id', 'timestamp').agg(F.count(F.lit(1)).alias("ad_impression_count"))
Because this is a PySpark transformation, we'll need to create a SparkSession test fixture.
In our `conftest.py` file:
import findspark
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark_session():
    # Point findspark at the Spark installation under .tecton/spark.
    findspark.init(spark_home='.tecton/spark')
    spark = SparkSession.builder.appName('pytest_spark_session').getOrCreate()
    yield spark
    # Teardown: runs once, after the last test in the session.
    spark.stop()
Finally, we can define the actual unit test that mocks up some sample ad impressions, and asserts that we're getting the expected counts.
from feature_repo.shared.features.ad_impression_count_monthly import ad_impression_count_monthly_transformer


def test_monthly_impression_count(spark_session):
    mock_data = [
        ('ad_id1', "2020-10-28 05:02:11"),
        ('ad_id1', "2020-10-30 01:00:00"),
        ('ad_id2', "2020-10-28 05:02:11"),
    ]
    input_df = spark_session.createDataFrame(mock_data, ['ad_id', 'timestamp'])
    assert ad_impression_count_monthly_transformer is not None

    # Row order after groupBy is not guaranteed, so sort before asserting.
    rows = ad_impression_count_monthly_transformer(input_df).collect()
    output = sorted(rows, key=lambda row: row['ad_id'])

    assert output[0]['ad_id'] == 'ad_id1'
    assert output[0]['ad_impression_count'] == 2
    assert output[1]['ad_id'] == 'ad_id2'
    assert output[1]['ad_impression_count'] == 1
To run this test, run `tecton plan`.
Other Plan Hook Examples
File Naming Policy Test
As an example of other generic tests you can run with Plan Hooks, suppose you would like to create a naming policy that ensures all Python files are prefixed with "ml_ops_".
The example below performs this assertion on all Python files in the feature repository and returns 0 if all names adhere to the policy or 1 if some names do not.
### plan.py ###
from pathlib import Path
from typing import Optional


def run() -> Optional[int]:
    # Run a naming policy check on all Python files that checks that
    # all file names begin with "ml_ops_". Returns:
    # - 0 if all names adhere to the policy.
    # - 1 (or any non-zero code) if names do not meet the policy.
    root_path = Path().resolve()
    py_files = [p.resolve() for p in root_path.glob("**/*.py")]
    bad_names = [p for p in py_files if not p.name.startswith("ml_ops_")]
    if bad_names:
        print("Invalid names:")
        for n in bad_names:
            print(str(n))
        return 1
    return 0
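One detail worth checking in a policy like this: `pathlib` globbing does not skip hidden directories, so a recursive `**/*.py` search from the repository root can also match files under `.tecton/`, including `plan.py` itself. The variant below is a sketch that skips such directories and separates the check from `run()` so it can be unit tested. The names `find_bad_names`, `REQUIRED_PREFIX`, and `EXCLUDED_DIRS` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

REQUIRED_PREFIX = "ml_ops_"
EXCLUDED_DIRS = {".tecton"}  # assumed: hook files are exempt from the policy


def find_bad_names(root: Path) -> List[Path]:
    # Return Python files under root whose names lack the required prefix,
    # ignoring anything inside an excluded directory.
    bad = []
    for path in sorted(root.rglob("*.py")):
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_parts):
            continue
        if not path.name.startswith(REQUIRED_PREFIX):
            bad.append(path)
    return bad


def run() -> Optional[int]:
    bad_names = find_bad_names(Path().resolve())
    if bad_names:
        print("Invalid names:")
        for path in bad_names:
            print(str(path))
        return 1
    return 0
```

Keeping the file search in its own function means the policy can be exercised against a temporary directory in a regular pytest test, without invoking the hook.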
Skip Plan Hooks
Specifying the `--skip-tests` flag when running `tecton plan` or `tecton apply` will skip execution of Plan Hooks.
Reset Plan Hooks
If you get carried away writing customized Plan Hook behavior and want to revert to the default, simply run `tecton init --reset-hooks`. This will delete the contents of `.tecton/` and recreate the default `plan.py`.