# Testing Batch Features
This guide covers how to test batch features in Tecton using offline retrieval methods with mock data.
## Overview
Testing batch features allows you to validate your feature transformation logic and aggregation computations before deploying to production. You can test batch features by passing mock data to the offline retrieval methods.
## Testing Methods
Batch features support multiple testing approaches:
### 1. Testing Final Feature Values
#### Time Range Testing
Use `get_features_in_range()` to test the complete feature computation, including aggregations, over a time range:

```python
# Test final aggregated feature values over a time range
result_df = batch_fv.get_features_in_range(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    mock_inputs={"source_name": mock_data},
)
```
Result
| user_id | transaction_count_1d_1d | transaction_count_30d_1d | transaction_count_90d_1d | _valid_from | _valid_to |
|---|---|---|---|---|---|
| user_1 | 2 | 8 | 34 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| user_2 | 1 | 42 | 141 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
#### Point-in-Time Testing
Use `get_features_for_events()` to test final feature values for specific entity/timestamp combinations:

```python
# Test final feature values for specific events
events_df = pandas.DataFrame(
    {"user_id": ["user_1", "user_2"], "timestamp": [datetime(2022, 5, 1, 12), datetime(2022, 5, 1, 15)]}
)
result_df = batch_fv.get_features_for_events(events=events_df, mock_inputs={"source_name": mock_data})
```
Result
| user_id | timestamp | user_transaction_counts__transaction_count_1d_1d | user_transaction_counts__transaction_count_30d_1d |
|---|---|---|---|
| user_1 | 2022-05-01 12:00:00 | 0 | 28 |
| user_2 | 2022-05-01 15:00:00 | 0 | 13 |
### 2. Testing Partial Aggregates
For batch features with aggregations, use `get_partial_aggregates()` to test intermediate aggregation results (time range only):

```python
# Test partial aggregation tiles
result_df = batch_fv.get_partial_aggregates(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    mock_inputs={"source_name": mock_data},
)
```
Result
| user_id | transaction_count_1d | _interval_start_time | _interval_end_time |
|---|---|---|---|
| user_1 | 4 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| user_2 | 1 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
### 3. Testing Transformation Logic Only
Use `run_transformation()` to test just the transformation logic without aggregations (time range only). This runs the transformation function as it would for a materialization job over the time range `[start_time, end_time)`:

```python
# Test transformation logic only
result_df = batch_fv.run_transformation(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    mock_inputs={"source_name": mock_data},
)
```
Result
| user_id | timestamp | transaction | signup_timestamp | credit_card_issuer |
|---|---|---|---|---|
| user_1 | 2022-05-01 21:04:38 | 1 | 2021-01-01 06:25:57 | other |
| user_2 | 2022-05-01 19:45:14 | 1 | 2021-01-01 07:16:06 | Visa |
### 4. Testing Online Feature Store
Test the latest feature values from the online feature store using `get_online_features()`:

Note: do not use `get_online_features()` to read features in production. This method is intended for testing and does not have production-level performance. To read features online efficiently in production, see Reading Features for Inference.

```python
# Test reading the latest features from the online store
online_result = batch_fv.get_online_features({"user_id": "user_1"}).to_dict()
print(online_result)
```
Result
```python
{
    "transaction_count_1d_1d": 1,
    "transaction_count_30d_1d": 17,
    "transaction_count_90d_1d": 56,
}
```
## Understanding Batch Feature View Aggregation Levels
When a feature view has tile aggregations, the query operates in three logical steps:

- **Transformation** (`run_transformation()`): the feature view query runs over the provided time range `[start_time, end_time)` with user-defined transformations applied
- **Partial Aggregation** (`get_partial_aggregates()`): results are aggregated into tiles based on the `aggregation_interval`
- **Final Aggregation** (`get_features_in_range()` or `get_features_for_events()`): tiles are combined to form final feature values based on the `time_window` of each aggregation
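The three levels can be sketched in plain pandas. This is a simplified model of the tiling idea, not Tecton's implementation: a count aggregation is rolled into daily tiles, and tiles are then summed over a window. All column names and values here are illustrative.

```python
import pandas as pd
from datetime import datetime

# 1. Transformation output: raw rows as the transformation step would emit them.
rows = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "timestamp": pd.to_datetime([
        "2022-05-01 03:00", "2022-05-01 22:00",
        "2022-05-02 10:00", "2022-05-02 11:00",
    ]),
    "transaction": [1, 1, 1, 1],
})

# 2. Partial aggregation: roll rows into daily tiles (aggregation_interval = 1 day).
tiles = (
    rows.groupby(["user_id", rows["timestamp"].dt.floor("D")])["transaction"]
    .count()
    .rename("transaction_count_1d")
    .reset_index()
)

# 3. Final aggregation: combine tiles into a 2-day window starting 2022-05-01.
window = tiles[tiles["timestamp"] >= datetime(2022, 5, 1)]
final = window.groupby("user_id")["transaction_count_1d"].sum()
print(final.to_dict())
```

Because tiles are reused across windows, the 30-day and 90-day features in the earlier examples can be served from the same daily tiles without recomputing the transformation.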
## Mock Data Guidelines
When creating mock data for batch feature testing:
### Data Requirements
- Include all columns referenced in your feature view transformation
- Ensure timestamp columns are properly formatted
- Include entity join key columns
- Provide sufficient data to cover your test scenarios
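A minimal mock input meeting these requirements might look like the following. The `amount` column and the schema are assumptions for illustration; your mock data must mirror whatever columns your own transformation references.

```python
import pandas as pd

# Hypothetical mock input for a feature view keyed on user_id.
mock_data = pd.DataFrame({
    "user_id": ["user_1", "user_2"],                     # entity join key
    "timestamp": pd.to_datetime(["2022-05-01 12:00:00",  # proper datetime dtype
                                 "2022-05-01 15:00:00"]),
    "amount": [25.0, 110.5],                             # column used by the transformation
})

# Quick sanity checks before passing mock_data via mock_inputs.
assert {"user_id", "timestamp", "amount"} <= set(mock_data.columns)
assert pd.api.types.is_datetime64_any_dtype(mock_data["timestamp"])
```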
### Best Practices
- Use realistic data ranges that match your expected production data
- Include edge cases (nulls, extreme values, empty results)
- Test with data that spans multiple aggregation intervals
- Verify timestamp filtering works correctly with your mock data
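As one way to apply these practices, the hypothetical mock below packs in a null, an extreme value, an exact interval-start timestamp, and rows spanning several daily intervals, then checks the multi-interval coverage:

```python
import pandas as pd
import numpy as np

# Edge-case rows: a null amount, an extreme value, and timestamps spanning
# several daily aggregation intervals.
edge_data = pd.DataFrame({
    "user_id": ["user_1", "user_1", "user_2", "user_2"],
    "timestamp": pd.to_datetime([
        "2022-05-01 00:00:00",  # exact interval start (boundary case)
        "2022-05-03 23:59:59",  # end of a later interval
        "2022-05-02 12:00:00",
        "2022-05-04 06:00:00",
    ]),
    "amount": [np.nan, 1e9, 50.0, 50.0],
})

# Confirm the data spans more than one daily interval per user.
days_per_user = edge_data.groupby("user_id")["timestamp"].apply(
    lambda ts: ts.dt.floor("D").nunique()
)
print(days_per_user.to_dict())
```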
## Common Testing Scenarios
### Testing Non-Aggregate Features
For simple batch features without aggregations:
- Focus on transformation logic correctness
- Verify data filtering and joins work as expected
- Test with various data patterns and edge cases
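For transformation-logic checks, it can help to express the expected behavior as plain pandas and assert on it directly. The `status`/`amount` schema and the `transform` function below are hypothetical stand-ins for your own transformation:

```python
import pandas as pd

# A hypothetical transformation: keep completed transactions, derive a flag.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df[df["status"] == "completed"].copy()
    out["is_large"] = out["amount"] > 100
    return out[["user_id", "timestamp", "is_large"]]

mock = pd.DataFrame({
    "user_id": ["user_1", "user_1", "user_2"],
    "timestamp": pd.to_datetime(["2022-05-01", "2022-05-01", "2022-05-01"]),
    "status": ["completed", "failed", "completed"],
    "amount": [150.0, 30.0, 20.0],
})

result = transform(mock)
# The failed row is filtered out; the flag reflects the amount threshold.
assert list(result["is_large"]) == [True, False]
```

The same assertions can then be run against the output of `run_transformation()` with the same mock input.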
### Testing Aggregate Features
For batch features with Tecton-managed aggregations:
- Test each aggregation level (transformation, partial, final)
- Verify aggregation windows compute correctly
- Test boundary conditions (start/end of windows)
- Validate that different time windows produce expected results
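Boundary conditions follow the half-open `[start_time, end_time)` convention described above for `run_transformation()`. A small pandas check of that convention (illustrative timestamps, hypothetical data):

```python
import pandas as pd
from datetime import datetime

events = pd.DataFrame({
    "user_id": ["user_1"] * 3,
    "timestamp": pd.to_datetime([
        "2022-04-30 23:59:59",  # just before the window: excluded
        "2022-05-01 00:00:00",  # exactly at the window start: included
        "2022-05-02 00:00:00",  # exactly at the window end: excluded
    ]),
})

start, end = datetime(2022, 5, 1), datetime(2022, 5, 2)
# Half-open window [start, end): inclusive start, exclusive end.
in_window = events[(events["timestamp"] >= start) & (events["timestamp"] < end)]
assert len(in_window) == 1
```

Mock rows placed exactly on these boundaries make off-by-one errors in window logic easy to spot in the partial and final aggregation outputs.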
### Testing Feature Views that Use Incremental Backfills
For features using incremental backfills:
- Test that the incremental backfill job reads the expected amount of data
- Ensure that you run `run_transformation()` with a time period equal to the `batch_schedule` of the feature view
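To exercise a multi-day backfill one batch period at a time, you can generate the `(start_time, end_time)` pairs with plain datetime arithmetic; the one-day `batch_schedule` below is an assumption for illustration:

```python
from datetime import datetime, timedelta

batch_schedule = timedelta(days=1)  # assumed batch_schedule of the feature view
backfill_start = datetime(2022, 5, 1)
backfill_end = datetime(2022, 5, 4)

# Build one (start, end) pair per batch period.
periods = []
start = backfill_start
while start < backfill_end:
    periods.append((start, start + batch_schedule))
    start += batch_schedule

# Each pair would then be passed to run_transformation() in turn, e.g.:
# batch_fv.run_transformation(start_time=s, end_time=e, mock_inputs={...})
print(periods)
```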
## Example: Testing Aggregate Features End-to-End
This example demonstrates testing all three aggregation levels for a batch feature view with Tecton-managed aggregations:
```python
import tecton
import pandas as pd
from datetime import datetime

ws = tecton.get_workspace("prod")
agg_fv = ws.get_feature_view("user_transaction_counts")

# Create transaction data spanning multiple days for aggregation testing
transaction_data = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"] * 5,
        "timestamp": [
            datetime(2022, 5, 1, 21, 4, 38),
            datetime(2022, 5, 1, 19, 45, 14),
            datetime(2022, 5, 1, 15, 18, 48),
            datetime(2022, 5, 1, 7, 11, 31),
            datetime(2022, 5, 1, 1, 50, 51),
            datetime(2022, 5, 2, 9, 30, 15),
            datetime(2022, 5, 2, 14, 20, 22),
            datetime(2022, 5, 2, 18, 45, 33),
            datetime(2022, 5, 3, 8, 15, 44),
            datetime(2022, 5, 3, 12, 30, 55),
            datetime(2022, 5, 3, 16, 45, 11),
            datetime(2022, 5, 3, 20, 0, 22),
            datetime(2022, 5, 4, 10, 15, 33),
            datetime(2022, 5, 4, 14, 30, 44),
            datetime(2022, 5, 4, 18, 45, 55),
        ],
        "transaction": [1] * 15,
    }
)

# 1. Test transformation logic
transformation_result = agg_fv.run_transformation(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Transformation output:")
display(transformation_result.to_pandas())

# 2. Test partial aggregates (tiles)
partial_result = agg_fv.get_partial_aggregates(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Partial aggregates (tiles):")
display(partial_result.to_pandas())

# 3. Test final aggregated features
final_result = agg_fv.get_features_in_range(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Final aggregated features:")
display(final_result.to_pandas())
```