Version: 1.2

Testing Batch Features

This guide covers how to test batch features in Tecton using offline retrieval methods with mock data.

Overview

Testing batch features allows you to validate your feature transformation logic and aggregation computations before deploying to production. You can test batch features by passing mock data to the offline retrieval methods.

Testing Methods

Batch features support multiple testing approaches:

1. Testing Final Feature Values

Time Range Testing

Use get_features_in_range() to test the complete feature computation including aggregations over a time range:

# Test final aggregated feature values over a time range
result_df = batch_fv.get_features_in_range(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"source_name": mock_data}
)
Result
| user_id | transaction_count_1d_1d | transaction_count_30d_1d | transaction_count_90d_1d | _valid_from | _valid_to |
|---|---|---|---|---|---|
| user_1 | 2 | 8 | 34 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| user_2 | 1 | 42 | 141 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
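The returned TectonDataFrame can be converted to pandas for programmatic checks. A minimal sketch, using a stand-in frame shaped like the result above (in a real test, use `result_df.to_pandas()` instead of constructing the frame by hand):

```python
import pandas as pd

# Stand-in for result_df.to_pandas(), shaped like the example result.
features = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "transaction_count_1d_1d": [2, 1],
        "transaction_count_30d_1d": [8, 42],
        "transaction_count_90d_1d": [34, 141],
        "_valid_from": pd.to_datetime(["2022-05-01", "2022-05-01"]),
        "_valid_to": pd.to_datetime(["2022-05-02", "2022-05-02"]),
    }
)

# Validity intervals must be well-formed.
assert (features["_valid_from"] < features["_valid_to"]).all()

# A count over a longer window can never be smaller than over a shorter one.
assert (features["transaction_count_1d_1d"] <= features["transaction_count_30d_1d"]).all()
assert (features["transaction_count_30d_1d"] <= features["transaction_count_90d_1d"]).all()
```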

Point-in-Time Testing

Use get_features_for_events() to test final feature values for specific entity/timestamp combinations:

# Test final feature values for specific events
events_df = pandas.DataFrame(
    {"user_id": ["user_1", "user_2"], "timestamp": [datetime(2022, 5, 1, 12), datetime(2022, 5, 1, 15)]}
)

result_df = batch_fv.get_features_for_events(events=events_df, mock_inputs={"source_name": mock_data})
Result
| user_id | timestamp | user_transaction_counts__transaction_count_1d_1d | user_transaction_counts__transaction_count_30d_1d |
|---|---|---|---|
| user_1 | 2022-05-01 12:00:00 | 0 | 28 |
| user_2 | 2022-05-01 15:00:00 | 0 | 13 |
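One way to sanity-check point-in-time values is to recompute the expected counts directly from the mock data with pandas. A sketch, assuming the mock frame has `user_id` and `timestamp` columns; note that Tecton aligns aggregation windows to the aggregation interval, so exact values can differ from a continuous sliding window:

```python
from datetime import datetime, timedelta

import pandas as pd

# Stand-in mock data; in a real test this is the same frame passed to mock_inputs.
mock_data = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_2"],
        "timestamp": [
            datetime(2022, 4, 20, 10),
            datetime(2022, 5, 1, 9),
            datetime(2022, 4, 15, 8),
        ],
    }
)


def expected_count(df, user_id, event_time, window):
    """Count events for user_id in [event_time - window, event_time)."""
    in_window = (
        (df["user_id"] == user_id)
        & (df["timestamp"] >= event_time - window)
        & (df["timestamp"] < event_time)
    )
    return int(in_window.sum())


# user_1 at 2022-05-01 12:00: one event in the last day, two in the last 30 days.
assert expected_count(mock_data, "user_1", datetime(2022, 5, 1, 12), timedelta(days=1)) == 1
assert expected_count(mock_data, "user_1", datetime(2022, 5, 1, 12), timedelta(days=30)) == 2
```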

2. Testing Partial Aggregates

For batch features with aggregations, use get_partial_aggregates() to test intermediate aggregation results (time range only):

# Test partial aggregation tiles
result_df = batch_fv.get_partial_aggregates(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"source_name": mock_data}
)
Result
| user_id | transaction_count_1d | _interval_start_time | _interval_end_time |
|---|---|---|---|
| user_1 | 4 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| user_2 | 1 | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |

3. Testing Transformation Logic Only

Use run_transformation() to test just the transformation logic without aggregations (time range only). This runs the transformation function as it would for a materialization job with the time range [start_time, end_time):

# Test transformation logic only
result_df = batch_fv.run_transformation(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"source_name": mock_data}
)
Result
| user_id | timestamp | transaction | signup_timestamp | credit_card_issuer |
|---|---|---|---|---|
| user_1 | 2022-05-01 21:04:38 | 1 | 2021-01-01 06:25:57 | other |
| user_2 | 2022-05-01 19:45:14 | 1 | 2021-01-01 07:16:06 | Visa |
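A simple property to assert on transformation output is that every row's timestamp falls inside the requested half-open range. A sketch using a stand-in frame (in practice, call `result_df.to_pandas()` on the object returned by `run_transformation()`):

```python
from datetime import datetime

import pandas as pd

start_time, end_time = datetime(2022, 5, 1), datetime(2022, 5, 2)

# Stand-in for run_transformation(...).to_pandas().
transformed = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2022, 5, 1, 21, 4, 38), datetime(2022, 5, 1, 19, 45, 14)],
        "transaction": [1, 1],
    }
)

# All rows must fall in [start_time, end_time).
assert (transformed["timestamp"] >= start_time).all()
assert (transformed["timestamp"] < end_time).all()
```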

4. Testing Online Feature Store

Test the latest feature values from the online feature store using get_online_features():

Do not use get_online_features() to read features in production.

This method is intended for testing and does not have production-level performance. To read features online efficiently in production, see Reading Features for Inference.

# Test reading the latest features from the online store
online_result = batch_fv.get_online_features({"user_id": "user_1"}).to_dict()
print(online_result)
Result
{
    "transaction_count_1d_1d": 1,
    "transaction_count_30d_1d": 17,
    "transaction_count_90d_1d": 56,
}

Understanding Batch Feature View Aggregation Levels

When a feature view has tile aggregations, the query operates in three logical steps:

  1. Transformation (run_transformation()) - The feature view query runs over the provided time range [start_time, end_time) with user-defined transformations applied
  2. Partial Aggregation (get_partial_aggregates()) - Results are aggregated into tiles based on the aggregation_interval
  3. Final Aggregation (get_features_in_range() or get_features_for_events()) - Tiles are combined to form final feature values based on the time_window of each aggregation
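The three steps can be mimicked in plain pandas to build intuition. This is a conceptual analogue, not Tecton's implementation; it assumes a one-day aggregation_interval and a count aggregation:

```python
from datetime import datetime

import pandas as pd

# Step 1 analogue: raw events, as a transformation might emit them.
events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1", "user_2"],
        "timestamp": [
            datetime(2022, 5, 1, 9),
            datetime(2022, 5, 1, 14),
            datetime(2022, 5, 2, 10),
            datetime(2022, 5, 1, 11),
        ],
        "transaction": [1, 1, 1, 1],
    }
)

# Step 2 analogue: partial aggregates, one tile per user per day.
tiles = (
    events.assign(tile=events["timestamp"].dt.floor("D"))
    .groupby(["user_id", "tile"], as_index=False)["transaction"]
    .count()
)

# Step 3 analogue: final aggregation over a 2-day time_window ending 2022-05-03.
window_start, window_end = datetime(2022, 5, 1), datetime(2022, 5, 3)
in_window = tiles[(tiles["tile"] >= window_start) & (tiles["tile"] < window_end)]
final = in_window.groupby("user_id")["transaction"].sum()

assert final["user_1"] == 3  # two events on 5/1 plus one on 5/2
assert final["user_2"] == 1
```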

Mock Data Guidelines

When creating mock data for batch feature testing:

Data Requirements

  • Include all columns referenced in your feature view transformation
  • Ensure timestamp columns are properly formatted
  • Include entity join key columns
  • Provide sufficient data to cover your test scenarios
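A minimal mock frame that satisfies these requirements might look like the following (column names are assumptions carried over from the examples in this guide):

```python
from datetime import datetime

import pandas as pd

mock_data = pd.DataFrame(
    {
        # Entity join key column.
        "user_id": ["user_1", "user_2"],
        # Properly typed timestamp column.
        "timestamp": [datetime(2022, 5, 1, 9), datetime(2022, 5, 1, 11)],
        # Every column the transformation references.
        "transaction": [1, 1],
    }
)

# Timestamps should be real datetimes, not strings.
assert pd.api.types.is_datetime64_any_dtype(mock_data["timestamp"])
```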

Best Practices

  • Use realistic data ranges that match your expected production data
  • Include edge cases (nulls, extreme values, empty results)
  • Test with data that spans multiple aggregation intervals
  • Verify timestamp filtering works correctly with your mock data
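Edge cases can be baked into the mock frame directly. A sketch (the nullable column is illustrative):

```python
from datetime import datetime

import pandas as pd

mock_data = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "timestamp": [
            datetime(2022, 4, 1, 9),  # spans more than one aggregation interval
            datetime(2022, 5, 1, 11),
            datetime(2022, 5, 2, 23, 59, 59),  # just inside an interval boundary
        ],
        "transaction": [1, None, 1],  # includes a null value
    }
)

# The data spans multiple daily intervals and contains a null.
assert mock_data["timestamp"].dt.floor("D").nunique() > 1
assert mock_data["transaction"].isna().any()
```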

Common Testing Scenarios

Testing Non-Aggregate Features

For simple batch features without aggregations:

  • Focus on transformation logic correctness
  • Verify data filtering and joins work as expected
  • Test with various data patterns and edge cases

Testing Aggregate Features

For batch features with Tecton-managed aggregations:

  • Test each aggregation level (transformation, partial, final)
  • Verify aggregation windows compute correctly
  • Test boundary conditions (start/end of windows)
  • Validate that different time windows produce expected results

Testing Feature Views that use Incremental Backfills

For features using incremental backfills:

  • Test that the incremental backfill job is reading the expected amount of data
  • Ensure that you are running run_transformation() with a time period equal to the batch schedule of the feature view
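To mimic incremental backfill behavior, run the transformation one batch-schedule-sized period at a time. A sketch in plain pandas of how a daily job slices the source data (in a real test, each iteration would call run_transformation() with that period; a batch schedule of one day is assumed):

```python
from datetime import datetime, timedelta

import pandas as pd

batch_schedule = timedelta(days=1)
backfill_start, backfill_end = datetime(2022, 5, 1), datetime(2022, 5, 4)

source = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1"],
        "timestamp": [
            datetime(2022, 5, 1, 9),
            datetime(2022, 5, 2, 11),
            datetime(2022, 5, 3, 15),
        ],
    }
)

period_sizes = {}
start = backfill_start
while start < backfill_end:
    end = start + batch_schedule
    # Each incremental job should only read data in its own [start, end) period.
    period = source[(source["timestamp"] >= start) & (source["timestamp"] < end)]
    period_sizes[start] = len(period)
    start = end

# Every source row is read exactly once across all periods.
assert sum(period_sizes.values()) == len(source)
```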

Example: Testing Aggregate Features End-to-End

This example demonstrates testing all three aggregation levels for a batch feature view with Tecton-managed aggregations:

import tecton
import pandas as pd
from datetime import datetime

ws = tecton.get_workspace("prod")
agg_fv = ws.get_feature_view("user_transaction_counts")

# Create transaction data spanning multiple days for aggregation testing
transaction_data = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"] * 5,
        "timestamp": [
            datetime(2022, 5, 1, 21, 4, 38),
            datetime(2022, 5, 1, 19, 45, 14),
            datetime(2022, 5, 1, 15, 18, 48),
            datetime(2022, 5, 1, 7, 11, 31),
            datetime(2022, 5, 1, 1, 50, 51),
            datetime(2022, 5, 2, 9, 30, 15),
            datetime(2022, 5, 2, 14, 20, 22),
            datetime(2022, 5, 2, 18, 45, 33),
            datetime(2022, 5, 3, 8, 15, 44),
            datetime(2022, 5, 3, 12, 30, 55),
            datetime(2022, 5, 3, 16, 45, 11),
            datetime(2022, 5, 3, 20, 0, 22),
            datetime(2022, 5, 4, 10, 15, 33),
            datetime(2022, 5, 4, 14, 30, 44),
            datetime(2022, 5, 4, 18, 45, 55),
        ],
        "transaction": [1] * 15,
    }
)

# 1. Test transformation logic
transformation_result = agg_fv.run_transformation(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Transformation output:")
display(transformation_result.to_pandas())

# 2. Test partial aggregates (tiles)
partial_result = agg_fv.get_partial_aggregates(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Partial aggregates (tiles):")
display(partial_result.to_pandas())

# 3. Test final aggregated features
final_result = agg_fv.get_features_in_range(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), mock_inputs={"transactions": transaction_data}
)
print("Final aggregated features:")
display(final_result.to_pandas())
