Testing Realtime Features
Overview
Testing realtime features involves validating both the transformation logic and the interaction with batch/stream feature dependencies. Realtime features can be tested using mock data to simulate both request data and feature dependencies.
Testing Methods
Realtime features support multiple testing approaches:
1. Using get_features_for_events() with mock inputs
The get_features_for_events() method allows you to test offline retrieval of
realtime features by providing mock feature values directly in the events
DataFrame.
Unlike batch and stream feature views, which take a mock_inputs parameter,
Realtime Feature Views mock request sources and dependent feature values
directly in the events DataFrame. This lets you specify different mocked
values for each timestamp in the events DataFrame, enabling point-in-time
testing scenarios.
If a dependent feature column is not mocked, Tecton will try to compute it from the raw data source or read it from the offline store. To rely purely on mock inputs and avoid reading from a data source, mock every feature column in the dependent feature view.
For details, see Mocking Dependent Features and Request Sources.
import pandas as pd
from datetime import datetime

# Create events DataFrame with mock feature dependencies
# This RTFV depends on user_spending_metrics and user_profile feature views
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        "transaction_amount": [500, 750],
        # Mock batch feature dependencies - these represent the dependent feature values
        # that would be available at the timestamp for each user
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
    }
)

# Test realtime feature (rtfv is the Realtime Feature View under test)
result_df = rtfv.get_features_for_events(events_df)
Result
| user_id | timestamp | transaction_amount | fraud_detection__is_high_risk | fraud_detection__tier_multiplier |
|---|---|---|---|---|
| user_1 | 2023-04-02 00:00:00 | 500 | 1 | 1.5 |
| user_2 | 2023-04-03 00:00:00 | 750 | 1 | 1.0 |
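Once the result is converted to pandas, the comparison against an expected table can be automated. The following is a minimal sketch, assuming the output columns match the result table above; `assert_feature_columns` is a hypothetical helper (not part of Tecton), and `result_pdf` is a hand-built stand-in for `result_df.to_pandas()`. Rows are sorted by key before comparing so the check does not depend on row order.

```python
import pandas as pd
from datetime import datetime


def assert_feature_columns(actual: pd.DataFrame, expected: pd.DataFrame, keys: list[str]) -> None:
    """Compare only the columns present in `expected`, ignoring row order."""
    cols = list(expected.columns)
    actual_sorted = actual[cols].sort_values(keys).reset_index(drop=True)
    expected_sorted = expected.sort_values(keys).reset_index(drop=True)
    pd.testing.assert_frame_equal(actual_sorted, expected_sorted, check_dtype=False)


# Stand-in for result_df.to_pandas(); values copied from the result table above,
# deliberately listed in a different row order.
result_pdf = pd.DataFrame(
    {
        "user_id": ["user_2", "user_1"],
        "timestamp": [datetime(2023, 4, 3), datetime(2023, 4, 2)],
        "transaction_amount": [750, 500],
        "fraud_detection__is_high_risk": [1, 1],
        "fraud_detection__tier_multiplier": [1.0, 1.5],
    }
)

# Expected feature values, keyed by user_id and timestamp.
expected = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        "fraud_detection__is_high_risk": [1, 1],
        "fraud_detection__tier_multiplier": [1.5, 1.0],
    }
)

# Raises AssertionError if any expected feature value differs.
assert_feature_columns(result_pdf, expected, keys=["user_id", "timestamp"])
```

In a real test, `result_pdf` would come from `rtfv.get_features_for_events(events_df).to_pandas()` instead of being constructed by hand.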
2. Using run_transformation() with mock input data
To test the result of a single invocation of a Realtime Feature View's
transformation, use run_transformation() with mock input data:
# Mock all feature inputs
input_data = {
    "transaction_request": pd.DataFrame([{"amount": 100.0, "merchant_category": "grocery"}]),
    "user_spending_metrics": pd.DataFrame([{"avg_spend_30d": 200.0, "transaction_count_7d": 15}]),
}

result = rtfv.run_transformation(input_data=input_data)
print(result)  # {"is_spending_anomaly": 0}
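If the transformation logic lives in a plain Python function, it can also be unit tested without Tecton at all. The sketch below is illustrative only: the source does not show the transformation body, so `spending_anomaly` and its 3x threshold are assumptions chosen to agree with the output shown above (an amount of 100 against a 30-day average of 200 is not an anomaly).

```python
def spending_anomaly(transaction_request: dict, user_spending_metrics: dict) -> dict:
    """Hypothetical transformation body: flag amounts above 3x the 30-day average."""
    threshold = 3 * user_spending_metrics["avg_spend_30d"]
    return {"is_spending_anomaly": int(transaction_request["amount"] > threshold)}


def test_typical_amount_is_not_anomalous():
    result = spending_anomaly(
        {"amount": 100.0, "merchant_category": "grocery"},
        {"avg_spend_30d": 200.0, "transaction_count_7d": 15},
    )
    assert result == {"is_spending_anomaly": 0}


def test_large_amount_is_anomalous():
    result = spending_anomaly(
        {"amount": 700.0, "merchant_category": "grocery"},
        {"avg_spend_30d": 200.0, "transaction_count_7d": 15},
    )
    assert result == {"is_spending_anomaly": 1}


# The test functions follow pytest naming, but can also be called directly.
test_typical_amount_is_not_anomalous()
test_large_amount_is_anomalous()
```

Tests like these cover boundary cases cheaply; run_transformation() then confirms that the same logic behaves as expected when invoked through the feature view.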
Mocking Dependent Features and Request Sources
When testing realtime features with get_features_for_events(), you can mock
both dependent feature values and request source data directly in the events
DataFrame. This allows you to test your realtime transformation logic without
needing to materialize dependent features.
Mocking Request Sources
Request source data uses the original column names as defined in your RequestSource schema:
# For a RequestSource with fields: transaction_amount, merchant_category
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        # Request source columns - use original names
        "transaction_amount": [500, 750],
        "merchant_category": ["grocery", "electronics"],
    }
)
Mocking Dependent Feature Values
Batch and stream feature dependencies use the naming convention
{feature_view_name}__{feature_name}. These mock columns simulate the exact feature values that would be passed to the Realtime Feature View at the point in time specified by the timestamp in each row:
# Mock dependent features from other feature views
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        # Request source data
        "transaction_amount": [500, 750],
        # Dependent feature values - use feature_view__feature_name format
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
        "merchant_risk_metrics__fraud_rate": [0.02, 0.08],
    }
)
Column Naming Convention Summary
- Request Sources: Use original column names (transaction_amount, merchant_category, etc.)
- Dependent Features: Use {feature_view_name}__{feature_name} format
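Because any dependent feature column that is not mocked is computed from the data source or read from the offline store, it can help to check the events DataFrame before running a test. The sketch below applies the naming convention to list the columns that are still missing. `mock_column`, `missing_mock_columns`, and the `dependencies` mapping are hypothetical helpers written for illustration, not Tecton APIs; the feature names are taken from the examples on this page.

```python
import pandas as pd


def mock_column(feature_view_name: str, feature_name: str) -> str:
    """Build a mock column name using the {feature_view_name}__{feature_name} convention."""
    return f"{feature_view_name}__{feature_name}"


def missing_mock_columns(events_df: pd.DataFrame, dependencies: dict[str, list[str]]) -> list[str]:
    """Return the dependent feature columns that are not present in events_df."""
    expected = [
        mock_column(fv_name, feature)
        for fv_name, features in dependencies.items()
        for feature in features
    ]
    return [col for col in expected if col not in events_df.columns]


events_df = pd.DataFrame(
    {
        "user_id": ["user_1"],
        "transaction_amount": [500],
        "user_spending_metrics__avg_daily_spend": [100],
    }
)

# Hypothetical mapping of dependent feature views to the features the RTFV reads.
dependencies = {
    "user_spending_metrics": ["avg_daily_spend", "transaction_count_7d"],
    "user_profile": ["account_tier"],
}

missing = missing_mock_columns(events_df, dependencies)
print(missing)
# ['user_spending_metrics__transaction_count_7d', 'user_profile__account_tier']
```

An empty list means every listed dependency is mocked, so the test should not need to read from a data source for those features.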
Point-in-Time Semantics
Each row in the events DataFrame represents a specific point in time when the realtime feature would be computed. The mock feature columns provide the exact values that would be available at that timestamp:
events_df = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123"],
        "timestamp": [
            datetime(2023, 4, 2, 10, 0),  # Morning computation
            datetime(2023, 4, 2, 18, 0),  # Evening computation
        ],
        "transaction_amount": [100, 150],
        # These represent the user's daily average at each timestamp
        "user_spending_metrics__avg_daily_spend": [50, 75],  # Average increased throughout the day
    }
)
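To make the point-in-time idea concrete, the following pandas sketch shows the kind of lookup the mock values stand in for: for each event, take the most recent feature value recorded at or before the event timestamp. This is only an illustration of the semantics using `pd.merge_asof`, not a description of how Tecton performs the join; the `feature_history` table and its timestamps are invented for the example.

```python
import pandas as pd
from datetime import datetime

# Hypothetical history of the dependent feature for one user.
feature_history = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123"],
        "feature_timestamp": [datetime(2023, 4, 2, 9, 0), datetime(2023, 4, 2, 17, 0)],
        "user_spending_metrics__avg_daily_spend": [50, 75],
    }
)

events = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123"],
        "timestamp": [datetime(2023, 4, 2, 10, 0), datetime(2023, 4, 2, 18, 0)],
        "transaction_amount": [100, 150],
    }
)

# For each event, pick the latest feature row at or before the event timestamp.
joined = pd.merge_asof(
    events.sort_values("timestamp"),
    feature_history.sort_values("feature_timestamp"),
    left_on="timestamp",
    right_on="feature_timestamp",
    by="user_id",
    direction="backward",
)

print(joined["user_spending_metrics__avg_daily_spend"].tolist())  # [50, 75]
```

Mocking the column directly in the events DataFrame skips this lookup: each row states the value the lookup would have produced.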
Example: Testing Dependency Mocking for a Fraud Detection Feature
This example demonstrates how to test a realtime fraud detection feature, which analyzes transaction data to determine the likelihood of fraudulent activity. It utilizes dependent feature views, such as user spending metrics and merchant risk metrics, to generate a risk score for each transaction.
We will test the Realtime Feature View by mocking the dependent feature views, which lets us simulate the feature's behavior and validate its logic without materializing those dependencies.
import tecton
import pandas as pd
from datetime import datetime

ws = tecton.get_workspace("prod")
fraud_rtfv = ws.get_feature_view("fraud_detection")

# Create events with mock feature dependencies
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "timestamp": [datetime(2023, 4, 2, 10, 0), datetime(2023, 4, 2, 14, 30), datetime(2023, 4, 2, 18, 15)],
        "transaction_amount": [500, 1200, 75],
        "merchant_category": ["grocery", "electronics", "gas_station"],
        # Mock batch feature dependencies - values at each timestamp
        "user_spending_metrics__avg_daily_spend": [100, 200, 50],
        "user_spending_metrics__transaction_count_7d": [15, 8, 25],
        "user_profile__account_age_days": [365, 90, 730],
        "merchant_risk_metrics__fraud_rate": [0.02, 0.08, 0.01],
    }
)

# Test realtime feature with mocked dependencies
result_df = fraud_rtfv.get_features_for_events(events_df)
print("Fraud detection results:")
display(result_df.to_pandas())  # display() is available in notebook environments

# Verify specific fraud detection logic
for _, row in result_df.to_pandas().iterrows():
    risk_score = row["fraud_detection__risk_score"]
    amount = row["transaction_amount"]
    avg_spend = row["user_spending_metrics__avg_daily_spend"]
    # Validate business logic: high risk if the amount is at least 5x the average
    expected_high_risk = amount >= (avg_spend * 5)
    actual_high_risk = risk_score > 0.7
    print(
        f"User {row['user_id']}: Amount ${amount}, "
        f"Avg ${avg_spend}, High Risk: {actual_high_risk} "
        f"(Expected: {expected_high_risk})"
    )
Expected Output
| user_id | transaction_amount | fraud_detection__risk_score |
|---|---|---|
| user_1 | 500 | 0.8 |
| user_2 | 1200 | 0.9 |
| user_3 | 75 | 0.2 |
User user_1: Amount $500, Avg $100, High Risk: True (Expected: True)
User user_2: Amount $1200, Avg $200, High Risk: True (Expected: True)
User user_3: Amount $75, Avg $50, High Risk: False (Expected: False)
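For automated tests, the printed comparison can be replaced with an assertion. The sketch below assumes the rule is "high risk when the amount is at least 5x the average daily spend" and that a risk score above 0.7 counts as high risk. `result_pdf` is a hand-built stand-in for `result_df.to_pandas()`, populated with the values from the expected output above.

```python
import pandas as pd

# Stand-in for result_df.to_pandas(), using the values from the expected output.
result_pdf = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "transaction_amount": [500, 1200, 75],
        "user_spending_metrics__avg_daily_spend": [100, 200, 50],
        "fraud_detection__risk_score": [0.8, 0.9, 0.2],
    }
)

# Assumed business rule: high risk when the amount is at least 5x the average.
expected_high_risk = (
    result_pdf["transaction_amount"] >= 5 * result_pdf["user_spending_metrics__avg_daily_spend"]
)
# Assumed score threshold for "high risk".
actual_high_risk = result_pdf["fraud_detection__risk_score"] > 0.7

# Collect rows where the score disagrees with the rule and fail with the offending users.
mismatches = result_pdf[expected_high_risk != actual_high_risk]
assert mismatches.empty, f"Unexpected risk classification for: {mismatches['user_id'].tolist()}"
```

Comparing whole columns at once avoids the row-by-row loop and reports every disagreeing user in a single failure message.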