Version: 1.2

Testing Realtime Features

Overview

Testing realtime features involves validating both the transformation logic and its interaction with batch and stream feature dependencies. You can test realtime features with mock data that simulates both the request data and the feature dependencies.

Testing Methods

Realtime features support two testing approaches:

1. Using get_features_for_events() with mock inputs

The get_features_for_events() method allows you to test offline retrieval of realtime features by providing mock feature values directly in the events DataFrame.

Unlike batch and stream feature views, which use a mock_inputs parameter, Realtime Feature Views take mocked request source values and dependent feature values directly as columns of the events DataFrame. Because each row carries its own mock values, you can specify different values for each timestamp in the events DataFrame, which enables point-in-time testing scenarios.

tip

If a dependent feature column is not mocked, Tecton will try to compute it from the raw data source or read it from the offline store. To use only mock inputs and never read from a data source, mock every feature column of the dependent feature view.

Learn more about mocking dependent features and request sources
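Following the tip above, it can be worth checking, before calling get_features_for_events(), that every dependent feature column is actually mocked, so nothing silently falls back to the data source or offline store. The sketch below is plain pandas, not a Tecton API. The DEPENDENCIES mapping and the find_unmocked_columns() helper are hypothetical, and you would fill in the mapping from your own feature view definitions.

```python
import pandas as pd
from datetime import datetime

# Hypothetical description of an RTFV's dependencies: feature view name -> feature names.
# In practice, take these names from your own feature view definitions.
DEPENDENCIES = {
    "user_spending_metrics": ["avg_daily_spend", "transaction_count_7d"],
    "user_profile": ["account_tier"],
}


def find_unmocked_columns(events_df: pd.DataFrame, dependencies: dict) -> list:
    """Return the expected {fv}__{feature} mock columns that are absent from events_df."""
    missing = []
    for fv_name, features in dependencies.items():
        for feature in features:
            column = f"{fv_name}__{feature}"
            if column not in events_df.columns:
                missing.append(column)
    return missing


events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        "transaction_amount": [500, 750],
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
    }
)

print(find_unmocked_columns(events_df, DEPENDENCIES))
# ['user_spending_metrics__transaction_count_7d']
```

Here the missing column is the one Tecton would try to compute or read on its own. Either add it to events_df or accept that it will come from real data.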

import pandas as pd
from datetime import datetime

# rtfv is the Realtime Feature View under test, e.g. retrieved with ws.get_feature_view(...)
# Create events DataFrame with mock feature dependencies
# This RTFV depends on user_spending_metrics and user_profile feature views
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        "transaction_amount": [500, 750],
        # Mock batch feature dependencies - these represent the dependent feature values
        # that would be available at the timestamp for each user
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
    }
)

# Test realtime feature
result_df = rtfv.get_features_for_events(events_df)
print(result_df.to_pandas())
Result
| user_id | timestamp | transaction_amount | fraud_detection__is_high_risk | fraud_detection__tier_multiplier |
|---|---|---|---|---|
| user_1 | 2023-04-02 00:00:00 | 500 | 1 | 1.5 |
| user_2 | 2023-04-03 00:00:00 | 750 | 1 | 1.0 |

2. Using run_transformation() with mock input data

To test the result of a single invocation of a realtime feature view's transformation, use run_transformation() with mock input data:

import pandas as pd

# Mock all feature inputs: one entry per input of the RTFV
# (here, the request source and one dependent feature view)
input_data = {
    "transaction_request": pd.DataFrame([{"amount": 100.0, "merchant_category": "grocery"}]),
    "user_spending_metrics": pd.DataFrame([{"avg_spend_30d": 200.0, "transaction_count_7d": 15}]),
}

result = rtfv.run_transformation(input_data=input_data)
print(result)  # is_spending_anomaly: 0
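run_transformation() requires the feature view object. If you also want a test that runs without Tecton at all, one option is to keep the core logic in an ordinary Python function that the RTFV's transformation calls, and unit-test that function directly. This is a minimal sketch, assuming a hypothetical is_spending_anomaly() helper with a hypothetical rule that flags an amount greater than 3x the 30-day average. This page does not show the real logic behind the example above.

```python
# Hypothetical transformation logic: flag a transaction whose amount is more than
# 3x the user's 30-day average spend. Not the actual logic of the RTFV above.
def is_spending_anomaly(transaction_request: dict, user_spending_metrics: dict) -> dict:
    amount = transaction_request["amount"]
    avg_spend = user_spending_metrics["avg_spend_30d"]
    return {"is_spending_anomaly": int(amount > 3 * avg_spend)}


# Same mock values as the run_transformation() example, expressed as plain dicts
result = is_spending_anomaly(
    {"amount": 100.0, "merchant_category": "grocery"},
    {"avg_spend_30d": 200.0, "transaction_count_7d": 15},
)
print(result)  # {'is_spending_anomaly': 0}

# An amount well above 3x the average is flagged
print(is_spending_anomaly({"amount": 700.0}, {"avg_spend_30d": 200.0}))  # {'is_spending_anomaly': 1}
```

With the mock values from the run_transformation() example (amount 100.0, average 200.0), this rule returns 0. That matches the output shown there.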

Mocking Dependent Features and Request Sources

When testing realtime features with get_features_for_events(), you can mock both dependent feature values and request source data directly in the events DataFrame. This allows you to test your realtime transformation logic without needing to materialize dependent features.

Mocking Request Sources

Request source data uses the original column names as defined in your RequestSource schema:

# For a RequestSource with fields: transaction_amount, merchant_category
events_df = pd.DataFrame(
{
"user_id": ["user_1", "user_2"],
"timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
# Request source columns - use original names
"transaction_amount": [500, 750],
"merchant_category": ["grocery", "electronics"],
}
)

Mocking Dependent Feature Values

Batch and stream feature dependencies use the naming convention {feature_view_name}__{feature_name}, with a double underscore between the two names. These mock columns simulate the exact feature values that would be passed to the realtime feature view at the point in time given by each row's timestamp:

# Mock dependent features from other feature views
events_df = pd.DataFrame(
{
"user_id": ["user_1", "user_2"],
"timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
# Request source data
"transaction_amount": [500, 750],
# Dependent feature values - use feature_view__feature_name format
"user_spending_metrics__avg_daily_spend": [100, 200],
"user_profile__account_tier": ["premium", "basic"],
"merchant_risk_metrics__fraud_rate": [0.02, 0.08],
}
)

Column Naming Convention Summary

  • Request Sources: Use original column names (transaction_amount, merchant_category, etc.)
  • Dependent Features: Use {feature_view_name}__{feature_name} format
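When several dependencies are mocked, a small helper can apply the {feature_view_name}__{feature_name} convention for you and avoid typos in the double underscore. Here is a minimal sketch in plain pandas. mock_dependency() is a hypothetical helper, not part of the Tecton SDK, and it rebuilds the events DataFrame from the dependent-feature example above.

```python
import pandas as pd
from datetime import datetime


def mock_dependency(feature_view_name: str, features: dict) -> dict:
    """Prefix each feature name with its feature view name using the double-underscore convention."""
    return {f"{feature_view_name}__{name}": values for name, values in features.items()}


events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        # Request source columns keep their original names
        "transaction_amount": [500, 750],
        # Dependent feature columns get the {feature_view_name}__{feature_name} prefix
        **mock_dependency("user_spending_metrics", {"avg_daily_spend": [100, 200]}),
        **mock_dependency("user_profile", {"account_tier": ["premium", "basic"]}),
        **mock_dependency("merchant_risk_metrics", {"fraud_rate": [0.02, 0.08]}),
    }
)

print(list(events_df.columns))
```

Request source columns are left untouched, so the two naming rules from the summary stay visibly separate in the code.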

Point-in-Time Semantics

Each row in the events DataFrame represents a specific point in time when the realtime feature would be computed. The mock feature columns provide the exact values that would be available at that timestamp:

events_df = pd.DataFrame(
{
"user_id": ["user_123", "user_123"],
"timestamp": [
datetime(2023, 4, 2, 10, 0), # Morning computation
datetime(2023, 4, 2, 18, 0), # Evening computation
],
"transaction_amount": [100, 150],
# These represent the user's daily average at each timestamp
"user_spending_metrics__avg_daily_spend": [50, 75], # Average increased throughout the day
}
)
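If you have a timeline of values for a dependency (for example, a few hand-written snapshots), you can derive each row's mock value with a backward as-of join instead of typing the values in. This sketch uses pandas.merge_asof() to take, for each event, the most recent value at or before the event timestamp. The history table is hypothetical. The join only prepares mock columns and does not reproduce every detail of Tecton's own point-in-time retrieval.

```python
import pandas as pd
from datetime import datetime

# Hypothetical history of a dependent feature: each row is a value that
# became available at feature_time
history = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123", "user_123"],
        "feature_time": [
            datetime(2023, 4, 2, 0, 0),
            datetime(2023, 4, 2, 12, 0),
            datetime(2023, 4, 3, 0, 0),
        ],
        "user_spending_metrics__avg_daily_spend": [50, 75, 90],
    }
)

events = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123"],
        "timestamp": [datetime(2023, 4, 2, 10, 0), datetime(2023, 4, 2, 18, 0)],
        "transaction_amount": [100, 150],
    }
)

# For each event, take the latest value at or before the event timestamp.
# merge_asof requires both frames to be sorted by their time keys.
events_df = pd.merge_asof(
    events.sort_values("timestamp"),
    history.sort_values("feature_time"),
    left_on="timestamp",
    right_on="feature_time",
    by="user_id",
    direction="backward",
).drop(columns=["feature_time"])

print(events_df)
```

The 10:00 event picks up the midnight value (50) and the 18:00 event picks up the noon value (75). These are the same mock values used in the example above.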

Example: Testing Dependency Mocking for a Fraud Detection Feature

This example demonstrates how to test a realtime fraud detection feature, which analyzes transaction data to estimate the likelihood of fraudulent activity. It uses dependent feature views, such as user spending metrics and merchant risk metrics, to generate a risk score for each transaction.

We will test the Realtime Feature View by mocking its dependent feature views, which lets us validate its logic against controlled inputs.

import tecton
import pandas as pd
from datetime import datetime

ws = tecton.get_workspace("prod")
fraud_rtfv = ws.get_feature_view("fraud_detection")

# Create events with mock feature dependencies
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "timestamp": [datetime(2023, 4, 2, 10, 0), datetime(2023, 4, 2, 14, 30), datetime(2023, 4, 2, 18, 15)],
        "transaction_amount": [500, 1200, 75],
        "merchant_category": ["grocery", "electronics", "gas_station"],
        # Mock batch feature dependencies - values at each timestamp
        "user_spending_metrics__avg_daily_spend": [100, 200, 50],
        "user_spending_metrics__transaction_count_7d": [15, 8, 25],
        "user_profile__account_age_days": [365, 90, 730],
        "merchant_risk_metrics__fraud_rate": [0.02, 0.08, 0.01],
    }
)

# Test realtime feature with mocked dependencies
result_df = fraud_rtfv.get_features_for_events(events_df)
result_pdf = result_df.to_pandas()  # convert once and reuse below
print("Fraud detection results:")
print(result_pdf)  # or display(result_pdf) in a notebook

# Verify specific fraud detection logic
for _, row in result_pdf.iterrows():
    risk_score = row["fraud_detection__risk_score"]
    amount = row["transaction_amount"]
    avg_spend = row["user_spending_metrics__avg_daily_spend"]

    # Validate business logic: high risk if the amount is at least 5x the daily average
    expected_high_risk = amount >= avg_spend * 5
    actual_high_risk = risk_score > 0.7

    print(
        f"User {row['user_id']}: Amount ${amount}, "
        f"Avg ${avg_spend}, High Risk: {actual_high_risk} "
        f"(Expected: {expected_high_risk})"
    )
Expected Output
| user_id | transaction_amount | fraud_detection__risk_score |
|---|---|---|
| user_1 | 500 | 0.8 |
| user_2 | 1200 | 0.9 |
| user_3 | 75 | 0.2 |
User user_1: Amount $500, Avg $100, High Risk: True (Expected: True)
User user_2: Amount $1200, Avg $200, High Risk: True (Expected: True)
User user_3: Amount $75, Avg $50, High Risk: False (Expected: False)
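The loop above prints a comparison for each row but never fails. In an automated test you would usually assert instead, so that a mismatch fails the run. Below is a minimal vectorized sketch in pandas. It uses the rows from the expected output above as stand-in data; with a real RTFV you would use result_df.to_pandas(). It treats an amount of at least 5x the daily average as high risk, which is the reading under which user_1, at exactly 5x, is expected to be flagged.

```python
import pandas as pd

# Stand-in for result_df.to_pandas(): rows from the expected output above, plus the
# mocked dependency column the check reads. Replace with real results in practice.
results = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "transaction_amount": [500, 1200, 75],
        "user_spending_metrics__avg_daily_spend": [100, 200, 50],
        "fraud_detection__risk_score": [0.8, 0.9, 0.2],
    }
)

# Expected: amount is at least 5x the daily average. Actual: risk score above 0.7.
expected_high_risk = results["transaction_amount"] >= results["user_spending_metrics__avg_daily_spend"] * 5
actual_high_risk = results["fraud_detection__risk_score"] > 0.7

# Collect the users whose classification disagrees with the rule, then fail loudly if any do
mismatches = results.loc[expected_high_risk != actual_high_risk, "user_id"].tolist()
assert not mismatches, f"Unexpected risk classification for: {mismatches}"
print("All rows match the expected fraud logic")
```

The same pattern works inside a pytest test function. The assertion message names the offending users, so a failure points straight at the rows to inspect.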
