Version: 1.2

Testing Realtime Features

Overview

Testing realtime features involves validating both the transformation logic and the interaction with batch/stream feature dependencies. Realtime features can be tested using mock data to simulate both request data and feature dependencies.

Testing Methods

Realtime features support multiple testing approaches:

1. Using get_features_for_events() with mock inputs

The get_features_for_events() method allows you to test offline retrieval of realtime features by providing mock feature values directly in the events DataFrame.

Unlike Batch and Stream Feature Views, which use a mock_inputs parameter, Realtime Feature Views take mocked request source data and dependent feature values directly as columns of the events DataFrame. Because each row carries its own values, you can specify different mocked values for each timestamp, which enables point-in-time testing scenarios.

Tip

If a dependent feature column is not mocked, Tecton will try to compute it from the raw data source or read it from the offline store. To rely purely on mock inputs and avoid reading from a data source, mock every feature column in the dependent feature view.

For details, see Mocking Dependent Features and Request Sources below.

import pandas as pd
from datetime import datetime

# Create events DataFrame with mock feature dependencies
# This Realtime Feature View (RTFV) depends on the user_spending_metrics and user_profile feature views
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        "transaction_amount": [500, 750],
        # Mock batch feature dependencies - these represent the dependent feature values
        # that would be available at the timestamp for each user
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
    }
)

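# Test the realtime feature. rtfv is the Realtime Feature View object,
# for example one returned by ws.get_feature_view(...)
result_df = rtfv.get_features_for_events(events_df)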

Result

user_id	timestamp	transaction_amount	fraud_detection__is_high_risk	fraud_detection__tier_multiplier
user_1	2023-04-02 00:00:00	500	1	1.5
user_2	2023-04-03 00:00:00	750	1	1.0

2. Using run_transformation() with mock input data

To test the result of a single invocation of a Realtime Feature View's transformation, use run_transformation() with mock input data:

# Mock all feature inputs
input_data = {
    "transaction_request": pd.DataFrame([{"amount": 100.0, "merchant_category": "grocery"}]),
    "user_spending_metrics": pd.DataFrame([{"avg_spend_30d": 200.0, "transaction_count_7d": 15}]),
}

result = rtfv.run_transformation(input_data=input_data)
print(result)  # {"is_spending_anomaly": 0}
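
Because run_transformation() exercises a single invocation, the same mock values can also be used to check a transformation rule in isolation. The sketch below is a plain-Python stand-in, not Tecton code: the function is_spending_anomaly and its rule (amount greater than three times avg_spend_30d) are assumptions made for illustration, since the actual transformation body is not shown on this page.

```python
# Hypothetical stand-in for the transformation body. The 3x threshold is an
# assumption for illustration, not the rule used by the feature view above.
def is_spending_anomaly(transaction_request: dict, user_spending_metrics: dict) -> dict:
    amount = transaction_request["amount"]
    avg_spend = user_spending_metrics["avg_spend_30d"]
    # Flag the transaction when it exceeds three times the 30-day average.
    return {"is_spending_anomaly": int(amount > 3 * avg_spend)}


# Same mock values as the run_transformation() example, written as plain dicts.
normal = is_spending_anomaly(
    {"amount": 100.0, "merchant_category": "grocery"},
    {"avg_spend_30d": 200.0, "transaction_count_7d": 15},
)

# A second case on the other side of the assumed threshold.
spike = is_spending_anomaly(
    {"amount": 900.0, "merchant_category": "grocery"},
    {"avg_spend_30d": 200.0, "transaction_count_7d": 15},
)

print(normal, spike)
```

Testing one input on each side of a threshold, as here, tends to catch inverted comparisons that a single happy-path case would miss.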

Mocking Dependent Features and Request Sources

When testing realtime features with get_features_for_events(), you can mock both dependent feature values and request source data directly in the events DataFrame. This allows you to test your realtime transformation logic without needing to materialize dependent features.

Mocking Request Sources

Request source data uses the original column names as defined in your RequestSource schema:

# For a RequestSource with fields: transaction_amount, merchant_category
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        # Request source columns - use original names
        "transaction_amount": [500, 750],
        "merchant_category": ["grocery", "electronics"],
    }
)

Mocking Dependent Feature Values

Batch and stream feature dependencies use the naming convention {feature_view_name}__{feature_name}. These mock columns simulate the exact feature values that would be passed to the realtime feature view at the point in time specified by the timestamp in each row:

# Mock dependent features from other feature views
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": [datetime(2023, 4, 2), datetime(2023, 4, 3)],
        # Request source data
        "transaction_amount": [500, 750],
        # Dependent feature values - use feature_view__feature_name format
        "user_spending_metrics__avg_daily_spend": [100, 200],
        "user_profile__account_tier": ["premium", "basic"],
        "merchant_risk_metrics__fraud_rate": [0.02, 0.08],
    }
)

Column Naming Convention Summary

Request Sources: Use original column names (transaction_amount, merchant_category, etc.)
Dependent Features: Use {feature_view_name}__{feature_name} format
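
The naming rules above can be checked mechanically before calling get_features_for_events(), which helps avoid an unmocked column silently falling back to a data source or the offline store. The following is a minimal sketch, not part of the Tecton SDK: the helpers mock_column and missing_mock_columns are hypothetical, and the dependency layout is inferred from the column names used in the examples on this page.

```python
def mock_column(feature_view_name: str, feature_name: str) -> str:
    """Build a mock column name in the {feature_view_name}__{feature_name} format."""
    return f"{feature_view_name}__{feature_name}"


def missing_mock_columns(dependencies: dict, request_fields: list, columns: list) -> list:
    """Return the expected columns that are absent from `columns`, in a stable order."""
    # Request source fields keep their original names.
    expected = list(request_fields)
    # Dependent features use the double-underscore convention.
    for feature_view_name, feature_names in dependencies.items():
        expected.extend(mock_column(feature_view_name, name) for name in feature_names)
    present = set(columns)
    return [name for name in expected if name not in present]


# Assumed dependency layout, inferred from the example column names.
dependencies = {
    "user_spending_metrics": ["avg_daily_spend"],
    "user_profile": ["account_tier"],
    "merchant_risk_metrics": ["fraud_rate"],
}
request_fields = ["transaction_amount", "merchant_category"]

# Columns of an events DataFrame that forgot to include merchant_category.
events_columns = [
    "user_id",
    "timestamp",
    "transaction_amount",
    "user_spending_metrics__avg_daily_spend",
    "user_profile__account_tier",
    "merchant_risk_metrics__fraud_rate",
]

missing = missing_mock_columns(dependencies, request_fields, events_columns)
print(missing)  # ['merchant_category']
```

With a pandas events DataFrame, the third argument would be list(events_df.columns).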

Point-in-Time Semantics

Each row in the events DataFrame represents a specific point in time when the realtime feature would be computed. The mock feature columns provide the exact values that would be available at that timestamp:

events_df = pd.DataFrame(
    {
        "user_id": ["user_123", "user_123"],
        "timestamp": [
            datetime(2023, 4, 2, 10, 0),  # Morning computation
            datetime(2023, 4, 2, 18, 0),  # Evening computation
        ],
        "transaction_amount": [100, 150],
        # These represent the user's daily average at each timestamp
        "user_spending_metrics__avg_daily_spend": [50, 75],  # Average increased throughout the day
    }
)
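
To see why per-row mock values matter, consider a feature that compares the transaction amount with the user's average spend. The sketch below uses plain dictionaries that mirror the rows of the events DataFrame above. The spend_ratio function is a hypothetical transformation chosen for illustration, not one defined on this page.

```python
from datetime import datetime

# Rows mirroring the events DataFrame above (one dict per row).
events = [
    {
        "user_id": "user_123",
        "timestamp": datetime(2023, 4, 2, 10, 0),
        "transaction_amount": 100,
        "user_spending_metrics__avg_daily_spend": 50,
    },
    {
        "user_id": "user_123",
        "timestamp": datetime(2023, 4, 2, 18, 0),
        "transaction_amount": 150,
        "user_spending_metrics__avg_daily_spend": 75,
    },
]


def spend_ratio(amount: float, avg_daily_spend: float) -> float:
    """Hypothetical realtime transformation: amount relative to average spend."""
    return amount / avg_daily_spend


# Each row is evaluated with the mock value supplied for its own timestamp.
ratios = [
    spend_ratio(row["transaction_amount"], row["user_spending_metrics__avg_daily_spend"])
    for row in events
]

# Using the morning average for the evening transaction gives a different answer,
# which is the kind of error point-in-time mocking is meant to expose.
stale_ratio = spend_ratio(
    events[1]["transaction_amount"],
    events[0]["user_spending_metrics__avg_daily_spend"],
)

print(ratios, stale_ratio)  # [2.0, 2.0] 3.0
```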

Example: Testing Dependency Mocking for a Fraud Detection Feature

This example demonstrates how to test a realtime fraud detection feature that analyzes transaction data to estimate the likelihood of fraudulent activity. It uses dependent feature views, such as user spending metrics and merchant risk metrics, to generate a risk score for each transaction.

We will test the Realtime Feature View by mocking the values of its dependent feature views, which lets us control its inputs and validate its logic without materializing those features.

import tecton
import pandas as pd
from datetime import datetime

ws = tecton.get_workspace("prod")
fraud_rtfv = ws.get_feature_view("fraud_detection")

# Create events with mock feature dependencies
events_df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_3"],
        "timestamp": [datetime(2023, 4, 2, 10, 0), datetime(2023, 4, 2, 14, 30), datetime(2023, 4, 2, 18, 15)],
        "transaction_amount": [500, 1200, 75],
        "merchant_category": ["grocery", "electronics", "gas_station"],
        # Mock batch feature dependencies - values at each timestamp
        "user_spending_metrics__avg_daily_spend": [100, 200, 50],
        "user_spending_metrics__transaction_count_7d": [15, 8, 25],
        "user_profile__account_age_days": [365, 90, 730],
        "merchant_risk_metrics__fraud_rate": [0.02, 0.08, 0.01],
    }
)

# Test realtime feature with mocked dependencies
result_df = fraud_rtfv.get_features_for_events(events_df)
print("Fraud detection results:")
print(result_df.to_pandas())  # in a notebook, display(...) renders a table instead

# Verify specific fraud detection logic
for _, row in result_df.to_pandas().iterrows():
    risk_score = row["fraud_detection__risk_score"]
    amount = row["transaction_amount"]
    avg_spend = row["user_spending_metrics__avg_daily_spend"]

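    # Validate business logic: high risk if the amount is at least 5x the average.
    # (>= rather than >, so user_1 with 500 vs. 5 * 100 matches the expected output)
    expected_high_risk = amount >= (avg_spend * 5)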
    actual_high_risk = risk_score > 0.7

    print(
        f"User {row['user_id']}: Amount ${amount}, "
        f"Avg ${avg_spend}, High Risk: {actual_high_risk} "
        f"(Expected: {expected_high_risk})"
    )

Expected Output

user_id	transaction_amount	fraud_detection__risk_score
user_1	500	0.8
user_2	1200	0.9
user_3	75	0.2

User user_1: Amount $500, Avg $100, High Risk: True (Expected: True)
User user_2: Amount $1200, Avg $200, High Risk: True (Expected: True)
User user_3: Amount $75, Avg $50, High Risk: False (Expected: False)
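
The printed comparison above can be turned into an automated check. The sketch below is one way to do that under stated assumptions: high_risk_mismatches is a hypothetical helper, the rows are copied from the example inputs and expected output, and the rule (risk score above 0.7 when the amount is at least five times the average daily spend) is the one described in the example rather than a documented Tecton behavior.

```python
def high_risk_mismatches(rows, score_threshold=0.7, spend_multiple=5):
    """Return the user_ids whose risk score disagrees with the expected rule."""
    mismatches = []
    for row in rows:
        avg_spend = row["user_spending_metrics__avg_daily_spend"]
        # Expected: high risk when the amount is at least `spend_multiple` times the average.
        expected = row["transaction_amount"] >= avg_spend * spend_multiple
        # Actual: high risk when the score exceeds the threshold.
        actual = row["fraud_detection__risk_score"] > score_threshold
        if expected != actual:
            mismatches.append(row["user_id"])
    return mismatches


# Rows copied from the example inputs and expected output above.
rows = [
    {"user_id": "user_1", "transaction_amount": 500,
     "user_spending_metrics__avg_daily_spend": 100, "fraud_detection__risk_score": 0.8},
    {"user_id": "user_2", "transaction_amount": 1200,
     "user_spending_metrics__avg_daily_spend": 200, "fraud_detection__risk_score": 0.9},
    {"user_id": "user_3", "transaction_amount": 75,
     "user_spending_metrics__avg_daily_spend": 50, "fraud_detection__risk_score": 0.2},
]

mismatches = high_risk_mismatches(rows)
assert mismatches == [], f"Unexpected risk scores for: {mismatches}"
```

With a pandas result, the rows could be obtained with result_df.to_pandas().to_dict("records").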
