Version: 1.1

Building Realtime Features with Tecton

Many of the most powerful ML features can only be calculated at the exact moment they're needed. Imagine an e-commerce fraud detection system - when a customer places an order, you might want to check if their shipping address matches their usual location, or if the purchase amount is unusually high compared to their typical spending.

These "realtime features" need to be computed on the fly during model inference, for one or more of these reasons:

The data is only available at request time (like the current purchase amount)
The computation involves comparing request data against historical patterns
Pre-computing all possible combinations would be impractical or impossible

What You'll Build

In this tutorial, we'll build realtime features for a fraud detection system that can:

Check if a transaction amount is unusually high
Compare the transaction against the user's historical spending patterns
Serve these features with millisecond latency in production

What You'll Learn

You'll learn how to:

Create realtime features using Python
Test your features interactively in a notebook
Combine realtime data with historical user patterns
Generate training data for your model
Deploy your features to production

No prior Tecton experience is required, though basic Python knowledge is assumed. Let's get started by setting up our environment!

Prerequisites

Before we dive into building features, let's get our environment set up. You'll need Python >= 3.8 to get started.

1. Install the Required Libraries

Run this command to install the Tecton SDK and supporting libraries:

!pip install 'tecton[rift]==1.0.0' gcsfs s3fs -q

2. Connect to Tecton

import tecton

tecton.login("explore.tecton.ai")

3. Import Required Dependencies

Copy these imports - we'll use them throughout the tutorial:

from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd

# Configure Tecton to use Rift for offline compute
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

Not yet a Tecton user? Sign up at tecton.ai/explore for a free account to try this tutorial.

4. Sample Data

For this tutorial, we'll use a sample transaction dataset that includes:

Historical transaction amounts
Transaction timestamps
User IDs
Fraud labels

You don't need to download anything - we'll access this data directly from an S3 bucket when needed.

✅ With your environment ready, let's build your first realtime feature!

Part 1: Your First Realtime Feature

Let's start by building a simple but useful feature for fraud detection: identifying high-value transactions that might need extra scrutiny. We'll create a feature that checks if a transaction amount exceeds $1,000.

Defining the Request Data

First, we need to tell Tecton what data we expect to receive at request time. We do this using a RequestSource:

# Define the schema for our request data
transaction_request = RequestSource(schema=[Field("amount", Float64)])  # We expect to receive a transaction amount

Creating the Realtime Feature

Now let's create our first realtime feature. We'll write a Python function that takes the transaction amount and returns True if it's over $1,000:

@realtime_feature_view(
    sources=[transaction_request],  # Use our RequestSource as input
    mode="python",  # We'll write our transformation in Python
    features=[Attribute("transaction_amount_is_high", Bool)],  # Our output feature
)
def transaction_amount_is_high(request):
    """Check if a transaction amount is over $1,000."""
    return {"transaction_amount_is_high": request["amount"] > 1000}

Let's break down what's happening here:

@realtime_feature_view tells Tecton this is a realtime feature
sources=[transaction_request] specifies we'll use the request data
mode="python" means we'll write our transformation in Python
Our function takes a request parameter containing the input data
We return a dictionary with our feature value
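Because the transformation body is ordinary Python, its logic can also be checked without Tecton. The sketch below is a minimal illustration and not part of the Tecton API: `is_high_amount` and `HIGH_AMOUNT_THRESHOLD` are hypothetical names that mirror the function above. It shows that the comparison is strict, so a transaction of exactly $1,000 is not flagged.

```python
# Plain-Python mirror of the transformation body.
# Hypothetical helper for illustration only; not a Tecton API.
HIGH_AMOUNT_THRESHOLD = 1000  # same threshold as the feature view above


def is_high_amount(request: dict) -> dict:
    """Return the feature dictionary for a single request payload."""
    return {"transaction_amount_is_high": request["amount"] > HIGH_AMOUNT_THRESHOLD}


print(is_high_amount({"amount": 999.99}))   # {'transaction_amount_is_high': False}
print(is_high_amount({"amount": 1000.00}))  # {'transaction_amount_is_high': False}
print(is_high_amount({"amount": 1000.01}))  # {'transaction_amount_is_high': True}
```

If a $1,000 transaction should count as high, the comparison would need to be `>=` instead.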

Testing the Feature

Let's test our feature with some sample data:

# Test with a small transaction amount
small_transaction = {"request": {"amount": 182.40}}
print("Small transaction result:")
print(transaction_amount_is_high.run_transformation(input_data=small_transaction))

# Test with a large transaction amount
large_transaction = {"request": {"amount": 1500.00}}
print("\nLarge transaction result:")
print(transaction_amount_is_high.run_transformation(input_data=large_transaction))

You should see output like this:

Small transaction result:
{'transaction_amount_is_high': False}

Large transaction result:
{'transaction_amount_is_high': True}

Great! You've created your first realtime feature. However, a static threshold of $1,000 might not make sense for all users - someone who regularly makes large purchases shouldn't trigger the same alerts as someone who typically makes small transactions.

In the next section, we'll make this feature smarter by comparing the transaction amount to each user's typical spending patterns.

Part 2: Making Features Smarter with Historical Context

Now let's improve our fraud detection by comparing each transaction against the user's historical spending patterns. Instead of using a fixed threshold, we'll check if the transaction amount is unusually high compared to their average transaction amount.

Creating a Historical Feature

First, let's create a Batch Feature View that calculates each user's average transaction amount over the past year:

# Define our data source containing historical transactions
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Define our user entity
user = Entity(name="user", join_keys=[Field("user_id", String)])

# Create a feature view that computes the yearly average transaction amount
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("amount", Float64),
            function="mean",
            time_window=timedelta(days=365),
            name="yearly_average",
        ),
    ],
)
def user_transaction_averages(transactions):
    """Calculate the yearly average transaction amount per user."""
    return transactions[["user_id", "timestamp", "amount"]]
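To build intuition for what the `mean` aggregate over a 365-day window represents, here is a rough pandas sketch of a trailing average for one user as of a given time. It is a simplification under stated assumptions, not Tecton's implementation: `trailing_mean` is a hypothetical helper, the sample rows are made up, the window is treated as half-open, and the daily `aggregation_interval` is ignored.

```python
from datetime import datetime, timedelta

import pandas as pd


def trailing_mean(transactions, user_id, as_of, window=timedelta(days=365)):
    """Mean `amount` for one user over [as_of - window, as_of); None if no rows.

    Hypothetical helper for illustration; window boundaries are an assumption.
    """
    in_window = transactions[
        (transactions["user_id"] == user_id)
        & (transactions["timestamp"] >= as_of - window)
        & (transactions["timestamp"] < as_of)
    ]
    if in_window.empty:
        return None
    return float(in_window["amount"].mean())


# Made-up rows, only to show which transactions fall inside the window.
sample = pd.DataFrame(
    {
        "user_id": ["user_a", "user_a", "user_a", "user_b"],
        "timestamp": pd.to_datetime(["2022-01-01", "2023-03-01", "2023-06-01", "2023-05-01"]),
        "amount": [900.0, 20.0, 40.0, 75.0],
    }
)
as_of = datetime(2023, 7, 1)

print(trailing_mean(sample, "user_a", as_of))  # 30.0 (the 2022-01-01 row is outside the window)
print(trailing_mean(sample, "user_c", as_of))  # None (no history for this user)
```

The second call matters for the next step: a user with no transactions in the window has no average to compare against.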

Combining Realtime and Historical Data

Now let's create an improved realtime feature that compares the current transaction amount against the user's yearly average:

@realtime_feature_view(
    sources=[
        transaction_request,  # Current transaction data
        user_transaction_averages,  # Historical averages
    ],
    mode="python",
    features=[Attribute("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    """Check if transaction amount exceeds user's yearly average."""
    # Get user's average, defaulting to 0 if no history exists
    amount_mean = user_transaction_averages["yearly_average"] or 0
    current_amount = transaction_request["amount"]

    return {"transaction_amount_is_higher_than_average": current_amount > amount_mean}
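The `or 0` fallback has a consequence worth seeing in isolation: when a user has no history, the average is treated as 0, so any positive amount is flagged. The plain-Python sketch below mirrors the comparison above; `compare_to_average` is a hypothetical name used only for this illustration.

```python
def compare_to_average(amount, yearly_average):
    """Mirror of the comparison in the feature view above (illustration only)."""
    amount_mean = yearly_average or 0  # None and 0.0 both become 0
    return amount > amount_mean


print(compare_to_average(5.00, None))    # True: no history, so any positive amount is flagged
print(compare_to_average(5.00, 33.46))   # False: below this user's average
print(compare_to_average(50.00, 33.46))  # True: above this user's average
```

Whether new users should be flagged this way is a modeling decision. If it is not the behavior you want, one option is to handle the missing average explicitly rather than defaulting to 0.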

Testing with Historical Context

Let's test our improved feature with some realistic scenarios:

# Test scenario: Regular user with transaction history
input_data = {"transaction_request": {"amount": 182.40}, "user_transaction_averages": {"yearly_average": 33.46}}

print("Regular user making larger than usual purchase:")
print(transaction_amount_is_higher_than_average.run_transformation(input_data))

# Test scenario: High-value shopper
input_data = {"transaction_request": {"amount": 182.40}, "user_transaction_averages": {"yearly_average": 500.00}}

print("\nHigh-value shopper making typical purchase:")
print(transaction_amount_is_higher_than_average.run_transformation(input_data))

You'll see our feature now adapts to each user's spending patterns:

Regular user making larger than usual purchase:
{'transaction_amount_is_higher_than_average': True}

High-value shopper making typical purchase:
{'transaction_amount_is_higher_than_average': False}

Now we have a smarter feature that understands user context! Next, let's learn how to generate training data and deploy this to production.

Part 3: Getting Ready for Production

Now that we've built and tested our realtime features, let's prepare them for production use. We'll cover how to generate training data, deploy the features, and serve them in production.

Generating Training Data

To train a model with our features, we need to generate historical training data. First, let's create a Feature Service that bundles our features together:

from tecton import FeatureService

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[
        user_transaction_averages,  # Historical averages
        transaction_amount_is_higher_than_average,  # Realtime comparison
    ],
)

Now let's load some historical transaction data with fraud labels:

# Load historical transactions with fraud labels
training_events = pd.read_parquet(
    "s3://tecton.ai.public/tutorials/transactions.pq",
    storage_options={"anon": True},
)[["user_id", "timestamp", "amount", "is_fraud"]]

# Generate our training dataset
training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas()

print("Training data preview:")
display(training_data.head())

The generated training data includes:

The original transaction data (amount, user_id, timestamp)
The fraud labels
Our computed features (yearly average and comparison)
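A quick sanity check on a training set like this is to recompute the comparison from the amount and average columns and confirm it matches the flag column. The sketch below runs on a tiny made-up frame; the column names `amount`, `yearly_average`, and `flag` are placeholders, since the actual column names in the generated DataFrame may differ (inspect `training_data.columns` first). `recompute_flag` is a hypothetical helper.

```python
import pandas as pd


def recompute_flag(df, amount_col, average_col):
    """Recompute 'amount > average', treating a missing average as 0."""
    averages = df[average_col].fillna(0)
    return df[amount_col] > averages


# Made-up rows with placeholder column names, for illustration only.
toy = pd.DataFrame(
    {
        "amount": [182.40, 182.40, 12.00],
        "yearly_average": [33.46, 500.00, None],
        "flag": [True, False, True],
    }
)

mismatches = toy[recompute_flag(toy, "amount", "yearly_average") != toy["flag"]]
print(len(mismatches))  # 0 when the flag agrees with the recomputed comparison
```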

Deploying to Production

To deploy our features, we need to:

Copy our feature definitions to a Feature Repository
Apply them to a live workspace
Generate an API key for serving

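Here's an outline of the feature repository file. Paste the RequestSource, BatchSource, Entity, and feature view definitions from Parts 1 and 2 where indicated: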

# feature_repo.py

from tecton import *
from tecton.types import *
from datetime import datetime, timedelta

# [Previous code for BatchSource, Entity, and feature definitions]
# Include all the code we wrote earlier

# Add our Feature Service
fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)

Deploy using the Tecton CLI:

tecton workspace create --live fraud-detection
tecton apply

Serving Realtime Features

First, generate a service account API key from the Tecton UI:

Navigate to Settings > Service Accounts
Create a new service account
Save the API key
Grant the service account "Consumer" access to your workspace

Now we can make realtime feature requests:

import tecton

# Configure credentials
TECTON_API_KEY = "your-api-key"  # Replace with your API key
WORKSPACE_NAME = "fraud-detection"

tecton.set_credentials(tecton_api_key=TECTON_API_KEY)
ws = tecton.get_workspace(WORKSPACE_NAME)
fraud_detection_service = ws.get_feature_service("fraud_detection_feature_service")

# Make a feature request
features = fraud_detection_service.get_online_features(
    join_keys={"user_id": "user_123"}, request_data={"amount": 750.00}
)

print("\nRealtime feature response:")
print(features.to_dict())

Important Production Notes

For best performance in production:
- Call the REST API directly, or use one of Tecton's Python/Java client libraries
- Avoid get_online_features() in production; reserve the SDK call shown above for testing and debugging
Monitor your features:
- Watch feature freshness in the Tecton UI
- Set up alerts for serving latency
- Track feature distribution changes
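As a rough idea of what calling the REST API directly involves, the sketch below builds (but does not send) an HTTP request for the same lookup as the SDK example. Treat the details as assumptions to verify against Tecton's HTTP API reference: the endpoint path, the `Tecton-key` authorization scheme, and the payload field names reflect my understanding of the API, and `build_get_features_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_get_features_request(cluster_url, api_key, workspace, feature_service, join_keys, request_data):
    """Build a POST request for a feature lookup (assumed endpoint and payload shape)."""
    payload = {
        "params": {
            "workspace_name": workspace,
            "feature_service_name": feature_service,
            "join_key_map": join_keys,  # entity keys, e.g. user_id
            "request_context_map": request_data,  # request-time inputs, e.g. amount
        }
    }
    return urllib.request.Request(
        url=f"{cluster_url}/api/v1/feature-service/get-features",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Tecton-key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_get_features_request(
    cluster_url="https://explore.tecton.ai",
    api_key="your-api-key",  # Replace with your API key
    workspace="fraud-detection",
    feature_service="fraud_detection_feature_service",
    join_keys={"user_id": "user_123"},
    request_data={"amount": 750.00},
)

print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out so the sketch runs without credentials or network access.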

That's it! You've successfully built, tested, and deployed realtime features with Tecton.

Wrap-up

Congratulations! Let's recap what you built and learned:

What We Built

A basic realtime feature checking transaction amounts
A smarter feature that adapts to each user's spending patterns
A production-ready feature service combining historical and realtime data

Key Concepts Covered

Using RequestSource to define realtime inputs
Creating realtime_feature_views for on-the-fly computations
Combining realtime data with historical features
Generating training data with the same feature logic used at serving time
Deploying features to production

Next Steps

Experiment with your own data:
- Try different aggregation windows for historical patterns
- Add more features like time-of-day or location checks
- Combine multiple historical features
Optimize for production:
- Set up proper monitoring
- Configure alerts
- Test performance at scale
Dive deeper:
- Explore more complex transformations
- Add feature monitoring
- Implement feature logging

Remember: realtime features in Tecton use the same transformation code for training and serving, which guards against training-serving skew caused by divergent feature logic.

Ready to build more? Check out our other tutorials and documentation for more advanced features and best practices!
