Building Realtime Features with Tecton
Many of the most powerful ML features can only be calculated at the exact moment they're needed. Imagine an e-commerce fraud detection system - when a customer places an order, you might want to check if their shipping address matches their usual location, or if the purchase amount is unusually high compared to their typical spending.
These "realtime features" need to be computed on-the-fly during model inference because:
- The data is only available at request time (like the current purchase amount)
- The computation involves comparing request data against historical patterns
- Pre-computing all possible combinations would be impractical or impossible
What You'll Build
In this tutorial, we'll build realtime features for a fraud detection system that can:
- Check if a transaction amount is unusually high
- Compare the transaction against the user's historical spending patterns
- Serve these features with millisecond latency in production
What You'll Learn
You'll learn how to:
- Create realtime features using Python
- Test your features interactively in a notebook
- Combine realtime data with historical user patterns
- Generate training data for your model
- Deploy your features to production
No prior Tecton experience is required, though basic Python knowledge is assumed. Let's get started by setting up our environment!
Prerequisites
Before we dive into building features, let's get our environment set up. You'll need Python >= 3.8 to get started.
1. Install the Required Libraries
Run this command to install the Tecton SDK and supporting libraries:
!pip install 'tecton[rift]==1.0.0' gcsfs s3fs -q
2. Connect to Tecton
Log in to your Tecton account (replace explore.tecton.ai with your organization's URL if different):
import tecton
tecton.login("explore.tecton.ai")
3. Import Required Dependencies
Copy these imports - we'll use them throughout the tutorial:
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd
# Configure Tecton to use Rift for offline compute
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")
Not a Tecton user yet? Sign up at tecton.ai/explore for a free account to try this tutorial.
4. Sample Data
For this tutorial, we'll use a sample transaction dataset that includes:
- Historical transaction amounts
- Transaction timestamps
- User IDs
- Fraud labels
You don't need to download anything - we'll access this data directly from an S3 bucket when needed.
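If you'd like to follow along offline, here's a tiny hand-built stand-in for the dataset. The column names match the ones the tutorial uses later (user_id, timestamp, amount, is_fraud); the values themselves are made up for illustration:

```python
import pandas as pd

# A tiny stand-in for the S3 transaction dataset, using the columns
# the tutorial relies on later. Values are illustrative only.
sample_transactions = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_2"],
        "timestamp": pd.to_datetime(["2023-01-05", "2023-03-12", "2023-02-20"]),
        "amount": [33.46, 182.40, 1500.00],
        "is_fraud": [0, 0, 1],
    }
)
print(sample_transactions.head())
```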
✅ With your environment ready, let's build your first realtime feature!
Part 1: Your First Realtime Feature
Let's start by building a simple but useful feature for fraud detection: identifying high-value transactions that might need extra scrutiny. We'll create a feature that checks if a transaction amount exceeds $1,000.
Defining the Request Data
First, we need to tell Tecton what data we expect to receive at request time. We do this using a RequestSource:
# Define the schema for our request data
transaction_request = RequestSource(schema=[Field("amount", Float64)]) # We expect to receive a transaction amount
Creating the Realtime Feature
Now let's create our first realtime feature. We'll write a Python function that takes the transaction amount and returns True if it's over $1,000:
@realtime_feature_view(
    sources=[transaction_request],  # Use our RequestSource as input
    mode="python",  # We'll write our transformation in Python
    features=[Attribute("transaction_amount_is_high", Bool)],  # Our output feature
)
def transaction_amount_is_high(request):
    """Check if a transaction amount is over $1,000."""
    return {"transaction_amount_is_high": request["amount"] > 1000}
Let's break down what's happening here:
- @realtime_feature_view tells Tecton this is a realtime feature
- sources=[transaction_request] specifies we'll use the request data
- mode="python" means we'll write our transformation in Python
- Our function takes a request parameter containing the input data
- We return a dictionary with our feature value
Testing the Feature
Let's test our feature with some sample data:
# Test with a small transaction amount
small_transaction = {"request": {"amount": 182.40}}
print("Small transaction result:")
print(transaction_amount_is_high.run_transformation(input_data=small_transaction))
# Test with a large transaction amount
large_transaction = {"request": {"amount": 1500.00}}
print("\nLarge transaction result:")
print(transaction_amount_is_high.run_transformation(input_data=large_transaction))
You should see output like this:
Small transaction result:
{'transaction_amount_is_high': False}
Large transaction result:
{'transaction_amount_is_high': True}
Great! You've created your first realtime feature. However, a static threshold of $1,000 might not make sense for all users - someone who regularly makes large purchases shouldn't trigger the same alerts as someone who typically makes small transactions.
In the next section, we'll make this feature smarter by comparing the transaction amount to each user's typical spending patterns.
Part 2: Making Features Smarter with Historical Context
Now let's improve our fraud detection by comparing each transaction against the user's historical spending patterns. Instead of using a fixed threshold, we'll check if the transaction amount is unusually high compared to their average transaction amount.
Creating a Historical Feature
First, let's create a Batch Feature View that calculates each user's average transaction amount over the past year:
# Define our data source containing historical transactions
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)
# Define our user entity
user = Entity(name="user", join_keys=[Field("user_id", String)])
# Create a feature view that computes the yearly average transaction amount
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("amount", Float64),
            function="mean",
            time_window=timedelta(days=365),
            name="yearly_average",
        ),
    ],
)
def user_transaction_averages(transactions):
    """Calculate the yearly average transaction amount per user."""
    return transactions[["user_id", "timestamp", "amount"]]
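Tecton computes this aggregation for you, but to build intuition for what the Aggregate above produces, here's a rough plain-pandas sketch of a trailing 365-day mean per user. The sample data and reference time are made up, and this ignores the daily aggregation interval and point-in-time bookkeeping Tecton handles:

```python
import pandas as pd

# Illustrative transactions for one user.
transactions = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1"],
        "timestamp": pd.to_datetime(["2022-06-01", "2023-01-15", "2023-05-20"]),
        "amount": [20.0, 40.0, 90.0],
    }
)

# Rough equivalent of the feature view's 365-day mean: for each user,
# average all transactions in the year leading up to a reference time.
reference_time = pd.Timestamp("2023-06-01")
window_start = reference_time - pd.Timedelta(days=365)
in_window = transactions[
    (transactions["timestamp"] > window_start)
    & (transactions["timestamp"] <= reference_time)
]
yearly_average = in_window.groupby("user_id")["amount"].mean()
print(yearly_average)
```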
Combining Real-time and Historical Data
Now let's create an improved realtime feature that compares the current transaction amount against the user's yearly average:
@realtime_feature_view(
    sources=[transaction_request, user_transaction_averages],  # Current transaction data and historical averages
    mode="python",
    features=[Attribute("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    """Check if transaction amount exceeds user's yearly average."""
    # Get user's average, defaulting to 0 if no history exists
    amount_mean = user_transaction_averages["yearly_average"] or 0
    current_amount = transaction_request["amount"]
    return {"transaction_amount_is_higher_than_average": current_amount > amount_mean}
Testing with Historical Context
Let's test our improved feature with some realistic scenarios:
# Test scenario: Regular user with transaction history
input_data = {"transaction_request": {"amount": 182.40}, "user_transaction_averages": {"yearly_average": 33.46}}
print("Regular user making larger than usual purchase:")
print(transaction_amount_is_higher_than_average.run_transformation(input_data))
# Test scenario: High-value shopper
input_data = {"transaction_request": {"amount": 182.40}, "user_transaction_averages": {"yearly_average": 500.00}}
print("\nHigh-value shopper making typical purchase:")
print(transaction_amount_is_higher_than_average.run_transformation(input_data))
You'll see our feature now adapts to each user's spending patterns:
Regular user making larger than usual purchase:
{'transaction_amount_is_higher_than_average': True}
High-value shopper making typical purchase:
{'transaction_amount_is_higher_than_average': False}
Now we have a smarter feature that understands user context! Next, let's learn how to generate training data and deploy this to production.
Part 3: Getting Ready for Production
Now that we've built and tested our realtime features, let's prepare them for production use. We'll cover how to generate training data, deploy the features, and serve them in production.
Generating Training Data
To train a model with our features, we need to generate historical training data. First, let's create a Feature Service that bundles our features together:
from tecton import FeatureService
fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[
        user_transaction_averages,  # Historical averages
        transaction_amount_is_higher_than_average,  # Realtime comparison
    ],
)
Now let's load some historical transaction data with fraud labels:
# Load historical transactions with fraud labels
training_events = pd.read_parquet(
    "s3://tecton.ai.public/tutorials/transactions.pq",
    storage_options={"anon": True},
)[["user_id", "timestamp", "amount", "is_fraud"]]
# Generate our training dataset
training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas()
print("Training data preview:")
display(training_data.head())
The generated training data includes:
- The original transaction data (amount, user_id, timestamp)
- The fraud labels
- Our computed features (yearly average and comparison)
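Behind the scenes, get_features_for_events performs a point-in-time join: each training event is matched with feature values as of that event's timestamp, never after it. Here's a simplified illustration of the idea using pandas' merge_asof, with made-up events and feature values (Tecton's actual implementation is more involved):

```python
import pandas as pd

# Training events (label rows) and pre-computed feature values over time.
events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1"],
        "timestamp": pd.to_datetime(["2023-03-01", "2023-09-01"]),
        "is_fraud": [0, 1],
    }
)
feature_values = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1"],
        "timestamp": pd.to_datetime(["2023-01-01", "2023-06-01"]),
        "yearly_average": [30.0, 55.0],
    }
)

# merge_asof picks, for each event, the latest feature value at or
# before the event's timestamp -- avoiding future data leakage.
training_data = pd.merge_asof(
    events.sort_values("timestamp"),
    feature_values.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
)
print(training_data)
```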
Deploying to Production
To deploy our features, we need to:
- Copy our feature definitions to a Feature Repository
- Apply them to a live workspace
- Generate an API key for serving
Here's the complete feature repository code:
# feature_repo.py
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
# [Previous code for BatchSource, Entity, and feature definitions]
# Include all the code we wrote earlier
# Add our Feature Service
fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)
Deploy using the Tecton CLI:
tecton workspace create --live fraud-detection
tecton apply
Serving Realtime Features
First, generate a service account API key from the Tecton UI:
- Navigate to Settings > Service Accounts
- Create a new service account
- Save the API key
- Grant the service account "Consumer" access to your workspace
Now we can make realtime feature requests:
import tecton
# Configure credentials
TECTON_API_KEY = "your-api-key" # Replace with your API key
WORKSPACE_NAME = "fraud-detection"
tecton.set_credentials(tecton_api_key=TECTON_API_KEY)
ws = tecton.get_workspace(WORKSPACE_NAME)
fraud_detection_service = ws.get_feature_service("fraud_detection_feature_service")
# Make a feature request
features = fraud_detection_service.get_online_features(
    join_keys={"user_id": "user_123"},
    request_data={"amount": 750.00},
)
print("\nRealtime feature response:")
print(features.to_dict())
Important Production Notes
- For best performance in production:
  - Use the REST API directly, or
  - Use Tecton's Python/Java client libraries
  - Avoid using get_online_features() in production
- Monitor your features:
  - Watch feature freshness in the Tecton UI
  - Set up alerts for serving latency
  - Track feature distribution changes
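For illustration, here's roughly what a direct HTTP request to Tecton's feature-serving endpoint can look like. The payload shape follows Tecton's get-features API as we understand it, but the URL and key are placeholders — verify the exact request format against your deployment's API reference:

```python
import json

# Hypothetical request payload for Tecton's feature-serving HTTP API.
# Field names follow the documented get-features request shape, but
# verify against your cluster's API reference before relying on them.
payload = {
    "params": {
        "feature_service_name": "fraud_detection_feature_service",
        "workspace_name": "fraud-detection",
        "join_key_map": {"user_id": "user_123"},
        "request_context_map": {"amount": 750.00},
    }
}
body = json.dumps(payload)
print(body)

# Sending it would look roughly like this (requires the `requests` package):
# requests.post(
#     "https://explore.tecton.ai/api/v1/feature-service/get-features",
#     headers={"Authorization": "Tecton-key YOUR_API_KEY"},
#     data=body,
# )
```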
That's it! You've successfully built, tested, and deployed realtime features with Tecton.
Wrap-up
Congratulations! You've successfully built production-ready realtime features for fraud detection. Let's recap what you've learned:
What We Built
- A basic realtime feature checking transaction amounts
- A smarter feature that adapts to each user's spending patterns
- A production-ready feature service combining historical and realtime data
Key Concepts Covered
- Using RequestSource to define realtime inputs
- Creating realtime_feature_views for on-the-fly computations
- Combining realtime data with historical features
- Generating training data while maintaining consistency
- Deploying features to production
Next Steps
- Experiment with your own data:
  - Try different aggregation windows for historical patterns
  - Add more features like time-of-day or location checks
  - Combine multiple historical features
- Optimize for production:
  - Set up proper monitoring
  - Configure alerts
  - Test performance at scale
- Dive deeper:
  - Explore more complex transformations
  - Add feature monitoring
  - Implement feature logging
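As a starting point for the time-of-day idea mentioned above, here's a hypothetical sketch of the transformation body for a late-night-transaction feature. The feature name, hour thresholds, and the assumption that the request carries an ISO-format timestamp are all illustrative; in Tecton this function would be wrapped in @realtime_feature_view with a RequestSource providing the timestamp:

```python
from datetime import datetime

# Hypothetical realtime transformation: flag transactions made
# between midnight and 5am. The request is assumed to carry an
# ISO-format timestamp string.
def transaction_is_late_night(request):
    hour = datetime.fromisoformat(request["timestamp"]).hour
    return {"transaction_is_late_night": 0 <= hour < 5}

print(transaction_is_late_night({"timestamp": "2023-07-01T03:30:00"}))
print(transaction_is_late_night({"timestamp": "2023-07-01T14:00:00"}))
```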
Remember: realtime features in Tecton use the exact same code for training and serving, eliminating the risk of training-serving skew.
Ready to build more? Check out our other tutorials and documentation for more advanced features and best practices!