Version: Beta 🚧

Building Realtime Features with Tecton


Many of the most powerful ML features can only be calculated at the exact moment they're needed. Imagine an e-commerce fraud detection system - when a customer places an order, you might want to check if their shipping address matches their usual location, or if the purchase amount is unusually high compared to their typical spending.

These "realtime features" need to be computed on-the-fly during model inference, either because:

  • The data is only available at request time (like the current purchase amount)
  • The computation involves comparing request data against historical patterns
  • Pre-computing all possible combinations would be impractical or impossible

What You'll Build

In this tutorial, we'll build realtime features for a fraud detection system that can:

  1. Check if a transaction amount is unusually high
  2. Compare the transaction against the user's historical spending patterns
  3. Serve these features with millisecond latency in production

What You'll Learn

You'll learn how to:

  • Create realtime features using Python
  • Test your features interactively in a notebook
  • Combine realtime data with historical user patterns
  • Generate training data for your model
  • Deploy your features to production

Time to Complete: 15-20 minutes

No prior Tecton experience is required, though basic Python knowledge is assumed. Let's get started by setting up our environment!

Prerequisites

Before we dive into building features, let's get our environment set up. You'll need Python >= 3.8 to get started.

1. Install the Required Libraries

Run this command to install the Tecton SDK and supporting libraries:

!pip install 'tecton[rift]==1.2.0' gcsfs s3fs -q

2. Connect to Tecton

Log in to your Tecton account (replace explore.tecton.ai with your organization's URL if different):

import tecton

tecton.login("https://explore.tecton.ai")

3. Import Required Dependencies

Copy these imports - we'll use them throughout the tutorial:

from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd

# Configure Tecton to use Rift for offline compute
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

Not yet a Tecton user? Sign up at tecton.ai/explore for a free account to try this tutorial.

4. Sample Data

For this tutorial, we'll use a sample transaction dataset that includes:

  • Historical transaction amounts
  • Transaction timestamps
  • User IDs
  • Fraud labels

You don't need to download anything - we'll access this data directly from an S3 bucket when needed.

โœ… With your environment ready, let's build your first realtime feature!

Part 1: Your First Realtime Feature

Let's start by building a simple but useful feature for fraud detection: identifying high-value transactions that might need extra scrutiny. We'll create a feature that checks if a transaction amount exceeds $1,000.

Defining the Request Data

First, we need to tell Tecton what data we expect to receive at request time. We do this using a RequestSource:

# Define the schema for our request data
# The request from the end user will include a transaction amount
transaction_request = RequestSource(name="transaction_request", schema=[Field("amount", Float64)])
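
As a mental model (plain Python, not Tecton code), the request payload a caller sends at inference time is just a dictionary that supplies every field declared in the RequestSource schema:

```python
# Sketch of the request-time payload matching the schema above;
# it must contain each declared field with a compatible type.
request_payload = {"amount": 1500.00}

# "amount" was declared as Float64 in the RequestSource schema
assert isinstance(request_payload["amount"], float)
print(request_payload)
```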

Creating the Realtime Feature

Now let's create our first realtime feature to find out if the transaction amount is over $1000. We'll use a calculation feature, which lets us define feature operations using SQL-like expressions:

transaction_amount_is_high = RealtimeFeatureView(
    name="transaction_amount_is_high",
    sources=[transaction_request],  # Use our RequestSource as input
    features=[
        Calculation(name="transaction_amount_is_high", expr="transaction_request.amount > 1000")  # SQL-like expression
    ],
)

Let's break down what's happening here:

  • RealtimeFeatureView creates a realtime feature
  • sources=[transaction_request] specifies we'll use the request data
  • Calculation defines our feature using a SQL-like expression
  • The expr parameter contains our logic: check if amount is over $1,000
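
The Calculation expression is a row-level boolean predicate. As a mental model only (this is not how Tecton executes it), the equivalent logic in plain pandas looks like this:

```python
import pandas as pd

# Mental model only: the same row-level predicate as the Calculation
# expression "transaction_request.amount > 1000", evaluated in plain pandas
request_df = pd.DataFrame({"amount": [182.40, 1500.00]})
request_df["transaction_amount_is_high"] = request_df["amount"] > 1000

print(request_df["transaction_amount_is_high"].tolist())  # [False, True]
```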

Testing the Feature

Let's test our feature with some sample data:

# Test with small and large transaction amounts

import pandas as pd

input_df = pd.DataFrame(
    {
        "amount": [182.40, 1500.00],
    }
)

result_df = transaction_amount_is_high.get_features_for_events(input_df).to_pandas()

# note the "transaction_amount_is_high__transaction_amount_is_high" column with the feature values for the varying inputs
print(result_df)

You should see output like this, with the feature value shown per row:

Small transaction ($182.40):
{'transaction_amount_is_high': False}

Large transaction ($1,500.00):
{'transaction_amount_is_high': True}

Great! You've created your first realtime feature. However, a static threshold of $1,000 might not make sense for all users - someone who regularly makes large purchases shouldn't trigger the same alerts as someone who typically makes small transactions.

In the next section, we'll make this feature smarter by comparing the transaction amount to each user's typical spending patterns.

Part 2: Making Features Smarter with Historical Context

Now let's improve our fraud detection by comparing each transaction against the user's historical spending patterns. Instead of using a fixed threshold, we'll check if the transaction amount is unusually high compared to their average transaction amount.

Creating a Historical Feature

First, let's create a Batch Feature View that calculates each user's average transaction amount over the past year:

# Define our data source containing historical transactions
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Define our user entity
user = Entity(name="user", join_keys=[Field("user_id", String)])

# Create a feature view that computes the yearly average transaction amount
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("amount", Float64),
            function="mean",
            time_window=timedelta(days=365),
            name="yearly_average",
        ),
    ],
)
def user_transaction_averages(transactions):
    """Calculate the yearly average transaction amount per user."""
    return transactions[["user_id", "timestamp", "amount"]]
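
To build intuition for what this aggregation produces, here is a toy illustration in plain pandas of a per-user mean over transaction amounts. (This is a sketch only: Tecton itself handles the time windowing, aggregation intervals, and incremental updates for you.)

```python
import pandas as pd

# Toy data: two users with different transaction histories
txns = pd.DataFrame(
    {
        "user_id": ["user123", "user123", "user456"],
        "amount": [20.00, 40.00, 500.00],
    }
)

# Per-user mean, analogous to the "yearly_average" aggregate above
yearly_average = txns.groupby("user_id")["amount"].mean()
print(yearly_average.to_dict())  # {'user123': 30.0, 'user456': 500.0}
```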

Combining Realtime and Historical Data

Now let's create an improved realtime feature that compares the current transaction amount against the user's yearly average:

transaction_amount_is_higher_than_average = RealtimeFeatureView(
    name="transaction_amount_is_higher_than_average",
    sources=[transaction_request, user_transaction_averages],  # Current transaction data + historical averages
    features=[
        Calculation(
            name="transaction_amount_is_higher_than_average",
            expr="transaction_request.amount > COALESCE(user_transaction_averages.yearly_average, 0)",
        )
    ],
)
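
The COALESCE handles users with no transaction history: when yearly_average is missing, the expression falls back to 0, so any positive amount counts as higher than average. In plain Python, the same logic reads:

```python
def is_higher_than_average(amount, yearly_average):
    # COALESCE(yearly_average, 0): fall back to 0 when no history exists
    baseline = yearly_average if yearly_average is not None else 0
    return amount > baseline

print(is_higher_than_average(182.40, 500.00))  # False: below this user's average
print(is_higher_than_average(182.40, None))    # True: no history, so any positive amount qualifies
```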

Testing with Historical Context

Let's test our improved feature with some realistic scenarios:

# Test scenario: Regular users with transaction history
# Test with the same amount, and small and large average transaction amounts

import pandas as pd

input_df = pd.DataFrame(
    {
        "amount": [182.40, 182.40],
        # mock the required columns from the user_transaction_averages feature view for our tests
        "user_transaction_averages__yearly_average": [33.46, 500.00],
        "timestamp": [pd.Timestamp("2023-01-01", tz="UTC"), pd.Timestamp("2023-01-01", tz="UTC")],
        "user_id": ["user123", "user456"],
    }
)

result_df = transaction_amount_is_higher_than_average.get_features_for_events(input_df).to_pandas()

# note the "transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average" column
# now the feature value differs based on the user's yearly average transaction amount
print(result_df)

You'll see the feature now adapts to each user's spending patterns:

Regular user (average $33.46) making a larger-than-usual purchase:
{'transaction_amount_is_higher_than_average': True}

High-value shopper (average $500.00) making a typical purchase:
{'transaction_amount_is_higher_than_average': False}

Now we have a smarter feature that understands user context! Next, let's learn how to generate training data and deploy this to production.

What's Powerful About This?

Request-Aware Features in Minutes: You defined a feature that reacts to the incoming transaction amount -- no precomputation, no infrastructure setup. This lets you incorporate request-time context into your model immediately.

Contextual Intelligence from Historical Patterns: By combining request-time data with each user's historical average, you created a feature that adapts to individual behavior instead of relying on static thresholds. This enables more intelligent, personalized decisions.

Fast, Flexible Iteration: You tested both features directly in your notebook, using just Python and sample inputs. No deployment or materialization required, making it easy to explore different ideas quickly.

Part 3: Getting Ready for Production

Now that we've built and tested our realtime features, let's prepare them for production use. We'll cover how to generate training data, deploy the features, and serve them in production.

Generating Training Data

To train a model with our features, we need to generate historical training data. First, let's create a Feature Service that bundles our features together:

from tecton import FeatureService

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[
        user_transaction_averages,  # Historical averages
        transaction_amount_is_higher_than_average,  # Realtime comparison
    ],
)

Now let's load some historical transaction data with fraud labels. This may take a minute to run:

# Load historical transactions
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})[
    ["user_id", "timestamp", "amount", "transaction_id"]
]

# Load fraud labels dataset
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})[
    ["transaction_id", "is_fraud"]
]

# Join labels to transactions to create training events
training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]

training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas()

print("Training data preview:")
display(training_data.head())

The generated training data includes:

  • The original transaction data (amount, user_id, timestamp)
  • The fraud labels
  • Our computed features (yearly average and comparison)
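
From here you would typically split the frame into features and labels for model training. The sketch below uses a small stand-in DataFrame rather than the real output; it assumes feature columns follow the "<feature_view>__<feature>" naming convention seen in the earlier tests, so verify the names against training_data.columns on your own cluster.

```python
import pandas as pd

# Stand-in for the frame returned by get_features_for_events(...);
# column names here are assumptions based on the earlier test output
training_data = pd.DataFrame(
    {
        "user_id": ["user123", "user456"],
        "amount": [182.40, 950.00],
        "user_transaction_averages__yearly_average": [33.46, 500.00],
        "transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average": [True, True],
        "is_fraud": [0, 1],
    }
)

# Feature columns carry the "<feature_view>__<feature>" prefix; split X from y
feature_cols = [c for c in training_data.columns if "__" in c]
X = training_data[feature_cols]
y = training_data["is_fraud"]
print(feature_cols)
```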

Deploying to Production

To deploy our features, we need to:

  1. Copy our feature definitions to a Feature Repository
  2. Apply them to a live workspace
  3. Generate an API key for serving

Here's the complete feature repository code:

# feature_repo.py

import tecton
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd

# Define the schema for our request data
transaction_request = RequestSource(
    name="transaction_request", schema=[Field("amount", Float64)]
)  # We expect to receive a transaction amount


transaction_amount_is_high = RealtimeFeatureView(
    name="transaction_amount_is_high",
    sources=[transaction_request],  # Use our RequestSource as input
    features=[
        Calculation(name="transaction_amount_is_high", expr="transaction_request.amount > 1000")  # SQL-like expression
    ],
)


# Define our data source containing historical transactions
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)


user = Entity(name="user", join_keys=[Field("user_id", String)])


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("amount", Float64),
            function="mean",
            time_window=timedelta(days=365),
            name="yearly_average",
        ),
    ],
)
def user_transaction_averages(transactions):
    """Calculate the yearly average transaction amount per user."""
    return transactions[["user_id", "timestamp", "amount"]]


transaction_amount_is_higher_than_average = RealtimeFeatureView(
    name="transaction_amount_is_higher_than_average",
    sources=[transaction_request, user_transaction_averages],  # Current transaction data + historical averages
    features=[
        Calculation(
            name="transaction_amount_is_higher_than_average",
            expr="transaction_request.amount > COALESCE(user_transaction_averages.yearly_average, 0.0)",
        )
    ],
)


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)

Deploy using the Tecton CLI:

tecton workspace create --live fraud-detection
tecton apply

Serving Realtime Features

First, generate a service account API key from the Tecton UI:

  1. Navigate to Settings > Service Accounts
  2. Create a new service account
  3. Save the API key
  4. Grant the service account "Consumer" access to your workspace

Now we can make realtime feature requests:

import tecton

# Configure credentials
TECTON_API_KEY = "your-api-key" # Replace with your API key
WORKSPACE_NAME = "fraud-detection"

tecton.login(tecton_url="https://example.tecton.ai", tecton_api_key=TECTON_API_KEY)
ws = tecton.get_workspace(WORKSPACE_NAME)
fraud_detection_service = ws.get_feature_service("fraud_detection_feature_service")

# Make a feature request
features = fraud_detection_service.get_online_features(
    join_keys={"user_id": "user_123"}, request_data={"amount": 750.00}
)

print("\nRealtime feature response:")
print(features.to_dict())

Important Production Notes

  1. For best performance in production:

    • Use the REST API directly, or
    • Use Tecton's Python/Java client libraries
    • Avoid calling the SDK's get_online_features() in production
  2. Monitor your features:

    • Watch feature freshness in the Tecton UI
    • Set up alerts for serving latency
    • Track feature distribution changes
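
To make the first note concrete, here is a hedged sketch of calling the serving endpoint over HTTP. The endpoint path, header format, and payload shape below are assumptions for illustration; confirm them against your cluster's API reference before relying on this.

```python
import json

# Assumed endpoint and auth header format -- verify against your cluster's docs
API_URL = "https://example.tecton.ai/api/v1/feature-service/get-features"
API_KEY = "your-api-key"  # Replace with your service account API key

# Assumed payload shape: feature service name, entity join keys, request data
payload = {
    "params": {
        "workspace_name": "fraud-detection",
        "feature_service_name": "fraud_detection_feature_service",
        "join_key_map": {"user_id": "user_123"},
        "request_context_map": {"amount": 750.00},
    }
}
headers = {"Authorization": f"Tecton-key {API_KEY}"}

# e.g. with the requests library (not executed here):
# response = requests.post(API_URL, headers=headers, json=payload)
print(json.dumps(payload, indent=2))
```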

That's it! You've successfully built, tested, and deployed realtime features with Tecton.

Wrap-up

Congratulations! You've successfully built production-ready realtime features for fraud detection. Let's recap what you've learned:

What We Built

  • A basic realtime feature checking transaction amounts
  • A smarter feature that adapts to each user's spending patterns
  • A production-ready feature service combining historical and realtime data

Key Concepts Covered

  • Using RequestSource to define realtime inputs
  • Creating RealtimeFeatureViews for on-the-fly computations
  • Combining realtime data with historical features
  • Generating training data while maintaining consistency
  • Deploying features to production

Next Steps

  1. Dive deeper:

    Realtime Feature Views aren't limited to Calculation expressions. You also have the full flexibility of Python via Transformation functions. Read more about Realtime Feature Views here.

  2. Experiment with your own data:

    • Try different aggregation windows for historical patterns
    • Add more features like time-of-day or location checks
    • Combine multiple historical features
  3. Optimize for production:

    • Set up proper monitoring
    • Configure alerts
    • Test performance at scale

Remember: realtime features in Tecton use the exact same code for training and serving, eliminating the risk of training-serving skew.

Ready to build more? Check out our other tutorials and documentation for more advanced features and best practices!
