⏱️ Building On-Demand Features
Many critical features for real-time models can only be calculated at the time of a request, either because:
- They require data that is only available at request time (e.g. a user's current location)
- They can't efficiently be pre-computed (e.g. computing the embedding similarity between all possible users)
Running transformations at request time can also be useful for:
- Post-processing feature data (example: imputing null values)
- Running additional transformations after Tecton-managed aggregations
- Defining new features without needing to rematerialize Feature Store data
For more details, see On-Demand Feature Views.
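As a rough illustration of the post-processing case listed above, the sketch below imputes a missing value in plain Python. It is not Tecton code: the function name `impute_yearly_average`, the output key `yearly_average_imputed`, and the choice of `0.0` as the default are all assumptions made for this example.

```python
from typing import Optional


def impute_yearly_average(features: dict, default: float = 0.0) -> dict:
    """Replace a missing (None or absent) average with a default value.

    Hypothetical helper: the input and output keys are illustrative only.
    """
    value: Optional[float] = features.get("yearly_average")
    return {"yearly_average_imputed": default if value is None else value}


# A user with no history gets the default; an existing value passes through.
print(impute_yearly_average({"yearly_average": None}))
print(impute_yearly_average({"yearly_average": 33.46}))
```

The same dictionary-in, dictionary-out shape is what a Python-mode transformation works with, so logic like this can be developed and tested before it is wrapped in a feature view.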
This is where "On-Demand" features come in. In Tecton, an On-Demand Feature View lets you calculate features at the time of a request, using either data passed in with the request or pre-computed batch and stream features.
This tutorial shows how you can develop, test, and productionize on-demand features for real-time models. It is centered around a fraud detection use case, where we need to predict in real time whether a transaction that a user is making is fraudulent.
This tutorial assumes some basic familiarity with Tecton. If you are new to Tecton, we recommend first checking out Building a Production AI Application with Tecton, which walks through an end-to-end journey of building a real-time ML application with Tecton.
⚙️ Install Pre-Reqs
First things first, let's install the Tecton SDK and the other libraries used by this tutorial (we recommend doing this in a virtual environment):
!pip install 'tecton[rift]==0.9.0' gcsfs s3fs -q
After installing, run the following command to log in to your organization's Tecton account. Be sure to use your own account name.
Note: You need to press enter after pasting in your authentication code.
import tecton
tecton.login("explore.tecton.ai") # replace with your URL
Let's then run some basic imports and setup that we will use later in the tutorial.
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
from pprint import pprint
import pandas as pd
tecton.set_validation_mode("auto")
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")
👩‍💻 Create an on-demand feature that leverages request data
Let's say that our fraud detection model should use information about the transaction we are currently evaluating. That information is only available at evaluation time, so any features derived from it need to be computed in real time.
On-Demand Feature Views are able to leverage real-time request data for building features. In this case, we will do a very simple check to see if the current transaction amount is over $1000. This is a pretty basic feature, but in the next section we will look at how to make it better!
To define an on-demand feature that leverages request data, we first define a Request Source. The Request Source specifies the expected schema for the data that will be passed in with the request.
When using mode='python', the inputs and outputs of the On-Demand Feature View are dictionaries. For more information on modes in On-Demand Feature Views, see On-Demand Feature Views.
transaction_request = RequestSource(schema=[Field("amount", Float64)])
@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("transaction_amount_is_high", Bool)],
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 1000}
Now that we've defined our feature, we can test it out with some mock data using .run().
request = {"amount": 182.4}
transaction_amount_is_high.run(transaction_request=request)
Out:
{'transaction_amount_is_high': False}
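Because a Python-mode transformation takes and returns dictionaries, its logic can also be checked as an ordinary function. The sketch below repeats the comparison from the feature view above in an undecorated function (the name `transaction_amount_is_high_logic` is made up for this example) and probes the boundary around $1000. The comparison is strict, so an amount of exactly 1000 is not flagged.

```python
def transaction_amount_is_high_logic(transaction_request: dict) -> dict:
    # Same comparison as the feature view, without the Tecton decorator.
    return {"transaction_amount_is_high": transaction_request["amount"] > 1000}


# Probe values below, at, and just above the threshold.
for amount in (182.4, 1000.0, 1000.01):
    print(amount, transaction_amount_is_high_logic({"amount": amount}))
```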
🔗 Create an on-demand feature that leverages request data and other features
This feature is okay, but wouldn't it be much better if we could compare the transaction amount to the user's historical average?
On-Demand Feature Views also have the ability to depend on Batch and Stream Feature Views as input data sources. We can use this capability to improve our feature. Let's take a look.
First we will create a Batch Feature View that computes the user's 1-year average transaction amount. Then we will add this as a source in a new On-Demand Feature View with an updated feature transformation.
# Batch source: historical transactions stored as a public Parquet file
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=["user_id"])

# Batch Feature View: 1-year average transaction amount per user
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=365), name="yearly_average"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

transaction_request = RequestSource(schema=[Field("amount", Float64)])

# On-Demand Feature View: compare the request amount to the user's average
@on_demand_feature_view(
    sources=[transaction_request, user_transaction_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    # Fall back to 0 when the user has no average yet
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}
We can again test our new feature using .run() and passing in example data.
averages = {"yearly_average": 33.46}
request = {"amount": 182.4}
transaction_amount_is_higher_than_average.run(user_transaction_averages=averages, transaction_request=request)
Out:
{'transaction_amount_is_higher_than_average': True}
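One detail worth checking is the `or 0` fallback in the transformation. The sketch below copies the comparison into a plain function (the name `higher_than_average_logic` is made up for this example) and assumes that a user with no transaction history shows up with `None` as the average. Under that assumption the fallback turns the average into 0, so any positive amount is reported as higher than average.

```python
KEY = "transaction_amount_is_higher_than_average"


def higher_than_average_logic(transaction_request: dict, user_transaction_averages: dict) -> dict:
    # Same logic as the feature view: a missing (None) average falls back to 0.
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {KEY: transaction_request["amount"] > amount_mean}


# Amount above the known average.
print(higher_than_average_logic({"amount": 182.4}, {"yearly_average": 33.46}))
# Amount below the known average.
print(higher_than_average_logic({"amount": 20.0}, {"yearly_average": 33.46}))
# No history (assumed to arrive as None): even a tiny amount compares as higher.
print(higher_than_average_logic({"amount": 1.88}, {"yearly_average": None}))
```

If flagging every transaction from a new user is not the desired behavior, a different default or a separate "has history" feature may be a better fit. That is a design choice this tutorial leaves open.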
Great! Now that this feature looks to be doing what we want, let's see how we can generate training data with it.
🧮 Generating Training Data with On-Demand Features
When generating training datasets for on-demand features, Tecton uses the exact same transformation logic as it does online to eliminate online/offline skew.
The Python function you defined will be executed as a UDF on the training data set.
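To make the idea concrete, here is a minimal pandas sketch of running one Python function over every historical row. It only illustrates the principle of reusing the same logic offline and is not how Tecton executes the UDF internally. The sample values, the helper `apply_to_row`, and the function name `higher_than_average_logic` are assumptions made for this example.

```python
import pandas as pd

KEY = "transaction_amount_is_higher_than_average"


def higher_than_average_logic(transaction_request: dict, user_transaction_averages: dict) -> dict:
    # The same dictionary-based logic that would run online.
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {KEY: transaction_request["amount"] > amount_mean}


# Illustrative historical events with the average already joined in.
events = pd.DataFrame(
    {
        "amount": [182.4, 20.0, 50.0],
        "yearly_average": [33.46, 45.10, None],
    }
)


def apply_to_row(row: pd.Series) -> bool:
    # Convert pandas' NaN back to None so the function sees a missing value.
    average = None if pd.isna(row["yearly_average"]) else row["yearly_average"]
    result = higher_than_average_logic({"amount": row["amount"]}, {"yearly_average": average})
    return bool(result[KEY])


events[KEY] = events.apply(apply_to_row, axis=1)
print(events)
```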
To see this in action, we will first load up a set of historical training events.
Tecton expects that any request data passed in online is present in the set of historical training events. In our example below, this is represented by the amount column.
# Retrieve our dataset of historical transaction data
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})
# Retrieve our dataset of labels containing transaction_id and is_fraud (set to 1 if the transaction is fraudulent or 0 otherwise)
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})
# Join our label dataset to our transaction data to produce a list of training events
training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]
display(training_events.head(5))
| | user_id | timestamp | amount | is_fraud |
|---|---|---|---|---|
| 0 | user_5120258459 | 2021-01-01 00:12:17.950000 | 732.27 | 0 |
| 1 | user_8873190199 | 2021-01-01 00:14:23.411000 | 56.14 | 0 |
| 2 | user_4389585068 | 2021-01-01 00:16:39.189000 | 514.87 | 0 |
| 3 | user_5117507286 | 2021-01-01 00:41:32.604000 | 43.85 | 0 |
| 4 | user_2862609228 | 2021-01-01 00:45:22.095000 | 50.74 | 0 |
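Since the request data must be present in the training events, a quick check before requesting historical features can catch a missing column early. The sketch below is a hypothetical helper, not part of the Tecton SDK. The required column set is an assumption for this tutorial's feature service: the join key, the event timestamp, and the `amount` request field. The sample rows are copied from the output above.

```python
import pandas as pd

# Assumed requirements for this tutorial: join key, timestamp, and request field.
REQUIRED_COLUMNS = {"user_id", "timestamp", "amount"}


def missing_event_columns(events: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required columns that are absent from the events DataFrame."""
    return sorted(set(required) - set(events.columns))


sample_events = pd.DataFrame(
    {
        "user_id": ["user_5120258459", "user_8873190199"],
        "timestamp": pd.to_datetime(["2021-01-01 00:12:17.950", "2021-01-01 00:14:23.411"]),
        "amount": [732.27, 56.14],
        "is_fraud": [0, 0],
    }
)

# An empty list means every required column is present.
print(missing_event_columns(sample_events))
# Dropping the request field is reported by name.
print(missing_event_columns(sample_events.drop(columns=["amount"])))
```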
Now we can add our On-Demand Feature View to a Feature Service and generate training data for these historical events.
We also include the Batch Feature View that it depends on in the Feature Service so that its values are visible in the output, but this is not required.
from tecton import FeatureService
fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)
training_data = fraud_detection_feature_service.get_historical_features(training_events).to_pandas().fillna(0)
display(training_data.head(5))
| | user_id | timestamp | amount | is_fraud | user_transaction_averages__yearly_average | transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average |
|---|---|---|---|---|---|---|
| 0 | user_1203218114 | 2023-05-03 15:01:55.826000 | 107.98 | 0 | 0 | True |
| 1 | user_1739270457 | 2023-10-15 07:20:30.640000 | 21.44 | 0 | 0 | True |
| 2 | user_1739270457 | 2024-04-23 14:44:46.515000 | 27.1 | 0 | 0 | True |
| 3 | user_1739270457 | 2025-01-01 00:06:10.014000 | 731.6 | 1 | 0 | True |
| 4 | user_1739270457 | 2025-01-01 00:04:13.014000 | 1.88 | 1 | 0 | True |
We can now use this training dataset to train a model that includes our new feature.
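As a starting point for model training, the sketch below separates the feature columns from the label using pandas only. The column names are taken from the output above. The helper name `split_features_and_label` and the decision to cast everything to numeric types are assumptions, and the two sample rows are copied from the displayed table rather than loaded from Tecton.

```python
import pandas as pd

FEATURE_COLUMNS = [
    "user_transaction_averages__yearly_average",
    "transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average",
]
LABEL_COLUMN = "is_fraud"


def split_features_and_label(training_data: pd.DataFrame):
    """Return (X, y): numeric feature matrix and integer label vector."""
    X = training_data[FEATURE_COLUMNS].astype(float)  # booleans become 0.0 / 1.0
    y = training_data[LABEL_COLUMN].astype(int)
    return X, y


# Two rows copied from the training data shown above.
sample_training_data = pd.DataFrame(
    {
        "user_id": ["user_1203218114", "user_1739270457"],
        "amount": [107.98, 731.6],
        "is_fraud": [0, 1],
        FEATURE_COLUMNS[0]: [0, 0],
        FEATURE_COLUMNS[1]: [True, True],
    }
)

X, y = split_features_and_label(sample_training_data)
print(X)
print(y)
```

The resulting `X` and `y` can be passed to whichever training library you prefer.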
🚀 Run on-demand features in production
Once we are happy with our On-Demand Feature View, we can copy the definitions into our Feature Repository and apply our changes to a live workspace using the Tecton CLI.
For more information on working with Feature Repositories or applying changes to workspaces, check out the Quick Start tutorial or Feature Development Workflow pages.
We've also included the Batch Feature View dependency and the Feature Service in the file below.
feature_repo.py
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
# Batch source: historical transactions stored as a public Parquet file
transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=["user_id"])

# Batch Feature View: materialized online and offline from feature_start_time
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=365), name="yearly_average"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
    online=True,
    offline=True,
    feature_start_time=datetime(2023, 1, 1),
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

transaction_request = RequestSource(schema=[Field("amount", Float64)])

# On-Demand Feature View: compare the request amount to the user's average
@on_demand_feature_view(
    sources=[transaction_request, user_transaction_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    # Fall back to 0 when the user has no average yet
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)
✅ Run the following commands in your terminal to create a live workspace and apply your changes:
tecton login [your-org-account-name].tecton.ai
tecton workspace create --live [my-live-workspace]
tecton apply
⚡️ Retrieve real-time features
Now that our On-Demand Feature View is productionized, we can use it to compute features in real-time!
This step requires generating and setting a Service Account and giving it permissions to read from this workspace.
✅ Head to the following URL to create a new service account (replace "explore" with your organization's account name in the URL as necessary). Be sure to save the API key!
✅ Next, you should give the service account access to read features from your newly created workspace by following these steps:
- Navigate to the Service Account page by clicking on your new service account in the list at the URL above
- Click on "Assign Workspace Access"
- Select your workspace and give the service account the "Consumer" role
✅ Copy the generated API key into the code snippet below where it says your-api-key. Also be sure to replace the workspace name with your newly created workspace name.
In the code below, we will retrieve a feature vector from our Feature Service, while passing in the necessary request data (the current transaction amount).
Tecton will use our Python transformation to compute features in real time from that request data and from the historical transaction average, which is retrieved from the online store.
Be sure to replace your-api-key with the key you generated above.
# Use your API key generated in the step above
TECTON_API_KEY = "your-api-key" # replace with your API key
WORKSPACE_NAME = "[my-live-workspace]" # replace with your new workspace name if needed
tecton.set_credentials(tecton_api_key=TECTON_API_KEY)
ws = tecton.get_workspace(WORKSPACE_NAME)
fraud_detection_feature_service = ws.get_feature_service("fraud_detection_feature_service")
join_keys = {"user_id": "user_7661963940"}
request_data = {"amount": 72.06}
features = fraud_detection_feature_service.get_online_features(join_keys=join_keys, request_data=request_data)
pprint(features.to_dict())
Out:
{'transaction_amount_is_higher_than_average.transaction_amount_is_higher_than_average': False, 'user_transaction_averages.yearly_average': 158.71344729344736}
The .get_online_features() method makes it easy to retrieve features from a notebook. For best performance in production, we recommend reading directly from the REST API or using our Python Client Library.
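For orientation, the sketch below shows roughly what a direct HTTP request for the same feature vector might look like, using only the Python standard library. Treat every specific in it as an assumption to verify against the Tecton REST API reference: the endpoint path `/api/v1/feature-service/get-features`, the `Tecton-key` authorization scheme, and the payload field names are written from memory of the API, and the helper names are made up for this example. The request is only built and printed here. Nothing is sent unless you call `get_features` yourself.

```python
import json
import urllib.request


def build_get_features_payload(workspace: str, feature_service: str, join_keys: dict, request_data: dict) -> dict:
    # Field names are assumptions; confirm them in the REST API reference.
    return {
        "params": {
            "workspace_name": workspace,
            "feature_service_name": feature_service,
            "join_key_map": join_keys,
            "request_context_map": request_data,
        }
    }


def get_features(base_url: str, api_key: str, payload: dict) -> dict:
    # Endpoint path and header format are assumptions; confirm before use.
    request = urllib.request.Request(
        url=f"{base_url}/api/v1/feature-service/get-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Tecton-key {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_get_features_payload(
    workspace="[my-live-workspace]",
    feature_service="fraud_detection_feature_service",
    join_keys={"user_id": "user_7661963940"},
    request_data={"amount": 72.06},
)
print(json.dumps(payload, indent=2))
# To send it: get_features("https://[your-org-account-name].tecton.ai", "your-api-key", payload)
```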
⭐️ Conclusion
Nice work! You've now productionized a true real-time feature that could only be computed at request time, all using simple Python.
But that's just the start of what Tecton can do. Check out Feature Design Patterns to see all the types of features you can build using Batch, Stream, and On-Demand Feature Views.