Version: 0.9

⏱️ Building On-Demand Features

Many critical features for real-time models can only be calculated at the time of a request, either because:

They require data that is only available at request time (e.g. a user's current location)
They can't efficiently be pre-computed (e.g. computing the embedding similarity between all possible users)

Running transformations at request time can also be useful for:

Post-processing feature data (example: imputing null values)
Running additional transformations after Tecton-managed aggregations
Defining new features without needing to rematerialize Feature Store data

For more details, see On-Demand Feature Views.

This is where "On-Demand" features come in. In Tecton, an On-Demand Feature View lets you calculate features at the time of a request, using either data passed in with the request or pre-computed batch and stream features.

This tutorial will show how you can develop, test, and productionize on-demand features for real-time models. This tutorial is centered around a fraud detection use case, where we need to predict in real-time whether a transaction that a user is making is fraudulent.

note

This tutorial assumes some basic familiarity with Tecton. If you are new to Tecton, we recommend first checking out Building a Production AI Application with Tecton which walks through an end-to-end journey of building a real-time ML application with Tecton.

⚙️ Install Pre-Reqs

First, let's install the Tecton SDK and the other libraries used by this tutorial (we recommend using a virtual environment):

!pip install 'tecton[rift]==0.9.0' gcsfs s3fs -q

After installing, run the following command to log in to your organization's Tecton account. Be sure to use your own account name.

Note: You need to press enter after pasting in your authentication code.

import tecton

tecton.login("explore.tecton.ai")  # replace with your URL

Let's then run some basic imports and setup that we will use later in the tutorial.

from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
from pprint import pprint
import pandas as pd

tecton.set_validation_mode("auto")
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

👩‍💻 Create an on-demand feature that leverages request data

Let's say that for our fraud detection model, we want to use information about the transaction we are currently evaluating. That information is only available at evaluation time, so any features derived from it need to be computed in real-time.

On-Demand Feature Views are able to leverage real-time request data for building features. In this case, we will do a very simple check to see if the current transaction amount is over $1000. This is a pretty basic feature, but in the next section we will look at how to make it better!

To define an on-demand feature that leverages request data, we first define a Request Source. The Request Source specifies the expected schema for the data that will be passed in with the request.

info

When using mode='python' the inputs and outputs of the On-Demand Feature View are dictionaries.

For more information on modes in On-Demand Feature Views, see On-Demand Feature Views.

transaction_request = RequestSource(schema=[Field("amount", Float64)])


@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("transaction_amount_is_high", Bool)],
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 1000}

Now that we've defined our feature, we can test it out with some mock data using .run().

request = {"amount": 182.4}

transaction_amount_is_high.run(transaction_request=request)

Out:

{'transaction_amount_is_high': False}
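Because a Python-mode transformation is just a function from dictionaries to a dictionary, the same logic can also be checked without Tecton. The sketch below is a plain-Python stand-in for the feature above, not part of the Tecton SDK; the name `amount_is_high` and the `threshold` parameter are hypothetical and introduced only for illustration. It makes the boundary behavior explicit: the comparison is strict, so an amount of exactly 1000 is not flagged.

```python
def amount_is_high(transaction_request: dict, threshold: float = 1000) -> dict:
    """Plain-Python stand-in for the transformation body above (hypothetical helper)."""
    # Strict comparison, matching the feature definition: exactly 1000 is not "high".
    return {"transaction_amount_is_high": transaction_request["amount"] > threshold}


# Try a value below, at, and just above the threshold.
for amount in (182.4, 1000, 1000.01):
    print(amount, amount_is_high({"amount": amount}))
```

Only the last of the three amounts is flagged as high.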

🔗 Create an on-demand feature that leverages request data and other features

This feature is okay, but wouldn't it be much better if we could compare the transaction amount to the user's historical average?

On-Demand Feature Views also have the ability to depend on Batch and Stream Feature Views as input data sources. We can use this capability to improve our feature. Let's take a look.

First we will create a Batch Feature View that computes the user's 1-year average transaction amount. Then we will add this as a source in a new On-Demand Feature View with an updated feature transformation.

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=["user_id"])


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=365), name="yearly_average"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amount"]]


transaction_request = RequestSource(schema=[Field("amount", Float64)])


@on_demand_feature_view(
    sources=[transaction_request, user_transaction_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}

We can again test our new feature using .run() and passing in example data.

averages = {"yearly_average": 33.46}
request = {"amount": 182.4}

transaction_amount_is_higher_than_average.run(user_transaction_averages=averages, transaction_request=request)

Out:

{'transaction_amount_is_higher_than_average': True}
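One detail worth noting is the `or 0` fallback. Assuming the aggregate arrives as `None` when a user has no transactions in the window (an assumption about how missing values are passed in Python mode, worth confirming for your setup), the average is treated as 0 and any positive amount is flagged. The sketch below reproduces the comparison as a plain function, under the hypothetical name `is_higher_than_average`, so the edge case is easy to see.

```python
from typing import Optional


def is_higher_than_average(amount: float, yearly_average: Optional[float]) -> bool:
    """Hypothetical plain-Python copy of the comparison in the feature view above."""
    # `or 0` replaces a missing (None) average with 0; a genuine 0.0 average also yields 0.
    amount_mean = yearly_average or 0
    return amount > amount_mean


print(is_higher_than_average(182.4, 33.46))  # user with transaction history
print(is_higher_than_average(1.88, None))    # user with no history in the window
```

If flagging every transaction from a user without history is not what the model should see, a different default is a design choice to make explicitly in the transformation.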

Great! Now that this feature looks to be doing what we want, let's see how we can generate training data with it.

🧮 Generating Training Data with On-Demand Features

When generating training datasets for on-demand features, Tecton uses the exact same transformation logic as it does online to eliminate online/offline skew.

The Python function you defined will be executed as a UDF on the training data set.
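Conceptually, each training event is passed through the same function that serves online requests. The sketch below imitates that with plain Python over a list of rows. It only illustrates the idea and is not how Tecton executes the UDF internally; the `yearly_average` values are made up for the example.

```python
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    # Same body as the On-Demand Feature View, without the Tecton decorator.
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}


# Hypothetical training rows: request data plus a made-up pre-computed average.
rows = [
    {"amount": 732.27, "yearly_average": 120.0},
    {"amount": 56.14, "yearly_average": 80.5},
    {"amount": 514.87, "yearly_average": None},
]

# Apply the row-level function to every event, as a UDF would.
results = [
    transaction_amount_is_higher_than_average(
        {"amount": row["amount"]}, {"yearly_average": row["yearly_average"]}
    )
    for row in rows
]
print(results)
```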

To see this in action, we will first load up a set of historical training events.

info

Tecton expects that any request data passed in online is present in the set of historical training events. In our example below, this is represented by the amount column.

# Retrieve our dataset of historical transaction data
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})

# Retrieve our dataset of labels containing transaction_id and is_fraud (set to 1 if the transaction is fraudulent or 0 otherwise)
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})

# Join our label dataset to our transaction data to produce a list of training events
training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]

display(training_events.head(5))

	user_id	timestamp	amount
0	user_5120258459	2021-01-01 00:12:17.950000	732.27
1	user_8873190199	2021-01-01 00:14:23.411000	56.14
2	user_4389585068	2021-01-01 00:16:39.189000	514.87
3	user_5117507286	2021-01-01 00:41:32.604000	43.85
4	user_2862609228	2021-01-01 00:45:22.095000	50.74
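Since request data must be present in the training events, a quick check before requesting training data can catch a missing column early. The helper below is a hypothetical convenience, not part of the Tecton SDK; it compares a set of column names against the join key, the timestamp, and the request field used in this tutorial.

```python
def missing_columns(available, required=("user_id", "timestamp", "amount")):
    """Return the required column names that are absent (hypothetical helper)."""
    available = set(available)
    return [name for name in required if name not in available]


# With a pandas DataFrame, pass `training_events.columns` as `available`.
print(missing_columns(["user_id", "timestamp", "is_fraud"]))
```

An empty list means every column needed by the Request Source and the entity join is present.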

Now we can add our On-Demand Feature View to a Feature Service and generate training data for these historical events.

note

We included the dependent Batch Feature View in the Feature Service as well to visualize the data better, but it is not necessary to include.

from tecton import FeatureService


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)

training_data = fraud_detection_feature_service.get_historical_features(training_events).to_pandas().fillna(0)
display(training_data.head(5))

	user_id	timestamp	amount	is_fraud	transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average
0	user_1203218114	2023-05-03 15:01:55.826000	107.98	0	True
1	user_1739270457	2023-10-15 07:20:30.640000	21.44	0	True
2	user_1739270457	2024-04-23 14:44:46.515000	27.1	0	True
3	user_1739270457	2025-01-01 00:06:10.014000	731.6	1	True
4	user_1739270457	2025-01-01 00:04:13.014000	1.88	1	True

We can now use this training dataset to train a model that includes our new feature.

🚀 Run on-demand features in production

Once we are happy with our On-Demand Feature View we can copy the definitions into our Feature Repository and apply our changes to a live workspace using the Tecton CLI.

tip

For more information on working with Feature Repositories or applying changes to workspaces, check out the Quick Start tutorial or Feature Development Workflow pages.

We've also included the Batch Feature View dependency and the Feature Service in the file below.

feature_repo.py

from tecton import *
from tecton.types import *
from datetime import datetime, timedelta

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=["user_id"])


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(function="mean", column="amount", time_window=timedelta(days=365), name="yearly_average"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
    online=True,
    offline=True,
    feature_start_time=datetime(2023, 1, 1),
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amount"]]


transaction_request = RequestSource(schema=[Field("amount", Float64)])


@on_demand_feature_view(
    sources=[transaction_request, user_transaction_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_averages):
    amount_mean = user_transaction_averages["yearly_average"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_averages, transaction_amount_is_higher_than_average],
)

✅ Run the following commands in your terminal to create a live workspace and apply your changes:

tecton login [your-org-account-name].tecton.ai
tecton workspace create --live [my-live-workspace]
tecton apply

⚡️ Retrieve real-time features

Now that our On-Demand Feature View is productionized, we can use it to compute features in real-time!

important

This step requires generating and setting a Service Account and giving it permissions to read from this workspace.

✅ Head to the following URL to create a new service account (replace "explore" with your organization's account name in the URL as necessary). Be sure to save the API key!

https://explore.tecton.ai/app/settings/accounts-and-access/service-accounts?create-service-account=true

✅ Next, you should give the service account access to read features from your newly created workspace by following these steps:

Navigate to the Service Account page by clicking on your new service account in the list at the URL above
Click on "Assign Workspace Access"
Select your workspace and give the service account the "Consumer" role

✅ Copy the generated API key into the code snippet below where it says your-api-key. Also be sure to replace the workspace name with your newly created workspace name.

In the code below, we will retrieve a feature vector from our Feature Service, while passing in the necessary request data (the current transaction amount).

Tecton will use our Python transformation to compute features in real-time using that request data, as well as the historical transaction average retrieved from the online store.

Be sure to replace your-api-key with the key you generated above.

# Use your API key generated in the step above
TECTON_API_KEY = "your-api-key"  # replace with your API key
WORKSPACE_NAME = "[my-live-workspace]"  # replace with your new workspace name if needed

tecton.set_credentials(tecton_api_key=TECTON_API_KEY)

ws = tecton.get_workspace(WORKSPACE_NAME)
fraud_detection_feature_service = ws.get_feature_service("fraud_detection_feature_service")

join_keys = {"user_id": "user_7661963940"}
request_data = {"amount": 72.06}

features = fraud_detection_feature_service.get_online_features(join_keys=join_keys, request_data=request_data)

pprint(features.to_dict())

Out:

{'transaction_amount_is_higher_than_average.transaction_amount_is_higher_than_average': False, 'user_transaction_averages.yearly_average': 158.71344729344736}

tip

The .get_online_features() method makes it easy to retrieve features from a notebook. For best performance in production, we recommend reading directly from the REST API or using our Python Client Library.
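As a rough idea of what reading from the REST API involves, the sketch below builds, but does not send, an HTTP request carrying the same join keys and request data as the notebook call. The endpoint path, the `Tecton-key` header format, and the body field names are assumptions here and should be verified against the REST API reference; `build_get_features_request` is a hypothetical helper, not part of any Tecton library.

```python
import json
import urllib.request


def build_get_features_request(base_url, api_key, workspace, feature_service, join_keys, request_data):
    """Build (but do not send) an HTTP request for online features.

    The endpoint path, header format, and body field names are assumptions
    to verify against the REST API reference.
    """
    payload = {
        "params": {
            "workspace_name": workspace,
            "feature_service_name": feature_service,
            "join_key_map": join_keys,
            "request_context_map": request_data,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/feature-service/get-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Tecton-key {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_get_features_request(
    base_url="https://explore.tecton.ai",  # replace with your URL
    api_key="your-api-key",  # replace with your API key
    workspace="[my-live-workspace]",  # replace with your workspace name
    feature_service="fraud_detection_feature_service",
    join_keys={"user_id": "user_7661963940"},
    request_data={"amount": 72.06},
)
# To send it, pass `req` to urllib.request.urlopen and parse the JSON response body.
print(req.full_url)
```

Keeping request construction in a separate function makes the payload easy to inspect and test before any network call is made.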

⭐️ Conclusion

Nice work! You've now productionized a true real-time feature that could only be computed at request time, all using simple Python.

But that's just the start of what Tecton can do. Check out Feature Design Patterns to see all the types of features you can build using Batch, Stream, and On-Demand Feature Views.
