On-Demand Feature View
An On-Demand Feature View is used to run row-level, request-time transformations on data from Request Sources, Batch Feature Views, or Stream Feature Views. Unlike Batch and Stream Feature Views, On-Demand Feature Views do not precompute and materialize data to the Feature Store, but instead run transformations both online and offline at the time of the request.
NOTE: An On-Demand Feature View that uses a Feature View as an input always implicitly uses all of that Feature View's features, even if the Feature View uses subsetted inputs.
Running transformations at request time can be useful for:
- Calculating features based on data that is only available at the time of the request such as a current transaction or user location
- Defining feature crosses that would be inefficient to precompute (example: compare embeddings between two users)
- Running additional transformations on Tecton-managed aggregations
- Defining new features without needing to rematerialize Feature Store data
- Post-processing feature data (example: imputing null values)
Common Examples
- Turning a user's GPS coordinates into a geohash
- Parsing a user's search string
- Checking if a user's incoming transaction is larger than the user's average transaction amount over the last 30 days
- Picking the maximum transaction of the past 10 transactions of a user (if combined with a last-n aggregation in a Stream Feature View)
- Computing the cosine similarity between a pre-computed user embedding and a query embedding
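Two of the examples above can be sketched as plain Python functions that take one dictionary per source and return a dictionary of features. This is a minimal sketch rather than Tecton's implementation: the function names and field names (`query_embedding`, `embedding`, `amount_last_10`) are hypothetical, and the `@on_demand_feature_view` decorator is omitted so the snippet runs without Tecton installed.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors, or 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_query_similarity(query_request, user_embedding):
    # query_request holds request-time data; user_embedding holds a precomputed feature row.
    # Both field names are hypothetical.
    return {
        "user_query_cosine_similarity": cosine_similarity(
            query_request["query_embedding"], user_embedding["embedding"]
        )
    }


def max_of_last_transactions(user_last_transactions):
    # "amount_last_10" stands in for the output of a last-n aggregation (hypothetical name).
    amounts = user_last_transactions["amount_last_10"] or []
    return {"max_recent_transaction_amount": max(amounts) if amounts else None}
```

Returning `None` when no recent transactions exist keeps the "no data" case distinguishable from a genuine maximum of 0.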
On-Demand Feature View transformations introduce request-time latency based on
the transformation being executed. For example, if your on-demand transformation
executes a time.sleep(1) statement, the transformation will take at least
1 second to execute.
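The latency floor can be illustrated by timing a transformation locally. This is a minimal sketch, assuming a hypothetical `slow_transformation` function that sleeps for 0.1 seconds. It uses only the standard library and does not measure anything about Tecton's serving path.

```python
import time


def slow_transformation(request):
    # Hypothetical transformation that blocks for 0.1 seconds before returning.
    time.sleep(0.1)
    return {"doubled_amount": request["amount"] * 2}


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


result, elapsed = timed_call(slow_transformation, {"amount": 21.0})
print(result, f"{elapsed:.3f}s")  # elapsed is at least roughly the sleep duration
```

Timing the function in isolation like this gives a lower bound on the latency that the transformation contributes to a request.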
On-Demand Feature Transformations
On-Demand Feature View transformations are written using Python.
When using mode='python', Tecton passes in a row of data for each source in
the form of a dictionary. On-demand feature outputs are returned in a single
dictionary of one or more feature values.
When using mode='pandas', Tecton passes in one or many rows of data in the
form of a pandas DataFrame. At offline execution time, Tecton will pass in a
batch of several rows. At online inference time, Tecton will typically pass in a
single row. Tecton expects the function to return a pandas DataFrame.
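Because a mode='python' transformation is an ordinary function over dictionaries, its logic can be exercised locally before it is registered. The sketch below is an illustration under stated assumptions, not Tecton's execution model: the decorator is omitted, the names `parse_search_query` and `query` are hypothetical, and the list comprehension only stands in for however Tecton applies the function to many rows offline.

```python
def parse_search_query(search_request):
    # One dictionary per source comes in; one dictionary of features goes out.
    # A missing query (None) is imputed as an empty string before parsing.
    query = (search_request.get("query") or "").strip().lower()
    terms = query.split()
    return {
        "search_term_count": len(terms),
        "search_first_term": terms[0] if terms else None,
    }


# Online-style call: a single row of request data.
single = parse_search_query({"query": "  Red Running Shoes "})

# Offline-style use: the same row-level function applied to each row of a batch.
batch = [{"query": "blue jacket"}, {"query": None}]
batch_features = [parse_search_query(row) for row in batch]
```

Keeping the function free of side effects makes it straightforward to unit test with hand-built dictionaries like these.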
Example
Python (mode="python"):
from tecton import on_demand_feature_view, RequestSource
from tecton.types import Float64, Bool, Field
from features.user_transaction_amount_averages import user_transaction_amount_averages

# Request-time input: the amount of the incoming transaction.
transaction_request = RequestSource(schema=[Field("amount", Float64)])

@on_demand_feature_view(
    sources=[transaction_request, user_transaction_amount_averages],
    mode="python",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_amount_averages):
    # Each argument is a dictionary holding one row. Treat a missing average as 0.
    amount_mean = user_transaction_amount_averages["amount_mean_24h_10m"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}
Pandas (mode="pandas"):
from tecton import on_demand_feature_view, RequestSource
from tecton.types import Float64, Bool, Field
from features.user_transaction_amount_averages import user_transaction_amount_averages

# Request-time input: the amount of the incoming transaction.
transaction_request = RequestSource(schema=[Field("amount", Float64)])

@on_demand_feature_view(
    sources=[transaction_request, user_transaction_amount_averages],
    mode="pandas",
    schema=[Field("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_amount_averages):
    # Each argument is a pandas DataFrame with one or more rows.
    user_transaction_amount_averages["amount"] = transaction_request["amount"]
    user_transaction_amount_averages["transaction_amount_is_higher_than_average"] = (
        user_transaction_amount_averages["amount"] > user_transaction_amount_averages["amount_mean_24h_10m"]
    )
    # Return only the column declared in the schema.
    return user_transaction_amount_averages[["transaction_amount_is_higher_than_average"]]
How to choose between pandas and python mode
mode='python' is significantly more performant than mode='pandas' during
online inference, but slightly less performant when offline data is generated
for training or offline prediction purposes.
Generally, for any online inference use case, use mode='python'. Only consider
using mode='pandas' if you use an On-Demand Feature View only to generate
training data or offline inference data.
Parameters
See the API reference for the full list of parameters.