Realtime Feature View
A Realtime Feature View is used to run row-level, request-time transformations on data from Request Sources, Batch Feature Views, or Stream Feature Views. Unlike Batch and Stream Feature Views, Realtime Feature Views do not precompute and materialize data to the Feature Store, but instead run transformations both online and offline at the time of the request.
NOTE: A Realtime Feature View that uses a Feature View as an input will always implicitly use all of the features of that Feature View, even if the input Feature View is subsetted.
Running transformations at request time can be useful for:
- Calculating features based on data that is only available at the time of the request such as a current transaction or user location
- Defining feature crosses that would be inefficient to precompute (example: compare embeddings between two users)
- Running additional transformations on Tecton-managed aggregations
- Defining new features without needing to rematerialize Feature Store data
- Post-processing feature data (example: imputing null values)
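The post-processing case above can be sketched in plain Python. This is a minimal illustration, not Tecton's API: the function name `impute_nulls`, the feature names, and the default values are all hypothetical, and the Tecton decorator is deliberately omitted.

```python
from typing import Any, Dict

# Hypothetical defaults; a real feature view would choose its own values.
DEFAULTS: Dict[str, Any] = {"amount_mean_24h_10m": 0.0, "transaction_count_30d": 0}


def impute_nulls(features: Dict[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Return a copy of `features` with None replaced by the configured default."""
    return {
        name: (defaults.get(name) if value is None else value)
        for name, value in features.items()
    }


row = {"amount_mean_24h_10m": None, "transaction_count_30d": 7}
imputed = impute_nulls(row)
print(imputed)
```

Returning a new dictionary rather than mutating the input keeps the transformation free of side effects, which makes it easier to test.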
Common Examples
- Turning a user's GPS coordinates into a geohash
- Parsing a user's search string
- Checking if a user's incoming transaction is larger than the user's average transaction amount over the last 30 days
- Picking the maximum transaction of the past 10 transactions of a user (if combined with a last-n aggregation in a Stream Feature View)
- Computing the cosine similarity between a pre-computed user embedding and a query embedding
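As an illustration of the last example, here is a minimal plain-Python sketch of a cosine-similarity transformation. The input keys `user_embedding` and `query_embedding`, the output feature name, and both function names are assumptions made for this sketch; the Tecton decorator and source definitions are left out.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> Optional[float]:
    """Cosine of the angle between two vectors, or None if either has zero length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return dot / (norm_a * norm_b)


def embedding_similarity(
    request: Dict[str, List[float]], user: Dict[str, List[float]]
) -> Dict[str, Optional[float]]:
    """Dictionary in, dictionary out: one row from each source, one output feature."""
    return {
        "user_query_cosine_similarity": cosine_similarity(
            user["user_embedding"], request["query_embedding"]
        )
    }


same = embedding_similarity({"query_embedding": [1.0, 0.0]}, {"user_embedding": [1.0, 0.0]})
orthogonal = embedding_similarity({"query_embedding": [0.0, 1.0]}, {"user_embedding": [1.0, 0.0]})
```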
Realtime Feature View transformations add request-time latency that depends on the transformation being executed. For example, if your realtime transformation calls time.sleep(1), the transformation cannot complete in less than 1 second.
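One way to see this latency floor locally is to time a transformation function directly. The sketch below uses a hypothetical function, `slow_transformation`, with a short sleep standing in for expensive work; it measures only the function body, not any serving overhead.

```python
import time


def slow_transformation(request: dict) -> dict:
    """Hypothetical transformation; the sleep stands in for expensive work."""
    time.sleep(0.05)
    return {"amount_doubled": request["amount"] * 2}


start = time.perf_counter()
result = slow_transformation({"amount": 10.0})
elapsed_seconds = time.perf_counter() - start

# The measured time cannot be meaningfully shorter than the sleep itself.
print(f"transformation took {elapsed_seconds * 1000:.1f} ms")
```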
Realtime Feature Transformations
Realtime Feature View transformations are written using Python.
When using mode='python', Tecton passes in a row of data for each source in the form of a dictionary. Realtime feature outputs are returned in a single dictionary of one or more feature values.
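The dictionary-in, dictionary-out shape can be sketched without any Tecton imports. The function below is a hypothetical search-string parser; the input key `query` and the output feature names are assumptions for illustration only.

```python
from typing import Dict, Optional, Union


def search_query_features(search_request: Dict[str, Optional[str]]) -> Dict[str, Union[int, bool]]:
    """Take one row as a dictionary and return one dictionary of feature values."""
    query = search_request.get("query") or ""  # treat a missing or null query as empty
    tokens = query.lower().split()
    return {
        "query_token_count": len(tokens),
        "query_mentions_sale": "sale" in tokens,
    }


features = search_query_features({"query": "Winter boots SALE"})
empty = search_query_features({"query": None})
```

Because the function is plain Python, it can be unit tested with ordinary dictionaries before it is wired into a feature view.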
When using mode='pandas', Tecton passes in one or many rows of data in the form of a pandas DataFrame. At offline execution time, Tecton will pass in a batch of several rows. At online inference time, Tecton will typically pass in a single row. Tecton expects the function to return a pandas DataFrame.
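The practical consequence is that a pandas-mode function should be written with vectorized operations so that the same code handles one row or many. A minimal sketch, assuming a hypothetical `amount` column and a hypothetical threshold of 100.0, with no Tecton decorator:

```python
import pandas as pd


def amount_features(transaction_request: pd.DataFrame) -> pd.DataFrame:
    """Vectorized logic: works unchanged for a single row or a batch of rows."""
    out = pd.DataFrame(index=transaction_request.index)
    out["amount_is_large"] = transaction_request["amount"] > 100.0
    return out


# Offline: a batch of several rows. Online: typically a single row.
offline_batch = pd.DataFrame({"amount": [20.0, 150.0, 99.5]})
online_row = pd.DataFrame({"amount": [250.0]})

batch_result = amount_features(offline_batch)
row_result = amount_features(online_row)
```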
Environments and Server Groups
Realtime Feature Views can use third-party libraries in their transformations. This can be accomplished by defining environments, which are immutable sandboxed Python environments.
Environment definitions can be associated with server groups to provide the necessary resources for the transformation to execute. A Transform Server Group is a group of Transform Server nodes that is capable of autoscaling and is associated with a particular workspace. Different server groups can also be used to isolate traffic to read feature values.
Example
Python mode:

from tecton import realtime_feature_view, RequestSource, Attribute
from tecton.types import Float64, Bool, Field
from features.user_transaction_amount_averages import user_transaction_amount_averages

transaction_request = RequestSource(schema=[Field("amount", Float64)])


@realtime_feature_view(
    sources=[transaction_request, user_transaction_amount_averages],
    mode="python",
    features=[Attribute("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_amount_averages):
    # Each argument is a dictionary holding one row; treat a missing average as 0.
    amount_mean = user_transaction_amount_averages["amount_mean_24h_10m"] or 0
    return {"transaction_amount_is_higher_than_average": transaction_request["amount"] > amount_mean}
Pandas mode:

from tecton import realtime_feature_view, RequestSource, Attribute
from tecton.types import Float64, Bool, Field
from features.user_transaction_amount_averages import user_transaction_amount_averages

transaction_request = RequestSource(schema=[Field("amount", Float64)])


@realtime_feature_view(
    sources=[transaction_request, user_transaction_amount_averages],
    mode="pandas",
    features=[Attribute("transaction_amount_is_higher_than_average", Bool)],
)
def transaction_amount_is_higher_than_average(transaction_request, user_transaction_amount_averages):
    # Each argument is a pandas DataFrame with one or more rows.
    user_transaction_amount_averages["amount"] = transaction_request["amount"]
    user_transaction_amount_averages["transaction_amount_is_higher_than_average"] = (
        user_transaction_amount_averages["amount"] > user_transaction_amount_averages["amount_mean_24h_10m"]
    )
    return user_transaction_amount_averages[["transaction_amount_is_higher_than_average"]]
How to choose between pandas and python mode
mode='python' is significantly more performant than mode='pandas' during online inference, but slightly less performant when offline data is generated for training or offline prediction purposes.
Generally, for any online inference use case, use mode='python'. Only consider using mode='pandas' if you use a Realtime Feature View solely to generate training data or offline inference data.
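To get a rough feel for the single-row difference, the same comparison can be timed locally in both shapes. This is a standalone micro-benchmark sketch, not a measurement of Tecton itself: the function names and the `amount_mean` key are hypothetical, and it assumes that per-row DataFrame construction is one contributor to the gap, which the text above does not state. Timings vary by machine, so none are quoted here.

```python
import timeit

import pandas as pd


def is_higher_python(request: dict, averages: dict) -> dict:
    """Dictionary-based logic, one row at a time."""
    mean = averages["amount_mean"] or 0
    return {"is_higher": request["amount"] > mean}


def is_higher_pandas(request: pd.DataFrame, averages: pd.DataFrame) -> pd.DataFrame:
    """DataFrame-based logic, vectorized over however many rows arrive."""
    return pd.DataFrame({"is_higher": request["amount"] > averages["amount_mean"]})


def run_python_once() -> dict:
    return is_higher_python({"amount": 120.0}, {"amount_mean": 80.0})


def run_pandas_once() -> pd.DataFrame:
    # Build single-row frames inside the timed call, as an online request would.
    request = pd.DataFrame({"amount": [120.0]})
    averages = pd.DataFrame({"amount_mean": [80.0]})
    return is_higher_pandas(request, averages)


python_result = run_python_once()
pandas_result = run_pandas_once()

python_seconds = timeit.timeit(run_python_once, number=1000)
pandas_seconds = timeit.timeit(run_pandas_once, number=1000)
print(f"python mode shape: {python_seconds:.4f} s per 1000 single-row calls")
print(f"pandas mode shape: {pandas_seconds:.4f} s per 1000 single-row calls")
```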
Parameters
See the API reference for the full list of parameters.