Realtime Feature View
A Realtime Feature View (RTFV) runs row-level, request-time operations on data from Request Sources, Batch Feature Views, or Stream Feature Views. Unlike Batch and Stream Feature Views, Realtime Feature Views do not precompute and materialize data to the Feature Store but instead generate features at request time.
Use Cases
Realtime Feature Views are useful for:
- Generating features from request-time data (e.g. current transaction, user location)
- Creating features based on one or more upstream Materialized Feature Views
- Post-processing feature data (e.g. null imputation)
Common Examples
- Converting GPS coordinates to geohash
- Parsing search strings
- Comparing transactions against user averages
- Calculating Z-Score or other statistical metrics
- Computing embedding similarities
This example shows a simple Realtime Feature View that flags transactions over $1000.
transaction_amount_is_high = RealtimeFeatureView(
    name="transaction_amount_is_high",
    sources=[transaction_request],
    features=[
        Calculation(
            name="transaction_amount_is_high",
            expr="transaction_request.amount > 1000",
        )
    ],
)
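The other common examples listed above follow the same row-level pattern. As an illustration, the sketch below shows how an embedding similarity might be computed in plain Python. The source names (`search_request`, `item_embedding`), the `embedding` field, and the output feature name are assumptions made for this example, and the function is shown without any Tecton decorator so it can run on its own.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_similarity(search_request: Dict, item_embedding: Dict) -> Dict:
    # Hypothetical inputs: one dict per source, each with an "embedding" field.
    score = cosine_similarity(search_request["embedding"], item_embedding["embedding"])
    return {"query_item_similarity": score}
```

Returning 0.0 for a zero-norm vector is a design choice of this sketch; returning a null value instead may suit some models better.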
Overview
To define a Realtime Feature View, you'll need to be familiar with the following concepts:
Input Data Sources
Realtime Feature Views operate on one or more input data sources specified in
the sources parameter. They can be Request Sources or other Feature Views. You
need to determine what data sources are required to define your desired
features.
Feature Definition
Realtime Feature Views support three approaches to defining features. You can
define a Realtime Feature View using Calculation features, or using
transformation functions in python or pandas mode. Choose an approach
based on performance requirements and feature definition complexity.
Testing and Deployment
After you've defined your Realtime Feature View, test it to confirm that the feature definitions and configuration are correct. You can write tests for it by following the testing guide.
Once your features are defined and tested, you can deploy the feature using
tecton apply.
Implementation Guide
Configuring Input Data Sources
Determine what data sources are required to define your desired features, and
include them in the sources parameter of your Realtime Feature View.
Request Sources
Request Sources represent data available at request time, such as the transaction amount.
from tecton import RequestSource
from tecton.types import Field, Float64
transaction_request = RequestSource(name="transaction_request", schema=[Field("amount", Float64)])
Materialized Feature Views
Realtime Feature Views can depend on Batch Feature Views or Stream Feature Views to combine real-time data with historical context.
# Example: Combining request data with historical user metrics feature view
sources = [transaction_request, user_historical_metrics_fv]
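To make the combination concrete, here is a minimal plain-Python sketch of the kind of row-level logic such a feature view might run: it compares the request amount with a historical per-user average. The field name `amount_mean_30d` and the output feature name are hypothetical, and the function is written without Tecton so it can be run directly.

```python
from typing import Dict


def amount_vs_user_average(transaction_request: Dict, user_historical_metrics_fv: Dict) -> Dict:
    """Ratio of the current amount to the user's historical mean amount."""
    amount = transaction_request["amount"]
    # "amount_mean_30d" is an assumed field name, used only for illustration.
    mean = user_historical_metrics_fv.get("amount_mean_30d")
    if mean is None or mean == 0:
        # No usable history for this user: emit a null feature value.
        ratio = None
    else:
        ratio = amount / mean
    return {"amount_to_user_mean_ratio": ratio}
```

The missing-history branch matters because a materialized feature may have no value for a new user at request time.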
Defining Features
You can define a Realtime Feature View using Calculation features, or using
transformation functions in python or pandas mode.
Read this section to understand each approach. Read the Best Practices section to understand their relative trade-offs.
Using Calculation Features
Calculations define row-level, SQL-like expressions that are executed directly in the Feature Server, without the overhead of a Python or Pandas transformation. They're the best option when your use case can be expressed with the functions that Calculations support.
Create a Realtime Feature View that uses Calculations by directly instantiating
the RealtimeFeatureView class and providing Calculation feature objects to
the features argument.
In the example below, the Realtime Feature View "transaction_analysis_rtfv"
has one Calculation feature "transaction_z_score". The Calculation's
expression uses the amount, mean, and stddev fields from the
transaction_metrics feature view to produce the z-score feature value.
transaction_analysis_rtfv = RealtimeFeatureView(
    name="transaction_analysis_rtfv",
    sources=[transaction_metrics],
    features=[
        Calculation(
            name="transaction_z_score",
            expr="(COALESCE(transaction_metrics.amount, 0) - COALESCE(transaction_metrics.mean, 0)) / COALESCE(transaction_metrics.stddev, 1)",
        ),
    ],
)
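For readers less familiar with COALESCE, the following plain-Python sketch mirrors the arithmetic of the expression above. It is an illustration of the formula only, not how the Feature Server evaluates Calculations, and the helper names are made up for this example.

```python
from typing import Optional


def coalesce(value: Optional[float], default: float) -> float:
    """Return `default` when `value` is null (None), otherwise `value`."""
    return default if value is None else value


def transaction_z_score(
    amount: Optional[float], mean: Optional[float], stddev: Optional[float]
) -> float:
    # Same structure as the expr string: null inputs fall back to 0, 0, and 1.
    return (coalesce(amount, 0) - coalesce(mean, 0)) / coalesce(stddev, 1)


print(transaction_z_score(150.0, 100.0, 25.0))  # 2.0
```

Note that COALESCE only replaces nulls. A `stddev` of exactly 0 is not null, so this sketch would raise `ZeroDivisionError` for it; if zero variance is possible in your data, consider guarding against it explicitly in the expression.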
Feature Views that use Calculation features cannot also use a python or pandas
mode transformation function.
Using Transformation Functions with Python Mode
For more complex feature operations, Tecton supports python mode in Realtime
Feature Views which allows you to define Python transformations to generate
features.
To create a Realtime Feature View that uses python mode, apply the
@realtime_feature_view decorator to a Python transformation function.
The function must define an input argument corresponding to each item in the
sources parameter. Each argument will receive a dictionary whose entries will
be the fields of the RequestSource or Materialized Feature View. The function
must return a dictionary of feature names and values.
Attribute features are used to specify the features from the output dictionary
of the python transformation function.
In the example below, the @realtime_feature_view decorator is used to define a
python mode feature view from the calculate_risk function. The function
takes input from both the transaction_request and user_metrics sources, and
computes a risk_score feature by multiplying the transaction amount by the
user's fraud score.
@realtime_feature_view(
    sources=[transaction_request, user_metrics],
    mode="python",
    features=[Attribute("risk_score", Float64)],
)
def calculate_risk(transaction_request, user_metrics):
    return {"risk_score": transaction_request["amount"] * user_metrics["fraud_score"]}
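The dictionary-in, dictionary-out contract described above can be exercised without Tecton. The sketch below copies the transformation body into an undecorated helper (the name `calculate_risk_logic` and the `DECLARED_FEATURES` set are assumptions for this example) and checks that the returned keys match the declared Attribute names.

```python
from typing import Dict

# Mirrors features=[Attribute("risk_score", Float64)] from the example above.
DECLARED_FEATURES = {"risk_score"}


def calculate_risk_logic(transaction_request: Dict, user_metrics: Dict) -> Dict:
    """Undecorated copy of the transformation body, one dict per source."""
    return {"risk_score": transaction_request["amount"] * user_metrics["fraud_score"]}


# Each argument is a dict keyed by the fields of the corresponding source.
output = calculate_risk_logic({"amount": 200.0}, {"fraud_score": 0.25})

# The returned keys should match the declared feature names.
assert set(output) == DECLARED_FEATURES
print(output)  # {'risk_score': 50.0}
```

Keeping the logic in a plain function like this is one way to make it easy to call with hand-written inputs; the testing guide referenced on this page describes the supported testing workflow.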
Using Transformation Functions with pandas Mode
Similar to python mode, pandas mode allows you to define powerful
DataFrame-based transformation functions optimized for offline retrieval.
To create a Realtime Feature View that uses pandas mode, you apply the
@realtime_feature_view decorator to a transformation function.
The function must define an input argument corresponding to each item in the
sources parameter. Each argument will receive a pandas DataFrame whose columns
will be the fields of the RequestSource or Materialized Feature View. The
function must return a pandas DataFrame of feature columns.
Attribute features are used to specify the output of the pandas transformation
function.
In the example below, the @realtime_feature_view decorator is used to define a
pandas mode feature view from the calculate_risk function. The function
takes input from both the transaction_request and user_metrics sources, and
computes a risk_score feature by multiplying the transaction amount column and
the user's fraud score column.
import pandas as pd

@realtime_feature_view(
    sources=[transaction_request, user_metrics],
    mode="pandas",
    features=[Attribute("risk_score", Float64)],
)
def calculate_risk(transaction_request, user_metrics):
    # Calculate risk score using pandas operations
    result = pd.DataFrame()
    result["risk_score"] = transaction_request["amount"] * user_metrics["fraud_score"]
    return result
Additional Feature Definition Notes
RealtimeContext
RealtimeContext is an object that provides access to request-time metadata,
such as the request_timestamp, within your Realtime Feature Views. Incorporate
RealtimeContext into your feature views to define novel time-based features.
Read more about
using RealtimeContext in Realtime Feature Views.
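As a rough illustration of a time-based feature, the sketch below computes the seconds elapsed between a user's last transaction and the request timestamp. `StandInContext` is a hypothetical stand-in written for this example and is not Tecton's RealtimeContext class; the `last_transaction_time` field and the feature name are also assumptions. How the real context object is passed to a transformation function is covered in the guide linked above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class StandInContext:
    """Hypothetical stand-in exposing only a request_timestamp attribute."""

    request_timestamp: datetime


def seconds_since_last_transaction(user_metrics: Dict, context: StandInContext) -> Dict:
    last_ts = user_metrics.get("last_transaction_time")  # assumed field name
    if last_ts is None:
        # No prior transaction is known for this user.
        return {"seconds_since_last_transaction": None}
    delta = context.request_timestamp - last_ts
    return {"seconds_since_last_transaction": delta.total_seconds()}


ctx = StandInContext(request_timestamp=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
last = datetime(2024, 1, 1, 11, 59, 0, tzinfo=timezone.utc)
print(seconds_since_last_transaction({"last_transaction_time": last}, ctx))
# {'seconds_since_last_transaction': 60.0}
```

Both timestamps here are timezone-aware; mixing aware and naive datetimes would raise a `TypeError` in Python, so it is worth being consistent about time zones in time-based features.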
Feature Naming Requirements
Follow Feature Naming Requirements for naming rules and examples.
Python Version Support
Realtime Feature View transformation functions support Python versions 3.8 and 3.9.
Additional Examples
For more examples of Realtime Feature Views, see the Examples page and the Tecton sample repository.
Best Practices
Use Calculation Features when:
- Your use case can be accomplished using the set of supported SQL functions.
- Performance is critical. Because they avoid the overhead of a Python or Pandas transformation, Calculation Features will be more efficient than python and pandas mode transformations for most use cases.
Use python or pandas Mode Transformations when you need:
- Full expressiveness and flexibility of Python
- Complex algorithms / logic
- External libraries
- External API calls
Calculation Features are recommended for simple transformations and are most performant across online and offline retrieval.
If you need Python, python mode is recommended for more efficient online
serving. pandas mode is recommended if you would like to optimize for more
efficient offline retrieval.
Testing
Test your Realtime Feature Views to ensure they work correctly before deploying to production. See Testing Realtime Features for detailed testing strategies and examples.
What's Next
Now that you can build and use Realtime Feature Views, try the following:
- Combine with Batch or Stream Feature Views for hybrid pipelines.
- Use these features when generating training data.
- Explore Testing Realtime Features.
- Explore Feature Services to see how Feature Views are bundled and served.