RealtimeContext for Realtime Feature Views
Introduction
Realtime Feature Views (RTFVs) compute features at request time from incoming data. Doing so accurately often requires context metadata, such as the timestamp of the request. The RealtimeContext class passes this metadata to your RTFVs. This document explains what RealtimeContext is, how to use it in both Python and Pandas modes, and how it behaves during offline retrieval and when testing with run_transformation.
What is RealtimeContext?
RealtimeContext is a class used to pass context metadata, such as the request_timestamp, to the context parameter of a Realtime Feature View. Your feature transformations can use this metadata to compute features with the correct temporal context.
@realtime_feature_view(...)
def my_realtime_feature_view(request_data, context):
    # Use the Realtime Context to access request-time metadata
    request_timestamp = context.request_timestamp
Attributes
- request_timestamp: A single datetime object representing the timestamp of the request made to the feature server. Available in Python mode.
- request_timestamp_series: A pandas.Series object where each element corresponds to the request_timestamp for each row in the input data. Available in Pandas mode.
Modes of Operation: Python vs. Pandas
Before diving into how to use RealtimeContext, it's important to understand the two modes in which Realtime Feature Views can operate:
- Python Mode: Transformation functions are written using standard Python code. Suitable for simpler transformations that process one record at a time.
- Pandas Mode: Transformation functions use Pandas DataFrames and Series, allowing vectorized operations over multiple records. Ideal for batch processing and more complex data manipulations.
The RealtimeContext class provides different attributes depending on the mode:
- In Python Mode: Use context.request_timestamp.
- In Pandas Mode: Use context.request_timestamp_series.
Using RealtimeContext in Python Mode
In Python mode, RealtimeContext provides the request_timestamp attribute, which you can use directly within your transformation functions.
For an online query, context.request_timestamp contains the request timestamp of the online query. For an offline retrieval query, context.request_timestamp is populated with the event timestamp of each row in the events dataframe argument passed to get_features_for_events.
Example: Calculating Days Since an Input User Timestamp
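from datetime import timezone

from tecton import Attribute, RequestSource, realtime_feature_view
from tecton.types import Field, Int64, String, Timestamp

# Request-time inputs: the user's name and a timestamp supplied by the caller
user_timestamp_source = RequestSource(
    schema=[Field("name", String), Field("user_timestamp", Timestamp)]
)


@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="python",
    features=[
        Attribute("name", String),
        Attribute("days_since", Int64),
    ],
)
def days_since_timestamp(request, context):
    # Whole days between the request time and the user-supplied timestamp
    days_since = (context.request_timestamp - request["user_timestamp"]).days
    # Return every declared feature, including "name"
    return {
        "name": request["name"],
        "days_since": days_since,
    }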
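The days_since value in the example above comes from timedelta.days, which in standard Python counts whole days and rounds toward negative infinity. The sketch below uses only the standard library, with no Tecton objects, to show what that means in practice. The helper name days_between is hypothetical, and the timezone-aware UTC timestamps are an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone


def days_between(request_timestamp: datetime, user_timestamp: datetime) -> int:
    """Hypothetical helper mirroring the subtraction in the feature view."""
    return (request_timestamp - user_timestamp).days


# Assumed request time, chosen only for illustration
request_ts = datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)

# A gap of 36 hours counts as 1 whole day, not 1.5
whole_days = days_between(request_ts, request_ts - timedelta(hours=36))

# A user timestamp 1 hour in the future yields -1, because .days floors
negative_days = days_between(request_ts, request_ts + timedelta(hours=1))

# Subtracting a naive datetime from a timezone-aware one raises TypeError
try:
    days_between(request_ts, datetime(2023, 9, 1))
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(whole_days, negative_days, mixed_ok)
```

If callers can send user timestamps that lie after the request time, or timestamps without timezone information, it may be worth validating them before the subtraction.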
Using RealtimeContext in Pandas Mode
In Pandas mode, RealtimeContext provides the request_timestamp_series attribute, which is a Pandas Series containing the request timestamp for each row.
During an online query, this series contains a single value: the request timestamp of the online query. For an offline retrieval query, it contains each timestamp in the events dataframe argument passed to get_features_for_events.
Example: Calculating Days Since a User Timestamp in Pandas Mode
user_timestamp_source = RequestSource(schema=[Field("user_timestamp", Timestamp)])


@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="pandas",
    features=[
        Attribute("days_since", Int64),
    ],
)
def days_since_timestamp_pandas(request, context):
    # One request timestamp per input row
    request_timestamps = context.request_timestamp_series
    # Vectorized subtraction yields timedeltas; .dt.days extracts whole days
    request["days_since"] = (request_timestamps - request["user_timestamp"]).dt.days
    return request[["days_since"]]
Using RealtimeContext with a Dependent Feature View
Example: Calculating Time Since a User's Last Transaction
A Realtime Feature View can combine data from multiple sources, including the outputs of other feature views. This is useful when you want to use the result of one Feature View as input to another.
In the example below, user_transactions_fv is a Batch Feature View that provides the latest transaction event for a user.
@realtime_feature_view(
    sources=[RequestSource(schema=[Field("user_id", String)]), user_transactions_fv],
    mode="python",
    features=[Attribute("user_id", String), Attribute("days_since_transaction", Int64)],
)
def days_since_last_transaction(source, latest_transaction, context):
    latest_timestamp = latest_transaction["transaction_timestamp"]
    return {
        "user_id": source["user_id"],
        "days_since_transaction": (context.request_timestamp - latest_timestamp).days,
    }
Offline Retrieval with Event Timestamps
In offline retrieval, you compute features for historical data, where each row has its own event timestamp. These timestamps are injected as the request_timestamp for each row in the context.
Example:
from datetime import datetime

import pandas as pd

events_data = {
    "name": ["Alice", "Bob", "Charlie"],
    "user_timestamp": [
        datetime(2009, 5, 21, 10, 0, 0),
        datetime(2003, 5, 21, 10, 5, 0),
        datetime(2001, 5, 21, 10, 10, 0),
    ],
    "timestamp": [
        datetime(2009, 5, 22, 10, 0, 0),
        datetime(2003, 5, 23, 10, 5, 0),
        datetime(2001, 5, 24, 10, 10, 0),
    ],
}
events_df = pd.DataFrame(events_data)

# Each row's "timestamp" value becomes that row's request_timestamp
results = days_since_timestamp.get_features_for_events(events_df, timestamp_key="timestamp").to_pandas()
- Event Timestamps as Context: In offline retrieval, each event timestamp from the DataFrame (e.g., "timestamp": [datetime(2009, 5, 22, 10, 0, 0), ...]) is injected into the RealtimeContext as the request_timestamp for that particular event.
- Per-Row Context: Each row in events_df gets its own RealtimeContext with the corresponding event timestamp as the request_timestamp.
- Feature Calculation: The feature view is then run on each row, and the request_timestamp used in the feature calculation reflects the timestamp of the event.
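The per-row behavior described in the list above can be imitated in plain Python. This is a minimal sketch of the idea, not Tecton's implementation: StandInContext is a hypothetical substitute for RealtimeContext, and days_since is an undecorated copy of the Python-mode arithmetic. The event values are the ones from the offline retrieval example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StandInContext:
    """Hypothetical stand-in for RealtimeContext (Python mode only)."""

    request_timestamp: datetime


def days_since(request: dict, context: StandInContext) -> dict:
    # Same arithmetic as the Python-mode example, without the decorator
    delta = context.request_timestamp - request["user_timestamp"]
    return {"days_since": delta.days}


events = [
    {
        "name": "Alice",
        "user_timestamp": datetime(2009, 5, 21, 10, 0, 0),
        "timestamp": datetime(2009, 5, 22, 10, 0, 0),
    },
    {
        "name": "Bob",
        "user_timestamp": datetime(2003, 5, 21, 10, 5, 0),
        "timestamp": datetime(2003, 5, 23, 10, 5, 0),
    },
    {
        "name": "Charlie",
        "user_timestamp": datetime(2001, 5, 21, 10, 10, 0),
        "timestamp": datetime(2001, 5, 24, 10, 10, 0),
    },
]

results = []
for event in events:
    # One context per row: the row's event timestamp becomes request_timestamp
    context = StandInContext(request_timestamp=event["timestamp"])
    results.append(days_since(event, context))

print([row["days_since"] for row in results])  # [1, 2, 3]
```

Each row is evaluated against its own event timestamp rather than the current wall-clock time, which is why the three rows produce different values.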
Testing Using run_transformation with a Custom RealtimeContext
You can use the run_transformation method and pass a mock RealtimeContext to simulate different scenarios and test a RealtimeFeatureView.
Example:
from datetime import datetime, timezone

from tecton import RealtimeContext

# Create sample input data
request = {"name": "Alice", "user_timestamp": datetime(2023, 9, 1, tzinfo=timezone.utc)}
# Create a mock RealtimeContext
mock_context = RealtimeContext(request_timestamp=datetime(2023, 10, 1, tzinfo=timezone.utc))
# Run the transformation
result = days_since_timestamp.run_transformation(input_data={"request": request, "context": mock_context})
print(result)
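To cover several scenarios at once, the same pattern can be looped over a table of request timestamps. The sketch below is an assumption-laden illustration that runs without Tecton: days_since_logic is a hypothetical plain-Python copy of the transformation body, and SimpleNamespace stands in for a mock RealtimeContext. With a real feature view you would call run_transformation inside the loop instead.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def days_since_logic(request, context):
    """Hypothetical plain-Python copy of the transformation body."""
    delta = context.request_timestamp - request["user_timestamp"]
    return {"days_since": delta.days}


request = {"user_timestamp": datetime(2023, 9, 1, tzinfo=timezone.utc)}

# Scenario label -> request timestamp to simulate
scenarios = {
    "same_day": datetime(2023, 9, 1, 18, 0, tzinfo=timezone.utc),
    "one_month_later": datetime(2023, 10, 1, tzinfo=timezone.utc),
    "one_year_later": datetime(2024, 9, 1, tzinfo=timezone.utc),
}

outcomes = {}
for label, request_timestamp in scenarios.items():
    # SimpleNamespace is used here in place of a mock RealtimeContext
    context = SimpleNamespace(request_timestamp=request_timestamp)
    outcomes[label] = days_since_logic(request, context)["days_since"]

print(outcomes)
```

The one-year scenario spans 29 February 2024, so it yields 366 days rather than 365; edge cases like this are easy to pin down when the request timestamp is supplied explicitly.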
Overriding the Context Parameter Name
By default, the context is passed as the context argument, but you can override the context parameter name using the context_parameter_name setting in the feature view definition.
Example: Customizing Context Parameter Name
@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="pandas",
    features=[
        Attribute("days_since", Int64),
    ],
    context_parameter_name="my_context",
)
def days_since_timestamp_pandas(request, my_context):
    # The context arrives under the overridden name, so read it from my_context
    request_timestamps = my_context.request_timestamp_series
    request["days_since"] = (request_timestamps - request["user_timestamp"]).dt.days
    return request[["days_since"]]
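One way to picture why the function parameter must match context_parameter_name is a dispatcher that passes the context as a keyword argument under the configured name. The sketch below is purely illustrative and is not how Tecton is implemented: call_with_context and uses_custom_name are hypothetical names, and SimpleNamespace stands in for a RealtimeContext.

```python
import inspect
from datetime import datetime, timezone
from types import SimpleNamespace


def call_with_context(fn, request, context, context_parameter_name="context"):
    """Hypothetical dispatcher: passes the context under the configured name."""
    params = inspect.signature(fn).parameters
    if context_parameter_name not in params:
        raise TypeError(
            f"{fn.__name__} has no parameter named {context_parameter_name!r}"
        )
    return fn(request, **{context_parameter_name: context})


def uses_custom_name(request, my_context):
    # The body must refer to my_context, not context
    return (my_context.request_timestamp - request["user_timestamp"]).days


ctx = SimpleNamespace(request_timestamp=datetime(2023, 10, 1, tzinfo=timezone.utc))
req = {"user_timestamp": datetime(2023, 9, 1, tzinfo=timezone.utc)}

# Configured name matches the function signature
days = call_with_context(uses_custom_name, req, ctx, context_parameter_name="my_context")

# The default name "context" does not match, so the mismatch is reported
try:
    call_with_context(uses_custom_name, req, ctx)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

print(days, mismatch_detected)
```

The practical takeaway is the same as in the example above: after renaming the parameter, every reference inside the function body has to use the new name as well.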