Version: Beta 🚧

RealtimeContext for Realtime Feature Views

Introduction​

Realtime Feature Views (RTFVs) play a crucial role in providing real-time features based on incoming data. When dealing with real-time data, it's often necessary to access context metadata, such as the timestamp of the request, to compute features accurately. The RealtimeContext class is designed to facilitate this by passing context metadata to your RTFVs. This document will guide you through understanding what RealtimeContext is, how to use it in both Python and Pandas modes, and how it operates during Offline Retrieval and testing using run_transformation.


What is RealtimeContext?​

RealtimeContext is a class used to pass context metadata, such as the request_timestamp, to the context parameter of a Realtime Feature View. This class provides essential information that can be leveraged within your feature transformations to ensure that features are computed with the correct temporal context.

@realtime_feature_view(...)
def my_realtime_feature_view(request_data, context):
    # Use the RealtimeContext to access request-time metadata
    request_timestamp = context.request_timestamp

Attributes​

  • request_timestamp: A single datetime object representing the timestamp of the request made to the feature server. Available in Python mode.
  • request_timestamp_series: A pandas.Series object where each element corresponds to the request_timestamp for each row in the input data. Available in Pandas mode.

Modes of Operation: Python vs. Pandas​

Before diving into how to use RealtimeContext, it's important to understand the two modes in which Realtime Feature Views can operate:

  • Python Mode: Transformation functions are written using standard Python code. Suitable for simpler transformations that process one record at a time.
  • Pandas Mode: Transformation functions use Pandas DataFrames and Series, allowing vectorized operations over multiple records. Ideal for batch processing and more complex data manipulations.

The RealtimeContext class provides different attributes depending on the mode:

  • In Python Mode: Use context.request_timestamp.
  • In Pandas Mode: Use context.request_timestamp_series.

Using RealtimeContext in Python Mode​

In Python mode, RealtimeContext provides the request_timestamp attribute, which you can use directly within your transformation functions.

For an online query, context.request_timestamp will contain the request timestamp of the online query. For an offline retrieval query, context.request_timestamp will be appropriately populated with the event timestamp for each row in the events dataframe argument passed to get_features_for_events.

Example: Calculating Days Since an Input User Timestamp

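from datetime import timezone

# Imports assume the Tecton SDK's top-level package and tecton.types module
from tecton import Attribute, RequestSource, realtime_feature_view
from tecton.types import Field, Int64, Timestamp

user_timestamp_source = RequestSource([Field("user_timestamp", Timestamp)])


@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="python",
    features=[
        # Only features that the function actually returns are declared here
        Attribute("days_since", Int64),
    ],
)
def days_since_timestamp(request, context):
    # Whole days between the request time and the user-supplied timestamp
    days_since = (context.request_timestamp - request["user_timestamp"]).days
    return {
        "days_since": days_since,
    }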
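
The subtraction in the example above relies on two standard-library behaviors that are easy to overlook: `.days` reports only whole elapsed days, and Python refuses to subtract a timezone-naive datetime from a timezone-aware one. The sketch below illustrates both using plain `datetime` objects. `days_between` is a hypothetical helper, not part of Tecton, and whether `context.request_timestamp` is timezone-aware in your deployment is something to verify rather than assume.

```python
from datetime import datetime, timezone


def days_between(request_timestamp, user_timestamp):
    """Hypothetical helper mirroring the subtraction in the feature view."""
    return (request_timestamp - user_timestamp).days


request_ts = datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)

# 30 days and 12 hours have elapsed; .days reports only the whole days.
whole_days = days_between(request_ts, datetime(2023, 9, 1, tzinfo=timezone.utc))
print(whole_days)  # 30

# Mixing an aware and a naive datetime raises TypeError.
try:
    days_between(request_ts, datetime(2023, 9, 1))
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

If the request data and the context can disagree on timezone awareness, normalizing both to UTC before subtracting is one way to avoid the error.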

Using RealtimeContext in Pandas Mode​

In Pandas mode, RealtimeContext provides the request_timestamp_series attribute, which is a Pandas Series containing the request timestamp for each row.

During an online query, this series will contain a single value with the request timestamp of the online query. For an offline retrieval query, this series will contain each timestamp in the events dataframe argument passed to get_features_for_events.

Example: Calculating Days Since a User Timestamp in Pandas Mode​

user_timestamp_source = RequestSource([Field("user_timestamp", Timestamp)])


@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="pandas",
    features=[
        Attribute("days_since", Int64),
    ],
)
def days_since_timestamp_pandas(request, context):
    request_timestamps = context.request_timestamp_series
    request["days_since"] = (request_timestamps - request["user_timestamp"]).dt.days
    return request[["days_since"]]
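
To see what the vectorized subtraction produces, the following sketch reproduces it with plain pandas objects outside of Tecton. The `request` DataFrame and `request_timestamps` Series here are hand-built stand-ins for what the feature view would receive, and the dates are illustrative only.

```python
import pandas as pd

# Stand-ins for the `request` DataFrame and `context.request_timestamp_series`.
request = pd.DataFrame(
    {"user_timestamp": pd.to_datetime(["2023-09-01", "2023-09-15"], utc=True)}
)
request_timestamps = pd.Series(pd.to_datetime(["2023-10-01", "2023-10-01"], utc=True))

# Row-wise subtraction yields a timedelta Series; .dt.days extracts whole days.
days_since = (request_timestamps - request["user_timestamp"]).dt.days
print(days_since.tolist())  # [30, 16]
```

Note that pandas aligns the two Series on their index labels before subtracting, so this sketch depends on both sides sharing the same default index.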

Using RealtimeContext with a Dependent Feature View​

Example: Calculating Time Since a User's Last Transaction​

A Realtime Feature View can combine data from multiple sources, including the outputs of other feature views. This can be particularly useful when you want to use the result of one Feature View as input for another.

In the example below, user_transactions_fv is a Batch Feature View that provides the latest transaction event for a user.

@realtime_feature_view(
    sources=[RequestSource(schema=[Field("user_id", String)]), user_transactions_fv],
    mode="python",
    features=[Attribute("user_id", String), Attribute("days_since_transaction", Int64)],
)
def days_since_last_transaction(source, latest_transaction, context):
    latest_timestamp = latest_transaction["transaction_timestamp"]
    return {
        "user_id": source["user_id"],
        "days_since_transaction": (context.request_timestamp - latest_timestamp).days,
    }

Offline Retrieval with Event Timestamps​

In offline retrieval, you compute features for historical data, where each row has its own event timestamp. These timestamps are injected as the request_timestamp for each row in the context.

Example:​

from datetime import datetime

import pandas as pd

events_data = {
    "name": ["Alice", "Bob", "Charlie"],
    "user_timestamp": [
        datetime(2009, 5, 21, 10, 0, 0),
        datetime(2003, 5, 21, 10, 5, 0),
        datetime(2001, 5, 21, 10, 10, 0),
    ],
    # Event timestamps; each one becomes the request_timestamp for its row
    "timestamp": [
        datetime(2009, 5, 22, 10, 0, 0),
        datetime(2003, 5, 23, 10, 5, 0),
        datetime(2001, 5, 24, 10, 10, 0),
    ],
}

events_df = pd.DataFrame(events_data)

results = days_since_timestamp.get_features_for_events(events_df, timestamp_key="timestamp").to_pandas()

  • Event Timestamps as Context: In offline retrieval, each event timestamp from the DataFrame (e.g., "timestamp": [datetime(2009, 5, 22, 10, 0, 0), ...]) is injected into the RealtimeContext as the request_timestamp for that particular event.
  • Per-Row Context: Each row in events_df gets its own RealtimeContext with the corresponding event timestamp as the request_timestamp.
  • Feature Computation: The feature view is then run on each row, and the request_timestamp used in the feature computation reflects the timestamp of the event.
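
The per-row behavior described above can be sketched without Tecton as a simple loop. This is only an illustration of the semantics, not how Tecton implements offline retrieval: `days_since_timestamp_logic` is a plain-Python copy of the transformation body, and `SimpleNamespace` is a hypothetical stand-in for the per-row RealtimeContext.

```python
from datetime import datetime
from types import SimpleNamespace


def days_since_timestamp_logic(request, context):
    """Plain-Python copy of the transformation body from the Python mode example."""
    return {"days_since": (context.request_timestamp - request["user_timestamp"]).days}


events = [
    {
        "name": "Alice",
        "user_timestamp": datetime(2009, 5, 21, 10, 0, 0),
        "timestamp": datetime(2009, 5, 22, 10, 0, 0),
    },
    {
        "name": "Bob",
        "user_timestamp": datetime(2003, 5, 21, 10, 5, 0),
        "timestamp": datetime(2003, 5, 23, 10, 5, 0),
    },
    {
        "name": "Charlie",
        "user_timestamp": datetime(2001, 5, 21, 10, 10, 0),
        "timestamp": datetime(2001, 5, 24, 10, 10, 0),
    },
]

results = []
for event in events:
    # Stand-in for the per-row context: the event timestamp plays the role
    # of request_timestamp for this row only.
    row_context = SimpleNamespace(request_timestamp=event["timestamp"])
    results.append(days_since_timestamp_logic(event, row_context))

print([r["days_since"] for r in results])  # [1, 2, 3]
```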

Testing Using run_transformation with a Custom RealtimeContext​

To test a Realtime Feature View, call its run_transformation method with a mock RealtimeContext whose request_timestamp you control. This lets you simulate requests made at different times.

Example:​

from datetime import datetime, timezone

from tecton import RealtimeContext

# Create sample input data
request = {"name": "Alice", "user_timestamp": datetime(2023, 9, 1, tzinfo=timezone.utc)}

# Create a mock RealtimeContext
mock_context = RealtimeContext(request_timestamp=datetime(2023, 10, 1, tzinfo=timezone.utc))

# Run the transformation
result = days_since_timestamp.run_transformation(input_data={"request": request, "context": mock_context})

print(result)

Overriding the Context Parameter Name​

By default, the context is passed as the context argument, but you can override the context parameter name using the context_parameter_name setting in the feature view definition.

Example: Customizing Context Parameter Name​

@realtime_feature_view(
    sources=[user_timestamp_source],
    mode="pandas",
    features=[
        Attribute("days_since", Int64),
    ],
    context_parameter_name="my_context",
)
def days_since_timestamp_pandas(request, my_context):
    # The body must refer to the context by its overridden parameter name
    request_timestamps = my_context.request_timestamp_series
    request["days_since"] = (request_timestamps - request["user_timestamp"]).dt.days
    return request[["days_since"]]
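
One way to picture why the parameter name in the function signature must match the configured name is a dispatcher that passes the context as a keyword argument. The sketch below is purely illustrative and makes no claim about Tecton's internals: `call_transformation` is a hypothetical function, and `SimpleNamespace` stands in for the context object.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def call_transformation(fn, request, context, context_parameter_name="context"):
    """Hypothetical dispatcher: pass the context under a configurable keyword."""
    return fn(request, **{context_parameter_name: context})


def days_since(request, my_context):
    # The body uses the same name that appears in the signature.
    return (my_context.request_timestamp - request["user_timestamp"]).days


ctx = SimpleNamespace(request_timestamp=datetime(2023, 10, 1, tzinfo=timezone.utc))
req = {"user_timestamp": datetime(2023, 9, 1, tzinfo=timezone.utc)}

renamed_result = call_transformation(days_since, req, ctx, context_parameter_name="my_context")
print(renamed_result)  # 30

# With the default name, the keyword does not match the signature.
try:
    call_transformation(days_since, req, ctx)
    default_name_ok = True
except TypeError:
    default_name_ok = False
print(default_name_ok)  # False
```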
