On-Demand Feature View
An OnDemandFeatureView is used for simple transformations that are executed in real time at feature request time. It allows you to:
- Calculate features based on information only available at request time, such as the amount of the current transaction; and
- Calculate combinations of other feature values, such as the amount of the current transaction compared to the 7-day average transaction amount.
OnDemandFeatureView stands in contrast to all other feature views (such as BatchFeatureView), which precompute feature values and store them in the offline and/or online feature store.
Use an OnDemandFeatureView if:
- your use case requires real-time fresh features that need to process data that is only available right at the time of your real-time prediction
- the latency introduced by the complexity of your on-demand transformation is acceptable for your use case (for example, if your on-demand transformation executes a sleep("1second") statement, the transformation cannot complete in less than 1 second)
- precomputing your feature values would be a waste of storage or compute resources, because you're not expecting to actually use all pre-computed feature values in production, or because precomputing all possible feature combinations would be intractable
Common examples include:
- Turning a user's GPS coordinates into a geohash
- Parsing a user's search string
- Checking if a user's incoming transaction is larger than the user's average transaction amount over the last 30 days
- Picking the maximum transaction of the past 10 transactions of a user (if combined with a
- Computing the cosine similarity between a pre-computed user embedding and a query embedding.
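As an illustration of the last item in the list above, here is a minimal sketch of a cosine-similarity transformation written in plain Python. The function name, the dictionary keys (`query_embedding`, `user_embedding`), and the zero-vector fallback are assumptions made for this example; the Tecton decorator and schema declarations are intentionally left out.

```python
import math


def embedding_similarity(request: dict, user_features: dict) -> dict:
    """Cosine similarity between a request-time query embedding and a
    pre-computed user embedding (key names are hypothetical)."""
    query = request["query_embedding"]
    user = user_features["user_embedding"]
    if len(query) != len(user):
        raise ValueError("embeddings must have the same dimension")

    dot = sum(q * u for q, u in zip(query, user))
    query_norm = math.sqrt(sum(q * q for q in query))
    user_norm = math.sqrt(sum(u * u for u in user))

    # Assumption: return 0.0 for zero vectors so the output is never null.
    if query_norm == 0.0 or user_norm == 0.0:
        return {"cosine_similarity": 0.0}
    return {"cosine_similarity": dot / (query_norm * user_norm)}
```

The query embedding only exists at request time, which is why this kind of feature cannot be precomputed for every possible query.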
An OnDemandFeatureView transformation is expressed as Python code.
For more examples, see the Examples page.
Feature with no dependencies
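A minimal sketch of the transformation logic for a feature that depends only on request-time data. It is plain Python rather than a complete feature view definition: the function name, the `amount` key, and the threshold are hypothetical, and the Tecton decorator and schemas are omitted (see the API reference for those).

```python
def transaction_amount_is_high(transaction_request: dict) -> dict:
    """Flag a transaction using only data sent with the request."""
    # Hypothetical threshold, chosen for illustration only.
    threshold = 10_000
    amount = transaction_request["amount"]
    # Return an int rather than a bool so the output maps to a numeric column.
    return {"transaction_amount_is_high": int(amount > threshold)}
```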
Feature with pre-computed dependencies
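A minimal sketch of a transformation that combines request-time data with a pre-computed feature, using the 7-day average comparison mentioned in the introduction. The function name, the input names, and the `amount_mean_7d` key are assumptions for this example, and the Tecton decorator and schemas are again omitted.

```python
def transaction_amount_vs_average(
    transaction_request: dict, user_transaction_metrics: dict
) -> dict:
    """Compare the current transaction amount to a pre-computed 7-day average."""
    amount = transaction_request["amount"]
    average = user_transaction_metrics.get("amount_mean_7d")

    # Assumption: a missing or zero average yields a ratio of 0.0, which
    # avoids both a division by zero and a null output.
    if average is None or average == 0:
        return {"amount_to_7d_average_ratio": 0.0}
    return {"amount_to_7d_average_ratio": amount / average}
```

The first argument stands in for the request payload and the second for the output of another feature view that was computed ahead of time.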
See the API reference for the full list of parameters.
In your feature repository, the RequestDataSource defines the schema your OnDemandFeatureView will expect for request-time data.
To configure a RequestDataSource, you'll need to first create a Spark StructType that defines the type for each input parameter.
An OnDemandFeatureView requires a defined output schema, similar to the RequestDataSource. Tecton uses the schema to display the FeatureView's expected output in the web UI.
Note: Outputs from an OnDemandFeatureView must be non-null, even if the output schema declares
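Because outputs must be non-null, one way to handle missing inputs is to substitute explicit defaults inside the transformation. The sketch below parses a search string and never returns None; the function name, the `search_query` key, and the choice of 0 and the empty string as defaults are assumptions for illustration, and the right default depends on how the feature is consumed.

```python
def search_query_features(request: dict) -> dict:
    """Parse a search string into simple features, never returning None."""
    # Treat a missing or null query as an empty string.
    query = request.get("search_query") or ""
    tokens = query.lower().split()
    return {
        "query_token_count": len(tokens),
        # Assumption: an empty string stands in for "no first token".
        "query_first_token": tokens[0] if tokens else "",
    }
```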
Transformations for an OnDemandFeatureView work the same way as those for other Feature Views, except they must be written in Python with
See how to use an On Demand Feature View in a notebook here.
How it works
While other features are pre-computed and saved in the online store, the OnDemandFeatureView transformation is executed in the Tecton service when you request a feature vector online. Inputs to the pipeline can be a RequestDataSource included in the request, or the output of other features. On-demand transformations cannot access data from your batch or stream data sources.
Because an OnDemandFeatureView is run at request time, you can only use Python-native or pandas-based transformations. To guarantee online/offline consistency, Tecton will automatically package your transformation as a Spark UDF when you generate historical feature values offline.
mode=python requires the latest 0.3 beta release. See the CLI setup guide for instructions on how to install the new Tecton version.
On Demand Feature Views deliver lower request-time latency when used with mode=python.
The primary difference between mode=python and mode=pandas is that transformations with mode=python have simple Python dictionary inputs and outputs, in place of Pandas dataframes. This new option avoids the overhead associated with dataframes.
Python Mode Example
This example uses Python mode, but is equivalent to the Pandas mode feature view shown above.
Unit Testing with Python mode
The run method accepts a dictionary representing the inputs for a single row. This differs slightly from mode=pandas, which can accept multiple rows at a time.
This example shows how to iterate through multiple test cases at a time.
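A minimal sketch of such a table-driven test. The transformation here is a plain Python stand-in with a hypothetical name, input key, and threshold; in a real test you would pass each input dictionary to the feature view's run method instead, and the exact call shape should be checked against the API reference.

```python
def transaction_amount_is_high(transaction_request: dict) -> dict:
    """Stand-in transformation: one input dictionary in, one output dictionary out."""
    return {"transaction_amount_is_high": int(transaction_request["amount"] > 10_000)}


# Each test case pairs the inputs for a single row with the expected output.
TEST_CASES = [
    ({"amount": 50.0}, {"transaction_amount_is_high": 0}),
    ({"amount": 10_000.0}, {"transaction_amount_is_high": 0}),
    ({"amount": 10_000.01}, {"transaction_amount_is_high": 1}),
]


def test_transaction_amount_is_high():
    # Python mode handles one row per call, so the test loops over the cases.
    for request, expected in TEST_CASES:
        actual = transaction_amount_is_high(request)
        assert actual == expected, f"input {request}: expected {expected}, got {actual}"
```

Keeping the cases in a list makes it easy to add boundary values, such as the exact threshold, without writing a new test function.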