FAQ: get_historical_features vs. run
Feature Views expose two methods for producing offline feature data: get_historical_features and run.
get_historical_features should be used to compute or retrieve pre-computed
offline feature data. This method will always produce accurate feature
values for a requested time range or spine.
get_historical_features will selectively retrieve pre-computed features from the
offline store or compute them from raw event data, depending on whether offline
materialization is enabled. This behavior can be explicitly overridden with a
parameter of get_historical_features.
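The retrieval decision described above can be pictured as a small function. This is a hypothetical sketch, not the SDK's implementation: choose_data_path and compute_from_raw are invented names, and the real override parameter is not modeled here.

```python
def choose_data_path(offline_materialization_enabled: bool,
                     compute_from_raw: bool = False) -> str:
    """Hypothetical sketch of how get_historical_features picks a data path.

    Returns "offline_store" when pre-computed features can be read, or
    "raw_data" when features must be computed from raw event data.
    """
    if compute_from_raw:
        # Explicit override: always compute from raw event data.
        return "raw_data"
    # Default: read pre-computed data only if offline materialization is on.
    return "offline_store" if offline_materialization_enabled else "raw_data"
```

In either path the sketch assumes the resulting feature values are the same; only where they come from differs.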
get_historical_features can be used for the following workflows:
- Generating historical training data by passing training_events, a dataframe that includes historical timestamps for specific entities. This produces feature values as of a particular time for each requested entity, which can be used for model training.
- Generating batch inference data by passing inference_join_keys, a dataframe that includes entities and the current timestamp. This produces the most recent feature data for the requested entities.
- Inspecting offline data for a specified time range.
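Both the training and batch inference workflows rely on a point-in-time lookup: for each (entity, timestamp) row in the spine, find the latest feature value known at that time. The sketch below illustrates the idea in plain Python; point_in_time_lookup and the in-memory history format are hypothetical, and it assumes values recorded exactly at the spine timestamp count as "known".

```python
import bisect
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity -> list of (timestamp, value), sorted by time.
FeatureHistory = Dict[str, List[Tuple[datetime, float]]]


def point_in_time_lookup(history: FeatureHistory,
                         spine: List[Tuple[str, datetime]]) -> List[Optional[float]]:
    """For each (entity, timestamp) spine row, return the latest feature value
    recorded at or before that timestamp, or None if no value exists yet."""
    results: List[Optional[float]] = []
    for entity, ts in spine:
        rows = history.get(entity, [])
        times = [t for t, _ in rows]
        # bisect_right gives the index of the first row strictly after ts,
        # so the row just before it is the most recent value as of ts.
        idx = bisect.bisect_right(times, ts)
        results.append(rows[idx - 1][1] if idx > 0 else None)
    return results


history = {"user_1": [(datetime(2023, 1, 1), 1.0), (datetime(2023, 1, 5), 2.0)]}

# Training-style spine: historical timestamps per entity.
training_spine = [("user_1", datetime(2023, 1, 3)),
                  ("user_1", datetime(2023, 1, 6)),
                  ("user_2", datetime(2023, 1, 6))]

# Inference-style spine: the "current" timestamp for each entity.
inference_spine = [("user_1", datetime(2023, 2, 1))]
```

Here the training spine yields [1.0, 2.0, None] and the inference spine yields [2.0]: the same lookup serves both workflows, and only the spine timestamps change.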
run should only be used when interactively testing or debugging a Feature View.
run quite literally runs a Feature View transformation. It
is based on raw event data, but also provides the option to specify mocked data
sources.
Do not use
run to generate training data since it is not guaranteed to
produce accurate feature values.
test_run is nearly identical to
run, but is intended for use in unit testing,
since it explicitly requires mocked data sources and a local Spark session, and
does not make any network requests. Most of this document will focus on run,
but the concepts extend to test_run.
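To illustrate the unit-testing idea behind test_run, here is a plain-Python analogy: a transformation is applied to in-memory mocked rows for one time range, with no network access. All names (count_transactions_per_user, run_with_mock_source, the row layout) are hypothetical, and this does not use the Tecton SDK or Spark.

```python
from datetime import datetime
from typing import Callable, Dict, List

Row = Dict[str, object]


def count_transactions_per_user(rows: List[Row]) -> Dict[str, int]:
    """Hypothetical Feature View transformation: count transactions per user."""
    counts: Dict[str, int] = {}
    for row in rows:
        user = str(row["user_id"])
        counts[user] = counts.get(user, 0) + 1
    return counts


def run_with_mock_source(transform: Callable[[List[Row]], Dict[str, int]],
                         mock_rows: List[Row],
                         start_time: datetime,
                         end_time: datetime) -> Dict[str, int]:
    """Apply the transformation to mocked rows whose timestamps fall in
    [start_time, end_time). Everything stays in memory; nothing is fetched."""
    window = [r for r in mock_rows if start_time <= r["timestamp"] < end_time]
    return transform(window)


def test_count_transactions_per_user() -> None:
    mock_rows: List[Row] = [
        {"user_id": "u1", "timestamp": datetime(2023, 1, 1, 10)},
        {"user_id": "u1", "timestamp": datetime(2023, 1, 1, 12)},
        {"user_id": "u2", "timestamp": datetime(2023, 1, 1, 13)},
        {"user_id": "u2", "timestamp": datetime(2023, 1, 2, 1)},  # outside range
    ]
    result = run_with_mock_source(count_transactions_per_user, mock_rows,
                                  datetime(2023, 1, 1), datetime(2023, 1, 2))
    assert result == {"u1": 2, "u2": 1}
```

Because the mocked data is fully controlled, the expected output can be asserted exactly, which is what makes this style suitable for unit tests rather than for producing training data.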
🔑 Key Concept:
get_historical_features has a one-to-many relationship with run.
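One way to picture this relationship: a single get_historical_features request over a long time range corresponds to many per-interval windows, each resembling one run. The sketch below only enumerates those windows; materialization_windows is a hypothetical helper, and it assumes a fixed interval and an aligned start time.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def materialization_windows(start: datetime, end: datetime,
                            interval: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of length `interval`.

    Each window stands in for one materialization job (roughly, one run);
    the final window is clipped to `end` if the range is not a whole
    number of intervals.
    """
    windows: List[Tuple[datetime, datetime]] = []
    cursor = start
    while cursor < end:
        windows.append((cursor, min(cursor + interval, end)))
        cursor += interval
    return windows


# A three-day request with a daily schedule maps to three run-sized windows.
windows = materialization_windows(datetime(2023, 1, 1), datetime(2023, 1, 4),
                                  timedelta(days=1))
```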
Here’s another way of considering the differences between the two methods: in
order to materialize offline data for a Feature View, the Feature View pipeline
is run on a scheduled interval (based on
aggregation_interval) in a materialization job.
run mimics the query that would be run for a single materialization job for
some time range. This is why run requires a time range, with a start time and an
end_time, that should be aligned to one
scheduled interval (the SDK will emit warnings if a specified time range does
not align with one scheduled interval).
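The alignment rule above can be expressed as a simple check. This is a hypothetical sketch of the idea, not the SDK's validation logic: check_run_range is an invented name, and aligning intervals to the Unix epoch is an assumption made for illustration.

```python
import warnings
from datetime import datetime, timedelta

# Assumption for this sketch: scheduled intervals are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def check_run_range(start_time: datetime, end_time: datetime,
                    interval: timedelta) -> bool:
    """Return True if [start_time, end_time) covers exactly one scheduled
    interval that starts on an interval boundary; otherwise warn and
    return False."""
    aligned = (
        (start_time - EPOCH) % interval == timedelta(0)
        and end_time - start_time == interval
    )
    if not aligned:
        warnings.warn(
            f"Time range {start_time} to {end_time} does not align with "
            f"one scheduled interval of {interval}"
        )
    return aligned
```

For a daily schedule, a range from midnight to the next midnight passes, while a range starting at 06:00 or spanning two days triggers a warning.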
Finally, using the results of multiple runs, training data produced by
get_historical_features is based on one or more materialization job