Version: 0.8

tecton.OnDemandFeatureView

Summary

A Tecton On-Demand Feature View.

OnDemandFeatureView should not be instantiated directly; use the tecton.on_demand_feature_view() decorator instead.

Attributes

Name | Data Type | Description
created_at | Optional[datetime.datetime] | The time that this Tecton object was created or last updated.
defined_in | Optional[str] | The repo filename where this object was declared.
description | str | Returns the description of the Tecton object.
id | str | Returns the unique id of the Tecton object.
info | | A dataclass containing basic info about this Tecton object.
join_keys | List[str] | The join key column names.
name | str | Returns the name of the Tecton object.
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving.
owner | Optional[str] | Returns the owner of the Tecton object.
tags | Dict[str, str] | Returns the tags of the Tecton object.
transformations | List[specs.TransformationSpec] | The Transformations for this Feature View.
url | str | Returns a link to the Tecton Web UI.
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; otherwise returns None.
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to.

Methods

Name | Description
get_feature_columns() | The features produced by this FeatureView.
get_features_for_events(...) | Returns a TectonDataFrame of historical values for this feature view.
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature view.
get_online_features(...) | Returns a single Tecton tecton.FeatureVector from the Online Store.
run(...) | Run the OnDemandFeatureView using mock inputs.
run_transformation(...) | Run the OnDemandFeatureView using mock inputs.
summary() | Displays a human readable summary.
test_run(...) | Run the OnDemandFeatureView using mock sources.
validate() | Validate this Tecton object and its dependencies (if any).
with_join_key_map(...) | Rebind join keys for a Feature View used in a Feature Service.
with_name(...) | Rename a Feature View used in a Feature Service.

cancel_materialization_job(...)

Cancels the scheduled or running batch materialization job for this Feature View specified by the job identifier. Once cancelled, a job will not be retried further.

The job run state will be set to MANUAL_CANCELLATION_REQUESTED. Cancellation is asynchronous, so it may take some time to complete. If the job run is already in the MANUAL_CANCELLATION_REQUESTED state or in a terminal state, this method simply returns the job.

Parameters

  • job_id (str) – ID string of the materialization job.

Returns

MaterializationJobData object for the cancelled job.
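The cancellation rule above can be pictured with a small stand-alone sketch. It does not call Tecton: JobRecord is a hypothetical stand-in for MaterializationJobData, and every state name other than MANUAL_CANCELLATION_REQUESTED is an assumed placeholder.

```python
from dataclasses import dataclass

CANCEL_REQUESTED = "MANUAL_CANCELLATION_REQUESTED"
# Assumed placeholder names for terminal states; the real names may differ.
TERMINAL_STATES = {"SUCCESS", "ERROR", "MANUALLY_CANCELLED"}


@dataclass
class JobRecord:
    """Hypothetical stand-in for MaterializationJobData."""
    job_id: str
    state: str


def request_cancellation(job: JobRecord) -> JobRecord:
    """Flag a job for cancellation, or return it untouched if that is a no-op."""
    if job.state == CANCEL_REQUESTED or job.state in TERMINAL_STATES:
        return job  # already cancelling or finished
    job.state = CANCEL_REQUESTED  # the real cancellation finishes asynchronously
    return job


running = request_cancellation(JobRecord("job-1", "RUNNING"))  # "RUNNING" is assumed
finished = request_cancellation(JobRecord("job-2", "SUCCESS"))
print(running.state, finished.state)
```

Because a repeated request returns the job without changing it, a caller could safely retry the call while polling for the final state.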

get_feature_columns()

The features produced by this FeatureView.

get_features_for_events(...)

Returns a TectonDataFrame of historical values for this feature view.

By default (i.e. from_source=None), this method fetches feature values from the Offline Store for Feature Views that have offline materialization enabled and otherwise computes feature values on the fly from raw data.

If no arguments are passed in, all feature values for this feature view will be returned in a Tecton DataFrame.

Info: This method is functionally equivalent to get_historical_features(spine) and has been renamed in Tecton 0.8 for clarity. get_historical_features() is planned to be deprecated in a future release.

Parameters

  • events (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – A DataFrame of possible join keys, request data keys, and timestamps that specify which feature values to fetch. To distinguish between columns in the events DataFrame and feature columns, feature columns are labeled as feature_view_name__feature_name in the returned DataFrame.

  • timestamp_key (str) – Name of the time column in the events DataFrame. This method will fetch the latest features computed before the specified timestamps in this column. Not applicable if the Feature Service strictly contains OnDemandFeatureViews with no feature view dependencies. (Default: None)

  • from_source (bool) – Whether feature values should be recomputed from the original data source. If None, feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to raise an error if any Feature Views are not materialized. (Default: None)

  • save (bool) – Whether to persist the DataFrame as a Dataset object. This parameter is not supported in Tecton on Snowflake. (Default: False)

  • save_as (str) – Name to save the DataFrame as. If unspecified and save=True, a name will be generated. This parameter is not supported in Tecton on Snowflake. (Default: None)

  • mock_inputs (Optional[Dict[str, Union[pandas.DataFrame, pyspark_dataframe.DataFrame]]]) – Dictionary for mock inputs that should be used instead of fetching directly from raw data sources. The keys should match the feature view’s function parameters. For feature views with multiple sources, mocking some data sources and using raw data for others is supported. Using mock_inputs is incompatible with from_source=False and save/save_as.

  • compute_mode (Union[str, tecton.ComputeMode, None]) – Compute mode to use to produce the data frame. Valid string values are "spark", "snowflake", "athena", and "rift".

Returns

A TectonDataFrame
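The three-way behavior of from_source described in the parameter list can be summarized as a small decision function. This is a plain-Python sketch of the documented rule, not Tecton's implementation; resolve_read_path and its return labels are hypothetical names.

```python
from typing import Optional


def resolve_read_path(from_source: Optional[bool], offline_materialized: bool) -> str:
    """Pick where feature values would come from for one Feature View."""
    if from_source is True:
        return "raw_source"  # forced recompute from raw data
    if offline_materialized:
        return "offline_store"  # None or False: use materialized data when present
    if from_source is False:
        # from_source=False forbids falling back to raw data
        raise ValueError("Feature View is not materialized and from_source=False")
    return "raw_source"  # from_source=None falls back to computing on the fly


print(resolve_read_path(None, offline_materialized=True))
print(resolve_read_path(None, offline_materialized=False))
```

Leaving from_source at its default is the most forgiving option, while from_source=False acts as a guard against accidentally recomputing from raw data.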

Examples

An OnDemandFeatureView fv that expects request time data for the key amount:

The request time data is defined in the feature definition as such:

request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

  1. fv.get_features_for_events(events), where events=pandas.DataFrame({'amount': [30, 50, 10000]}): Fetch historical features from the offline store with request time data inputs 30, 50, and 10000 for key ‘amount’.

  2. fv.get_features_for_events(events, save_as='my_dataset'), where events=pandas.DataFrame({'amount': [30, 50, 10000]}): Fetch historical features from the offline store with request time data inputs 30, 50, and 10000 for key ‘amount’. Save the DataFrame as a dataset with the name ‘my_dataset’.

An OnDemandFeatureView fv that expects request time data for the key amount and has a feature view dependency with join key user_id:

  1. fv.get_features_for_events(events), where events=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'amount': [30, 50, 10000]}): Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps and values for amount in the events DataFrame.
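To make the "latest features computed before the specified timestamps" rule and the feature_view_name__feature_name column naming more concrete, here is a stand-alone sketch using only the standard library. The feature view name user_metrics, the feature daily_average, and the history values are all hypothetical, and treating "before" as strictly earlier is an assumption.

```python
from bisect import bisect_left
from datetime import datetime

FEATURE_VIEW_NAME = "user_metrics"  # hypothetical dependency feature view

# Hypothetical precomputed values for one user, sorted by feature timestamp.
history = [
    (datetime(2023, 1, 1), 10.0),
    (datetime(2023, 1, 2), 20.0),
    (datetime(2023, 1, 3), 30.0),
]


def latest_before(history, event_time):
    """Return the most recent value with a timestamp strictly before event_time."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_left(timestamps, event_time)  # count of entries before event_time
    return history[idx - 1][1] if idx > 0 else None


def enrich(event, history, timestamp_key):
    """Copy the event row and append one namespaced feature column."""
    row = dict(event)
    column = f"{FEATURE_VIEW_NAME}__daily_average"
    row[column] = latest_before(history, event[timestamp_key])
    return row


event = {"user_id": 1, "date_1": datetime(2023, 1, 2, 12), "amount": 50}
enriched = enrich(event, history, timestamp_key="date_1")
print(enriched["user_metrics__daily_average"])
```

The event columns pass through untouched, and the double-underscore prefix keeps the feature column from colliding with them.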

get_historical_features(...)

Returns a TectonDataFrame of historical values for this feature view.

By default (i.e. from_source=None), this method fetches feature values from the Offline Store for input Feature Views that have offline materialization enabled and otherwise computes input feature values on the fly from raw data.

Parameters

  • spine (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – The spine to join against, as a dataframe. The returned data frame will contain rollups for all (join key, request data key) combinations that are required to compute a full frame from the spine.

  • timestamp_key (str) – Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified and this feature view has feature view dependencies, timestamp_key will default to the time column of the spine if there is only one present. (Default: None)

  • from_source (bool) – Whether feature values should be recomputed from the original data source. If None, input feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to raise an error if any input Feature Views are not materialized. (Default: None)

  • save (bool) – Whether to persist the DataFrame as a Dataset object. (Default: False)

  • save_as (Optional[str]) – Name to save the DataFrame as. If unspecified and save=True, a name will be generated. (Default: None)

  • compute_mode (Union[str, tecton.ComputeMode, None]) – Compute mode to use to produce the data frame. Valid string values are "spark", "snowflake", "athena", and "rift".

Returns

A TectonDataFrame.
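The timestamp_key default described above (use the spine's time column when exactly one is present) could be pictured as follows. This is only a sketch: infer_timestamp_key is a hypothetical helper, a single spine row is represented as a dict, and returning None when the choice is ambiguous is an assumption rather than documented behavior.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def infer_timestamp_key(spine_row: Dict[str, Any]) -> Optional[str]:
    """Return the name of the only datetime column, or None if ambiguous or absent."""
    time_columns = [
        name for name, value in spine_row.items() if isinstance(value, datetime)
    ]
    return time_columns[0] if len(time_columns) == 1 else None


row = {"user_id": 1, "date_1": datetime(2023, 1, 1), "amount": 30.0}
print(infer_timestamp_key(row))
```

When a spine carries more than one time column, passing timestamp_key explicitly avoids relying on any default.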

Examples

An OnDemandFeatureView fv that expects request time data for the key amount:

The request time data is defined in the feature definition as such:

request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

  1. fv.get_historical_features(spine), where spine=pandas.DataFrame({'amount': [30, 50, 10000]}): Fetch historical features from the offline store with request time data inputs 30, 50, and 10000 for key ‘amount’.

  2. fv.get_historical_features(spine, save_as='my_dataset'), where spine=pandas.DataFrame({'amount': [30, 50, 10000]}): Fetch historical features from the offline store with request time data inputs 30, 50, and 10000 for key ‘amount’. Save the DataFrame as a dataset with the name ‘my_dataset’.

An OnDemandFeatureView fv that expects request time data for the key amount and has a feature view dependency with join key user_id:

  1. fv.get_historical_features(spine), where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'amount': [30, 50, 10000]}): Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps and values for amount in the spine.

get_materialization_job(...)

Retrieves data about the specified materialization job for this Feature View.

This data includes information about job attempts.

Parameters

  • job_id (str) – ID string of the materialization job.

Returns

MaterializationJobData object for the job.

get_online_features(...)

Returns a single Tecton tecton.FeatureVector from the Online Store.

At least one of join_keys or request_data is required.

Parameters

  • join_keys (Optional[Mapping[str, Union[int, int64, str, bytes]]]) – Join keys of the enclosed FeatureViews. (Default: None)

  • include_join_keys_in_response (bool) – Whether to include join keys as part of the response FeatureVector. (Default: False)

  • request_data (Optional[Mapping[str, Union[int, int64, str, bytes, float]]]) – Dictionary of request context values used for OnDemandFeatureViews. (Default: None)

Returns

A tecton.FeatureVector of the results.
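The argument rules above (at least one of join_keys or request_data, and the optional echo of join keys in the response) can be illustrated without a Tecton cluster. Everything here is a hypothetical stand-in: get_online_features_sketch, the ONLINE_STORE dict, and the higher_than_average feature are assumptions made for the example, and the result is a plain dict rather than a tecton.FeatureVector.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical stored feature values, keyed by user_id.
ONLINE_STORE = {1: {"daily_average": 1000.0}}


def get_online_features_sketch(
    join_keys: Optional[Mapping[str, Any]] = None,
    request_data: Optional[Mapping[str, Any]] = None,
    include_join_keys_in_response: bool = False,
) -> Dict[str, Any]:
    """Mimic the argument handling of an online lookup with a plain dict result."""
    if join_keys is None and request_data is None:
        raise ValueError("At least one of join_keys or request_data is required.")

    stored = ONLINE_STORE.get(join_keys["user_id"], {}) if join_keys else {}
    amount = request_data["amount"] if request_data else None

    result: Dict[str, Any] = {}
    if amount is not None and "daily_average" in stored:
        result["higher_than_average"] = amount > stored["daily_average"]
    if include_join_keys_in_response and join_keys:
        result = {**join_keys, **result}  # join keys first, then features
    return result


vector = get_online_features_sketch(
    join_keys={"user_id": 1},
    request_data={"amount": 50},
    include_join_keys_in_response=True,
)
print(vector)  # {'user_id': 1, 'higher_than_average': False}
```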

Examples

An OnDemandFeatureView fv that expects request time data for the key amount.

The request time data is defined in the feature definition as such:

request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

  1. fv.get_online_features(request_data={'amount': 50}): Fetch the latest features with input amount=50.

An OnDemandFeatureView fv that has a feature view dependency with join key user_id and expects request time data for the key amount.

  1. fv.get_online_features(join_keys={'user_id': 1}, request_data={'amount': 50}, include_join_keys_in_response=True)

Fetch the latest features from the online store for user 1 with input amount=50. In the returned FeatureVector, include the join key information (user_id=1).

list_materialization_jobs()

Retrieves the list of all materialization jobs for this Feature View.

Returns

List of MaterializationJobData objects.

run(...)

Info: This method has been replaced by the .run_transformation() method and will be deprecated in a future release.

Run the OnDemandFeatureView using mock inputs.

Parameters

  • **mock_inputs – Required. Keyword args with the same expected keys as the OnDemandFeatureView’s inputs parameters. For the “python” mode, each input must be a Dictionary representing a single row. For the “pandas” mode, each input must be a DataFrame, and all DataFrames must contain the same number of rows with matching row ordering.

Returns

A Dict object for the “python” mode and a Tecton DataFrame of the results for the “pandas” mode.

Example

# Given a python on-demand feature view defined in your workspace:
@on_demand_feature_view(
    sources=[transaction_request, user_transaction_amount_metrics],
    mode="python",
    schema=output_schema,
    description="The transaction amount is higher than the 1 day average.",
)
def transaction_amount_is_higher_than_average(request, user_metrics):
    return {"higher_than_average": request["amt"] > user_metrics["daily_average"]}


# Retrieve and run the feature view in a notebook using mock data:
import tecton

fv = tecton.get_workspace("prod").get_feature_view("transaction_amount_is_higher_than_average")

result = fv.run(request={"amt": 100}, user_metrics={"daily_average": 1000})

print(result)
# {'higher_than_average': False}

summary()

Displays a human-readable summary of this Feature View.

test_run(...)

Run the OnDemandFeatureView using mock sources.

Unlike run(), test_run() is intended for unit testing. It will not make calls to your connected Tecton cluster to validate the OnDemandFeatureView.

Parameters

  • **mock_sources (Union[Dict[str, Any], DataFrame]) – Required. Keyword args with the same expected keys as the OnDemandFeatureView’s inputs parameters. For the “python” mode, each input must be a Dictionary representing a single row. For the “pandas” mode, each input must be a DataFrame with all of them containing the same number of rows and matching row ordering.

Returns

A Dict object for the “python” mode and a pandas.DataFrame object for the “pandas” mode.

Example

@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=output_schema,
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 10000}


# Test using the `test_run` API.
result = transaction_amount_is_high.test_run(transaction_request={"amount": 100})
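Since test_run() is intended for unit testing, a test would typically check values on both sides of the threshold. The sketch below shows that shape with a plain function standing in for the decorated feature view, so it runs without Tecton installed. With the real object, the call inside the loop would presumably be transaction_amount_is_high.test_run(transaction_request=...) instead.

```python
def transaction_amount_is_high(transaction_request):
    """Plain-function stand-in for the transformation body shown above."""
    return {"transaction_amount_is_high": transaction_request["amount"] > 10000}


def test_transaction_amount_is_high():
    # (amount, expected flag): below, exactly at, and above the 10000 threshold.
    cases = [(100, False), (10000, False), (10001, True)]
    for amount, expected in cases:
        actual = transaction_amount_is_high(transaction_request={"amount": amount})
        assert actual == {"transaction_amount_is_high": expected}, amount
```

Written this way, the function can be collected by a test runner such as pytest, and the boundary case documents that the comparison is strict.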

run_transformation(...)

Runs the On Demand Feature View using mock inputs.

Parameters

  • input_data (Dict[str, Any]) – Dict with the same expected keys as the On Demand Feature View's inputs parameters. For mode='python', each value must be a Dictionary representing a single row. For mode='pandas', each value must be a DataFrame, and all DataFrames must contain the same number of rows with matching row ordering.

Returns

If mode='python', returns a Dict object of the results. If mode='pandas', returns a TectonDataFrame of the results.
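For mode='python', the requirement that input_data has "the same expected keys" as the function's inputs can be read as a keyword-argument call. The sketch below is an assumption about that mapping rather than Tecton's code: run_python_mode is a hypothetical helper, and the transformation is a plain, undecorated function.

```python
import inspect
from typing import Any, Callable, Dict


def transaction_amount_is_higher_than_average(request, user_metrics):
    """Undecorated copy of a python-mode transformation body."""
    return {"higher_than_average": request["amt"] > user_metrics["daily_average"]}


def run_python_mode(fn: Callable[..., Dict[str, Any]], input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Check that input_data keys match fn's parameters, then call fn with them."""
    expected = set(inspect.signature(fn).parameters)
    provided = set(input_data)
    if provided != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(provided)}")
    return fn(**input_data)  # each value is a dict representing a single row


input_data = {"request": {"amt": 100}, "user_metrics": {"daily_average": 1000}}
result = run_python_mode(transaction_amount_is_higher_than_average, input_data)
print(result)  # {'higher_than_average': False}
```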

Example

# Given a Python On Demand Feature View defined in your workspace:
@on_demand_feature_view(
    sources=[transaction_request, user_transaction_amount_metrics],
    mode="python",
    schema=output_schema,
    description="The transaction amount is higher than the 1 day average.",
)
def transaction_amount_is_higher_than_average(request, user_metrics):
    return {"higher_than_average": request["amt"] > user_metrics["daily_average"]}


# Retrieve and run the Feature View in a notebook using mock data:
import tecton

fv = tecton.get_workspace("prod").get_feature_view("transaction_amount_is_higher_than_average")

input_data = {"request": {"amt": 100}, "user_metrics": {"daily_average": 1000}}

result = fv.run_transformation(input_data=input_data)

print(result)
# {'higher_than_average': False}

validate()

Validate this Tecton object and its dependencies (if any).

Validation performs most of the same checks and operations as tecton plan.

  1. Check for invalid object configurations, e.g. setting conflicting fields.

  2. For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source’s specified S3 path exists or that a Feature View’s SQL code executes and produces supported feature data types.

Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. fv = tecton.get_workspace('prod').get_feature_view('my_fv')) since they have already been validated during tecton plan. Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called, e.g. my_feature_view.get_historical_features().

with_join_key_map(...)

Rebind join keys for a Feature View used in a Feature Service.

The keys in join_key_map should be the feature view join keys, and the values should be the feature service overrides.

Parameters

  • join_key_map – Mapping from the feature view's join keys to the feature service join keys that override them.

Example

from tecton import FeatureService

# The join key for this feature service will be "feature_service_user_id".
feature_service = FeatureService(
    name="feature_service",
    features=[
        my_feature_view.with_join_key_map({"user_id": "feature_service_user_id"}),
    ],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],
        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)
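One way to picture the direction of join_key_map (feature view key on the left, feature service key on the right) is to translate a service-level request back into the keys each feature view expects. This is an illustrative sketch only; rebind_to_view_keys and the sample ids are hypothetical.

```python
from typing import Any, Dict, Mapping


def rebind_to_view_keys(
    service_keys: Mapping[str, Any], join_key_map: Mapping[str, str]
) -> Dict[str, Any]:
    """Translate feature service join keys into the feature view's own join keys."""
    # join_key_map: feature view join key -> feature service join key
    return {
        view_key: service_keys[service_key]
        for view_key, service_key in join_key_map.items()
    }


request_keys = {"transaction_id": "t1", "sender_id": "u1", "recipient_id": "u2"}
sender_lookup = rebind_to_view_keys(request_keys, {"user_id": "sender_id"})
recipient_lookup = rebind_to_view_keys(request_keys, {"user_id": "recipient_id"})
print(sender_lookup, recipient_lookup)
```

Both lookups use the same user_id key of the underlying feature view, but each resolves to a different entity from the request.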

with_name(...)

Rename a Feature View used in a Feature Service.

Parameters

  • namespace – The new name for the Feature View within the Feature Service.

Example

from tecton import FeatureService

# The feature view in this feature service will be named "new_named_feature_view" in training data dataframe
# columns and other metadata.
feature_service = FeatureService(
    name="feature_service",
    features=[my_feature_view.with_name("new_named_feature_view")],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],
        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)
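Renaming matters most when the same feature view appears twice, because the name ends up in the output column names. The sketch below assumes the feature_view_name__feature_name labeling mentioned for get_features_for_events also applies to the renamed views; namespaced_columns and the credit_score feature are hypothetical.

```python
from typing import List


def namespaced_columns(namespace: str, feature_names: List[str]) -> List[str]:
    """Build output column names as <namespace>__<feature_name>."""
    return [f"{namespace}__{name}" for name in feature_names]


features = ["credit_score"]  # hypothetical feature of user_features
sender_columns = namespaced_columns("sender_features", features)
recipient_columns = namespaced_columns("recipient_features", features)
print(sender_columns + recipient_columns)
```

Without distinct names, both copies of the feature view would produce the same column name and could not be told apart.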
