FeatureService
Summary
A Tecton Feature Service.
In Tecton, a Feature Service exposes an API for accessing a set of FeatureViews.
Once deployed in production, each model has one associated Feature Service that serves the model its features. A Feature Service contains a list of the Feature Views associated with a model. It also includes user-provided metadata such as name, description, and owner that Tecton uses to organize feature data.
Example
from tecton import FeatureService, LoggingConfig
# Import your feature views declared in your feature repo directory
from feature_repo.features.feature_views import last_transaction_amount_sql, transaction_amount_is_high
...
# Declare Feature Service
fraud_detection_feature_service = FeatureService(
    name='fraud_detection_feature_service',
    description='A FeatureService providing features for a model that predicts if a transaction is fraudulent.',
    features=[
        last_transaction_amount_sql,
        transaction_amount_is_high,
        ...
    ],
    # Configure logging of feature requests sent to this Feature Service
    logging=LoggingConfig(
        sample_rate=0.5,
        log_effective_times=False,
    ),
)
Attributes
Name | Data Type | Description |
---|---|---|
created_at | Optional[str] | The time that this Tecton object was created or last updated. |
defined_in | Optional[str] | The repo filename where this object was declared. |
description | str | Returns the description of the Tecton object. |
features | List[framework_feature_view.FeatureReference] | Returns the list of feature references included in this feature service. |
feature_views | Set[framework_feature_view.FeatureView] | Returns the deduplicated set of feature views included in this feature service. A feature view may be included in a feature service multiple times, but it appears only once in this set. |
id | str | Returns the unique id of the Tecton object. |
info | ||
name | str | Returns the name of the Tecton object. |
owner | Optional[str] | Returns the owner of the Tecton object. |
tags | Dict[str, str] | Returns the tags of the Tecton object. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
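The difference between `features` (a list of references) and `feature_views` (a deduplicated set) can be illustrated with a small stand-in model. The sketch below does not use the Tecton SDK: `FeatureRef`, its `namespace` field, and `dedupe_feature_views` are hypothetical names introduced only to show how a list that references the same feature view twice collapses to a set.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class FeatureRef:
    """Hypothetical stand-in for a feature reference (not the Tecton class)."""
    feature_view: str  # name of the feature view the reference points to
    namespace: str     # assumed label that tells two references apart


def dedupe_feature_views(refs: List[FeatureRef]) -> Set[str]:
    """Collapse a list of references to the set of distinct feature view names."""
    return {ref.feature_view for ref in refs}


# The same feature view is referenced twice under different namespaces.
refs = [
    FeatureRef("user_transaction_counts", "sender"),
    FeatureRef("user_transaction_counts", "recipient"),
    FeatureRef("transaction_amount_is_high", "transaction_amount_is_high"),
]
views = dedupe_feature_views(refs)
print(len(refs), sorted(views))
```

Three references go in, but only two distinct feature views come out, which mirrors the relationship between the two attributes.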
Methods
Name | Description |
---|---|
__init__() | Instantiates a new FeatureService. |
get_feature_columns() | Returns the list of all feature columns included in this feature service. |
get_features_for_events(...) | Fetch a TectonDataFrame of feature values from this Feature Service. |
get_historical_features(...) | Fetch a TectonDataFrame of feature values from this FeatureService. |
get_online_features(...) | Returns a single Tecton FeatureVector from the Online Store. |
query_features(...) | [Advanced Feature] Queries the FeatureService with a partial set of join_keys defined in the online_serving_index of the included FeatureViews. |
summary() | Displays a human-readable summary of this Feature Service. |
validate() | Validate this Tecton object and its dependencies (if any). |
__init__(...)
Instantiates a new FeatureService.
Parameters
- name (str): A unique name for the Feature Service.
- description (Optional[str]): A human-readable description. (Default: None)
- tags (Optional[Dict[str, str]]): Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). (Default: None)
- owner (Optional[str]): Owner name (typically the email of the primary maintainer). (Default: None)
- prevent_destroy (bool): If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)
- online_serving_enabled (bool): If True, users can send realtime requests to this FeatureService, and only FeatureViews with online materialization enabled can be added to this FeatureService. (Default: True)
- features (Optional[List[Union[FeatureReference, FeatureView]]]): The list of FeatureView or FeatureReference that this FeatureService will serve. (Default: None)
- logging (Optional[LoggingConfig]): A configuration for logging feature requests sent to this Feature Service. (Default: None)
- on_demand_environment (Optional[str]): The environment in which all the on demand feature views for this feature service should be executed. Defaults to None, which means the on demand feature views are executed in the same environment as the feature server, without any resource isolation or dependencies. This may be preferred for low-latency feature services which do not have dependencies. Learn more about environments at Environments. (Default: None)
- enable_online_caching (bool): If True, the Feature Service will attempt to retrieve a cached value of Feature Views which have caching set up. (Default: False)
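The prevent_destroy rules described above can be pictured as a filter over the destructive changes in a plan. The following is only a conceptual sketch, not Tecton's implementation: `PlannedObject` and `blocked_changes` are hypothetical names, and only direct dependencies are considered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlannedObject:
    """Hypothetical model of an object in a plan (not a Tecton class)."""
    name: str
    prevent_destroy: bool = False
    depends_on: List[str] = field(default_factory=list)


def blocked_changes(objects: Dict[str, PlannedObject],
                    destructive: List[str],
                    live_workspace: bool) -> List[str]:
    """Return the destructive changes that a prevent_destroy flag would block."""
    if not live_workspace:
        return []  # the flag is only enforced in live workspaces
    blocked = []
    for name in destructive:
        # Protected objects that directly depend on the object being changed.
        protected_dependents = [
            obj.name for obj in objects.values()
            if obj.prevent_destroy and name in obj.depends_on
        ]
        if objects[name].prevent_destroy or protected_dependents:
            blocked.append(name)
    return blocked


objects = {
    "fraud_service": PlannedObject("fraud_service", prevent_destroy=True,
                                   depends_on=["last_transaction_amount"]),
    "last_transaction_amount": PlannedObject("last_transaction_amount"),
    "unused_view": PlannedObject("unused_view"),
}
destructive = ["last_transaction_amount", "unused_view"]
print(blocked_changes(objects, destructive, live_workspace=True))
print(blocked_changes(objects, destructive, live_workspace=False))
```

In this toy plan, the feature view used by the protected service is blocked, the unused view is not, and nothing is blocked in a non-live workspace.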
get_feature_columns()
Returns the list of all feature columns included in this feature service.
get_features_for_events(...)
This method is functionally equivalent to get_historical_features(spine) and has been renamed in Tecton 0.8+ for clarity. get_historical_features() will be deprecated in a future release.
Fetch a TectonDataFrame of feature values from this Feature Service.
This method will return feature values for each row provided in the events DataFrame. The feature values returned by this method will respect the timestamp provided in the timestamp column of the events DataFrame.
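What "respecting the timestamp" means can be illustrated with a point-in-time lookup over a toy feature history. This is a conceptual sketch in plain Python, not how Tecton computes results: the `history` data and the `value_as_of` helper are hypothetical, and the sketch assumes a value stamped exactly at the event time counts as available.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical feature history for one join key: (timestamp, value), sorted by time.
history: List[Tuple[datetime, float]] = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 25.0),
    (datetime(2024, 1, 9), 40.0),
]


def value_as_of(history: List[Tuple[datetime, float]], ts: datetime) -> Optional[float]:
    """Return the most recent value whose timestamp is at or before ts."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)  # number of entries with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None


event_times = [datetime(2023, 12, 31), datetime(2024, 1, 6), datetime(2024, 1, 20)]
results = [value_as_of(history, ts) for ts in event_times]
print(results)
```

An event that precedes the whole history gets no value, and later events never see values that were recorded after their own timestamp.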
Parameters
- events (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]): A DataFrame of possible join keys, request data keys, and timestamps that specify which feature values to fetch. To distinguish between columns in the events DataFrame and feature columns, feature columns are labeled as feature_view_name__feature_name in the returned DataFrame.
- timestamp_key (str): Name of the time column in the events DataFrame. This method will fetch the latest features computed before the specified timestamps in this column. Not applicable if the Feature Service strictly contains OnDemandFeatureViews with no feature view dependencies. (Default: None)
- from_source (bool): Whether feature values should be recomputed from the original data source. If None, feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to error if any Feature Views are not materialized. (Default: None)
- save (bool): Whether to persist the DataFrame as a Dataset object. This parameter is not supported in Tecton on Snowflake. (Default: False)
- save_as (str): Name to save the DataFrame as. If unspecified and save=True, a name will be generated. This parameter is not supported in Tecton on Snowflake. (Default: None)
- compute_mode (Union[str, tecton.ComputeMode, None]): Compute mode to use to produce the DataFrame. Valid string values are "spark", "snowflake", "athena", and "rift".
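Because feature columns follow the feature_view_name__feature_name pattern, the columns of a returned DataFrame can be grouped by feature view with a simple string split. The sketch below works on a plain list of column names; the column names and both helpers are hypothetical, and it assumes feature view names do not themselves contain a double underscore.

```python
from typing import Dict, List, Tuple

SEPARATOR = "__"


def split_feature_column(column: str) -> Tuple[str, str]:
    """Split 'feature_view_name__feature_name' at the first double underscore."""
    feature_view, _, feature = column.partition(SEPARATOR)
    return feature_view, feature


def group_columns(columns: List[str], event_columns: List[str]) -> Dict[str, List[str]]:
    """Group feature columns by feature view, skipping the events columns."""
    grouped: Dict[str, List[str]] = {}
    for column in columns:
        if column in event_columns:
            continue
        feature_view, feature = split_feature_column(column)
        grouped.setdefault(feature_view, []).append(feature)
    return grouped


# Hypothetical column names for a returned DataFrame.
returned = [
    "user_id",
    "date",
    "last_transaction_amount_sql__amount",
    "transaction_amount_is_high__is_high",
]
grouped = group_columns(returned, event_columns=["user_id", "date"])
print(grouped)
```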
Returns
A TectonDataFrame.
Examples
A Feature Service fs that contains a Batch Feature View and a Stream Feature View with join keys user_id and ad_id.
- fs.get_features_for_events(events) where events=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetches historical features from the offline store for users 1, 2, and 3 for the specified timestamps and ad ids in the events DataFrame.
- fs.get_features_for_events(events, save_as='my_dataset') where events=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetches historical features from the offline store for users 1, 2, and 3 for the specified timestamps and ad ids in the events DataFrame. Saves the DataFrame as a dataset with the name my_dataset.
- fs.get_features_for_events(events, timestamp_key='date_1') where events=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date_1': [datetime(...), ...], 'date_2': [datetime(...), ...]})
  Fetches historical features from the offline store for users 1, 2, and 3 for the ad ids and specified timestamps in the date_1 column in the events DataFrame.
A Feature Service fs_on_demand that contains only OnDemandFeatureViews and expects request time data for the key amount.
The request time data is defined in the feature definition as such:
request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)
- fs_on_demand.get_features_for_events(events) where events=pandas.DataFrame({'amount': [30, 50, 10000]})
  Fetches historical features from the offline store with request data inputs 30, 50, and 10000.
A Feature Service fs_all that contains feature views of all types with join key user_id and expects request time data for the key amount.
- fs_all.get_features_for_events(events) where events=pandas.DataFrame({'user_id': [1,2,3], 'amount': [30, 50, 10000], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetches historical features from the offline store for users 1, 2, and 3 for the specified timestamps and request data inputs in the events DataFrame.
get_historical_features(...)
Fetch a TectonDataFrame of feature values from this FeatureService.
This method will return feature values for each row provided in the spine DataFrame. The feature values returned by this method will respect the timestamp provided in the timestamp column of the spine DataFrame.
By default (i.e. from_source=None), this method fetches feature values from the Offline Store for Feature Views that have offline materialization enabled and otherwise computes feature values on the fly from raw data.
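The three from_source settings (None, True, False) amount to a per-Feature-View choice between the Offline Store and raw data. The sketch below encodes that choice in plain Python as an illustration only; `choose_sources` and the materialization flags are hypothetical and do not reflect Tecton internals.

```python
from typing import Dict, Optional


def choose_sources(materialized: Dict[str, bool],
                   from_source: Optional[bool]) -> Dict[str, str]:
    """Map each feature view name to 'offline_store' or 'raw_data'."""
    plan = {}
    for name, is_materialized in materialized.items():
        if from_source is True:
            plan[name] = "raw_data"        # always recompute from raw data
        elif is_materialized:
            plan[name] = "offline_store"   # read materialized values
        elif from_source is False:
            # from_source=False requires every feature view to be materialized
            raise ValueError(f"{name} is not materialized offline")
        else:
            plan[name] = "raw_data"        # from_source=None falls back to raw data
    return plan


# Hypothetical offline materialization status per feature view.
views = {"last_transaction_amount_sql": True, "transaction_amount_is_high": False}
print(choose_sources(views, from_source=None))
print(choose_sources(views, from_source=True))
```

Calling `choose_sources(views, from_source=False)` raises a `ValueError` here, because one of the two feature views is not materialized.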
Parameters
- spine (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]): A DataFrame of possible join keys, request data keys, and timestamps that specify which feature values to fetch. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name__feature_name in the returned DataFrame.
- timestamp_key (str): Name of the time column in the spine DataFrame. This method will fetch the latest features computed before the specified timestamps in this column. Not applicable if the FeatureService strictly contains OnDemandFeatureViews with no feature view dependencies. (Default: None)
- include_feature_view_timestamp_columns (bool): Whether to include timestamp columns for each FeatureView in the FeatureService. (Default: False)
- from_source (bool): Whether feature values should be recomputed from the original data source. If None, feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to error if any Feature Views are not materialized. (Default: None)
- save (bool): Whether to persist the DataFrame as a Dataset object. This parameter is not supported in Tecton on Snowflake. (Default: False)
- save_as (str): Name to save the DataFrame as. If unspecified and save=True, a name will be generated. This parameter is not supported in Tecton on Snowflake. (Default: None)
- compute_mode (Union[str, tecton.ComputeMode, None]): Compute mode to use to produce the DataFrame. Valid string values are "spark", "snowflake", "athena", and "rift".
Returns
A TectonDataFrame.
Examples
A FeatureService fs that contains a BatchFeatureView and a StreamFeatureView with join keys user_id and ad_id.
- fs.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps and ad ids in the spine.
- fs.get_historical_features(spine, save_as='my_dataset') where spine=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps and ad ids in the spine. Save the DataFrame as a dataset with the name my_dataset.
- fs.get_historical_features(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'ad_id': ['a234', 'b256', 'c9102'], 'date_1': [datetime(...), ...], 'date_2': [datetime(...), ...]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the ad ids and specified timestamps in the date_1 column in the spine.
A FeatureService fs_on_demand that contains only OnDemandFeatureViews and expects request time data for the key amount.
The request time data is defined in the feature definition as such:
request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)
- fs_on_demand.get_historical_features(spine) where spine=pandas.DataFrame({'amount': [30, 50, 10000]})
  Fetch historical features from the offline store with request data inputs 30, 50, and 10000.
A FeatureService fs_all that contains feature views of all types with join key user_id and expects request time data for the key amount.
- fs_all.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'amount': [30, 50, 10000], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps and request data inputs in the spine.
get_online_features(...)
Returns a single Tecton FeatureVector from the Online Store. At least one of join_keys or request_data is required.
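The "at least one of join_keys or request_data" rule can be pictured as a small argument check before the request is sent. This is a hypothetical helper written for illustration, not part of the Tecton SDK, and the shape of the returned dictionary is an assumption.

```python
from typing import Any, Dict, Mapping, Optional


def check_online_request(join_keys: Optional[Mapping[str, Any]] = None,
                         request_data: Optional[Mapping[str, Any]] = None) -> Dict[str, dict]:
    """Reject a request that supplies neither join keys nor request data."""
    if not join_keys and not request_data:
        raise ValueError("At least one of join_keys or request_data is required")
    # Normalize missing arguments to empty dictionaries.
    return {
        "join_keys": dict(join_keys or {}),
        "request_data": dict(request_data or {}),
    }


request = check_online_request(join_keys={"user_id": 1}, request_data={"amount": 30})
print(request)
```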
Parameters
- join_keys (Optional[Mapping[str, Union[int, int64, str, bytes]]]): Join keys of the enclosed FeatureViews. (Default: None)
- include_join_keys_in_response (bool): Whether to include join keys as part of the response FeatureVector. (Default: False)
- request_data (Optional[Mapping[str, Union[int, int64, str, bytes, float]]]): Dictionary of request context values. Only applicable when the FeatureService contains OnDemandFeatureViews. (Default: None)
Returns
A FeatureVector of the results.
Examples
A FeatureService fs that contains a BatchFeatureView and a StreamFeatureView with join keys user_id and ad_id.
- fs.get_online_features(join_keys={'user_id': 1, 'ad_id': 'c234'})
  Fetch the latest features from the online store for user 1 and ad id 'c234'.
- fs.get_online_features(join_keys={'user_id': 1, 'ad_id': 'c234'}, include_join_keys_in_response=True)
  Fetch the latest features from the online store for user 1 and ad id 'c234'. Include the join key information (user_id=1, ad_id='c234') in the returned FeatureVector.
A FeatureService fs_on_demand that contains only OnDemandFeatureViews and expects request time data for the key amount.
The request time data is defined in the feature definition as such:
request_schema = StructType()
request_schema.add(StructField("amount", DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)
- fs_on_demand.get_online_features(request_data={'amount': 30})
  Fetch the latest features from the online store with amount = 30.
A FeatureService fs_all that contains feature views of all types with join key user_id and expects request time data for the key amount.
- fs_all.get_online_features(join_keys={'user_id': 1}, request_data={'amount': 30})
  Fetch the latest features from the online store for user 1 with amount = 30.
query_features(...)
[Advanced Feature] Queries the FeatureService with a partial set of join_keys defined in the online_serving_index of the included FeatureViews. Returns a TectonDataFrame of all matched records.
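Querying with a partial set of join keys means that one query can match several records. The sketch below shows that idea with an in-memory list; `records`, the `clicks_7d` feature, and `query_partial` are hypothetical and say nothing about how the online store is actually indexed.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical online records keyed by user_id and ad_id.
records: List[Dict[str, Any]] = [
    {"user_id": 1, "ad_id": "a234", "clicks_7d": 3},
    {"user_id": 1, "ad_id": "b256", "clicks_7d": 0},
    {"user_id": 2, "ad_id": "a234", "clicks_7d": 7},
]


def query_partial(records: List[Dict[str, Any]],
                  join_keys: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return every record whose values match all of the supplied join keys."""
    return [r for r in records if all(r.get(k) == v for k, v in join_keys.items())]


# Only user_id is supplied, so every ad for user 1 matches.
matches = query_partial(records, {"user_id": 1})
print([m["ad_id"] for m in matches])
```

Supplying the full set of keys (both user_id and ad_id) narrows the result to a single record, which is the case get_online_features covers.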
Parameters
- join_keys (Mapping[str, Union[int, int64, str, bytes]]): Query join keys, i.e., a union of join keys in the online_serving_index of all enclosed FeatureViews.
Returns
A TectonDataFrame of all matched records.
summary()
Displays a human-readable summary of this Feature Service.
validate()
Validate this Tecton object and its dependencies (if any).
Validation performs most of the same checks and operations as tecton plan.
- Check for invalid object configurations, e.g. setting conflicting fields.
- For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source's specified s3 path exists or that a Feature View's SQL code executes and produces supported feature data types.
Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. fv = tecton.get_workspace('prod').get_feature_view('my_fv')) since they have already been validated during tecton plan. Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called, e.g. my_feature_view.get_features_for_events().