FeatureTable
Summary
A Tecton Feature Table.
Feature Tables are used to batch push features into Tecton from external feature computation systems.
Example
    from tecton import Attribute, Entity, FeatureTable
    from tecton.types import Field, String, Timestamp, Int64
    import datetime

    # Declare your user Entity instance here or import it if defined elsewhere in
    # your Tecton repo.
    user = ...

    features = [
        Attribute('user_login_count_7d', Int64),
        Attribute('user_login_count_30d', Int64),
    ]

    user_login_counts = FeatureTable(
        name='user_login_counts',
        entities=[user],
        features=features,
        online=True,
        offline=True,
        ttl=datetime.timedelta(days=30),
        # The constructor parameter documented below is timestamp_field.
        timestamp_field='timestamp',
    )
FeatureTable (Class)
Attributes
Name | Data Type | Description |
---|---|---|
alert_email | Optional[str] | Email that alerts for this Feature Table will be sent to. |
created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
description | Optional[str] | Returns the description of the Tecton object. |
entities | ||
feature_metadata | List[FeatureMetadata] | |
id | str | Returns the unique id of the Tecton object. |
info | ||
join_keys | List[str] | The join key column names. |
name | str | Returns the name of the Tecton object. |
offline | bool | Whether the Feature Table is materialized to the offline feature store. |
offline_store | Optional[Union[configs.DeltaConfig, configs.ParquetConfig]] | Configuration for the Offline Store of this Feature Table. |
online | bool | Whether the Feature Table is materialized to the online feature store. |
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. |
owner | Optional[str] | Returns the owner of the Tecton object. |
prevent_destroy | bool | If set to True, Tecton will block destructive actions taken on this Feature View or Feature Table. |
tags | Dict[str, str] | Returns the tags of the Tecton object. |
tecton_materialization_runtime | Optional[str] | Version of tecton package used by your job cluster. |
timestamp_field | Optional[str] | The column name that refers to the timestamp for records that are produced by the Feature Table. This parameter is optional if exactly one column is a Timestamp type. |
ttl | Duration | TTL defines feature lifespan and look-back window for training sets. |
url | str | Returns a link to the Tecton Web UI. |
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; otherwise returns None. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |
Methods
Name | Description |
---|---|
__init__(...) | Instantiate a new Feature Table. |
delete_keys(...) | Deletes any materialized data that matches the specified join keys from the Feature Table. |
get_feature_columns() | Retrieves the list of feature columns produced by this Feature Table. |
get_features_for_events(...) | Returns a TectonDataFrame of historical values for this feature table. |
get_features_in_range(...) | Returns a TectonDataFrame of historical values for this feature table. |
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature table. |
get_online_features(...) | Returns a single Tecton FeatureVector from the Online Store. |
get_timestamp_field() | Returns the name of the timestamp field of this Feature Table. |
ingest(...) | Ingests a Dataframe into the Feature Table. |
summary() | Displays a human-readable summary. |
validate() | Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary. |
with_join_key_map(...) | Rebind join keys for a Feature View or Feature Table used in a Feature Service. |
with_name(...) | Rename a Feature View or Feature Table used in a Feature Service. |
__init__(...)
Instantiate a new Feature Table.
Parameters
- name (str) - Unique, human friendly name that identifies the Feature Table.
- description (Optional[str]) - A human-readable description. Default: None
- owner (Optional[str]) - Owner name (typically the email of the primary maintainer). Default: None
- tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). Default: None
- prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature Table that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreation of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Tables used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: false
- entities (List[framework_entity.Entity]) - A list of Entity objects, used to organize features.
- features (List[feature.Attribute]) - A list of features this Feature Table manages. Only one of schema or features can be set.
- ttl (Optional[datetime.timedelta]) - The TTL (or "look back window") for features defined by this feature table. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs. Default: None
- online (bool) - Enable writing to online feature store. Default: false
- offline (bool) - Enable writing to offline feature store. Default: false
- offline_store (Optional[Union[configs.OfflineStoreConfig, configs.DeltaConfig]]) - Configuration for how data is written to the offline feature store. Default: None
- online_store (Optional[configs.OnlineStoreTypes]) - Configuration for how data is written to the online feature store. Default: None
- batch_compute (Optional[configs.ComputeConfigTypes]) - Configuration for batch materialization clusters. Should be one of: [EMRClusterConfig, DatabricksClusterConfig, EMRJsonClusterConfig, DatabricksJsonClusterConfig, DataprocJsonClusterConfig]. Default: None
- online_serving_index (Optional[List[str]]) - (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. Up to one join key may be omitted. If one key is omitted, online requests to a Feature Service will return all feature vectors that match the specified join keys. Default: None
- alert_email (Optional[str]) - Email that alerts for this Feature Table will be sent to. Default: None
- tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by your job cluster. Default: None
- timestamp_field (str) - The column name that refers to the timestamp for records that are produced by the Feature Table. This parameter is optional only if using the schema parameter rather than features.
- cache_config (Optional[configs.CacheConfig]) - Cache config for the Feature Table. Including this option enables the feature server to use the cache when retrieving features for this feature table. Will only be respected if the feature service containing this feature table has enable_online_caching set to True. Default: None
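The effect of omitting one join key from online_serving_index can be sketched in plain Python. This is an illustration only, not Tecton's implementation: the in-memory ROWS list, the join keys user_id and ad_id, and the lookup helper are all hypothetical names chosen for the example.

```python
from typing import Dict, List

# Hypothetical materialized rows for a table with join keys (user_id, ad_id).
ROWS: List[Dict[str, object]] = [
    {"user_id": 1, "ad_id": "a", "clicks_7d": 3},
    {"user_id": 1, "ad_id": "b", "clicks_7d": 5},
    {"user_id": 2, "ad_id": "a", "clicks_7d": 1},
]


def lookup(rows, serving_index, join_keys):
    """Return every row whose indexed join keys match the request."""
    missing = [k for k in serving_index if k not in join_keys]
    if missing:
        raise KeyError(f"missing indexed join keys: {missing}")
    return [r for r in rows if all(r[k] == join_keys[k] for k in serving_index)]


# Index on user_id only: one request can return several feature vectors.
matches = lookup(ROWS, ["user_id"], {"user_id": 1})
print([m["ad_id"] for m in matches])
```

With the full index ["user_id", "ad_id"], the same helper returns at most one row per request, which corresponds to the default behaviour described above.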
delete_keys(...)
Deletes any materialized data that matches the specified join keys from the Feature Table.
This method kicks off a job to delete the data in the offline and online stores. If a Feature Table has multiple entities, the full set of join keys must be specified. Only the Dynamo online store is supported. A maximum of 500,000 keys can be deleted per request.
Parameters
- keys (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - The DataFrame of join keys to delete. Must conform to the Feature Table join keys.
- online (bool) - Whether or not to delete from the online store. Default: true
- offline (bool) - Whether or not to delete from the offline store. Default: true
Returns
List[str]: List of job ids for jobs created for entity deletion.
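Because of the 500,000-key limit per request, a larger set of keys has to be split before it is submitted. The following is a minimal sketch of that batching step using only the standard library; chunk_keys and MAX_KEYS_PER_REQUEST are hypothetical names, and the call to delete_keys itself is left out because it needs a live workspace.

```python
from typing import Iterator, List, Sequence

MAX_KEYS_PER_REQUEST = 500_000  # limit stated in the delete_keys description


def chunk_keys(keys: Sequence[dict], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `size` join-key rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


# Tiny demonstration with a batch size of 2 instead of 500,000.
keys = [{"user_id": i} for i in range(5)]
batches = list(chunk_keys(keys, size=2))
print([len(b) for b in batches])
```

Each batch would then be converted to a DataFrame and passed to delete_keys in its own request.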
get_feature_columns()
Retrieves the list of feature columns produced by this Feature Table.
Returns
List[str]: The features produced by this Feature Table.
get_features_for_events(...)
This method is functionally equivalent to get_historical_features(spine) and has been renamed in Tecton 0.8 for clarity. get_historical_features() is planned to be deprecated in a future release.
Returns a TectonDataFrame of historical values for this feature table.
If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.
Note: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.
Examples: A FeatureView fv with join key user_id.
- fv.get_features_for_events(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
- fv.get_features_for_events(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.
Parameters
- spine (Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range.
- timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None
- compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None
Returns
TectonDataFrame: A TectonDataFrame with feature values.
Examples
A FeatureTable ft with join key user_id.
- ft.get_features_for_events(events) where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe.
- ft.get_features_for_events(events, save_as='my_dataset') where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe. Save the DataFrame as a dataset with the name my_dataset.
- ft.get_features_for_events(events, timestamp_key='date_1') where events=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the events dataframe.
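The point-in-time behaviour described for timestamp_key ("fetch the latest features computed before the specified timestamps"), combined with the ttl look-back window from __init__, can be sketched as follows. This is a standard-library illustration, not Tecton's query engine: latest_value_before is a hypothetical helper, and treating "before" as strictly earlier than the event time is an assumption of the sketch.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def latest_value_before(
    history: List[Tuple[datetime, int]],
    event_time: datetime,
    ttl: timedelta,
) -> Optional[int]:
    """Return the newest value with timestamp < event_time and within ttl.

    `history` must be sorted by timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_left(timestamps, event_time)  # rows strictly before event_time
    if idx == 0:
        return None  # nothing was recorded before the event
    ts, value = history[idx - 1]
    if event_time - ts > ttl:
        return None  # latest row is older than the look-back window
    return value


# Hypothetical feature history for one user.
history = [(datetime(2024, 1, 1), 3), (datetime(2024, 1, 10), 7)]
ttl = timedelta(days=30)
print(latest_value_before(history, datetime(2024, 1, 5), ttl))
print(latest_value_before(history, datetime(2024, 3, 1), ttl))
```

The first lookup finds the January 1 row; the second returns None because the newest row is more than 30 days older than the event.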
get_features_in_range(...)
Returns a TectonDataFrame of historical values for this feature table.
If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.
Parameters
- start_time (datetime.datetime) - The interval start time from when we want to retrieve features. If no timezone is specified, will default to using UTC.
- end_time (datetime.datetime) - The interval end time until when we want to retrieve features. If no timezone is specified, will default to using UTC.
- max_lookback (Optional[datetime.timedelta]) - [Non-Aggregate Feature Tables Only] A performance optimization that configures how far back before start_time to look for events in the raw data. If set, get_features_in_range() may not include all entities with valid feature values in the specified time range, but get_features_in_range() will never return invalid values. Default: None
- entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None
- compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None
Returns
TectonDataFrame: A TectonDataFrame with feature values.
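Since start_time and end_time are interpreted as UTC when they carry no timezone, it can be safer to normalize them explicitly before calling the method. A minimal sketch using only the standard library is shown here; to_utc is a hypothetical helper name and is not part of the Tecton SDK.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC; convert aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc(naive).isoformat())  # 2024-01-01T12:00:00+00:00
print(to_utc(aware).isoformat())  # 2024-01-01T17:00:00+00:00
```

Passing timezone-aware values makes the intended interval explicit and avoids depending on the default.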
get_historical_features(...)
Note: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.
The ability to run get_historical_features as part of a unit test was added in SDK 0.7. To utilize this, provide the mocked data sources in the mock_inputs parameter in a test that is run via tecton test or pytest.
Returns a TectonDataFrame of historical values for this feature table.
If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.
Parameters
- spine (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range. Default: None
- timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None
- entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None
- start_time (Optional[datetime.datetime]) - The interval start time from when we want to retrieve features. If no timezone is specified, will default to using UTC. Default: None
- end_time (Optional[datetime.datetime]) - The interval end time until when we want to retrieve features. If no timezone is specified, will default to using UTC. Default: None
- compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None
Returns
TectonDataFrame: A TectonDataFrame with feature values.
Examples
A FeatureTable ft with join key user_id.
- ft.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
- ft.get_historical_features(spine, save_as='my_dataset') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine. Save the DataFrame as a dataset with the name my_dataset.
- ft.get_historical_features(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.
- ft.get_historical_features(start_time=datetime(...), end_time=datetime(...))
  Fetch all historical features from the offline store in the time range specified by start_time and end_time.
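The two call styles (with a spine, or with a time range and optional entities) are mutually exclusive according to the note above. A small pre-flight check along these lines can catch mixed arguments before a query is issued. This is a sketch under stated assumptions: check_arguments is a hypothetical helper, and raising ValueError is a choice made for the example, since this page does not say how the SDK reacts to mixed arguments.

```python
from typing import Any, Optional


def check_arguments(
    spine: Optional[Any] = None,
    timestamp_key: Optional[str] = None,
    start_time: Optional[Any] = None,
    end_time: Optional[Any] = None,
    entities: Optional[Any] = None,
) -> str:
    """Return 'spine' or 'range' depending on which call style is used."""
    range_args = {"start_time": start_time, "end_time": end_time, "entities": entities}
    given = [name for name, value in range_args.items() if value is not None]
    if spine is not None:
        if given:
            # Assumption of this sketch: mixing the two styles is rejected.
            raise ValueError(f"{given} cannot be combined with a spine")
        return "spine"
    if timestamp_key is not None:
        raise ValueError("timestamp_key requires a spine")
    return "range"


print(check_arguments(spine=[{"user_id": 1}], timestamp_key="date"))
print(check_arguments(start_time="2024-01-01", end_time="2024-02-01"))
```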
get_online_features(...)
Returns a single Tecton FeatureVector from the Online Store.
Parameters
- join_keys (Mapping[str, Union[int, numpy.int_, str, bytes]]) - Join keys of the enclosed Feature Table.
- include_join_keys_in_response (bool) - Whether to include join keys as part of the response FeatureVector. Default: false
Returns
FeatureVector: A FeatureVector of the results.
Examples
A FeatureTable ft with join key user_id.
- ft.get_online_features(join_keys={'user_id': 1})
  Fetch the latest features from the online store for user 1.
- ft.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True)
  Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.
get_timestamp_field()
Returns the name of the timestamp field of this Feature Table.
Returns
str
ingest(...)
Ingests a DataFrame into the Feature Table.
This method kicks off a materialization job to write the data into the offline and online stores, depending on the Feature Table configuration.
Parameters
- df (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - The DataFrame to be ingested. It must conform to the Feature Table schema.
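Since the ingested data has to conform to the Feature Table schema, a quick local check of column names before calling ingest can save a failed job. The sketch below only checks that required columns are present, not their types. It assumes the table from the example at the top of this page and assumes user_id as the join key, because the Entity definition is elided there; missing_columns is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def missing_columns(row: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from `row`."""
    return [name for name in required if name not in row]


# Join key (assumed), timestamp field, and the two declared features.
required = ["user_id", "timestamp", "user_login_count_7d", "user_login_count_30d"]

good = {"user_id": "u1", "timestamp": "2024-01-01T00:00:00Z",
        "user_login_count_7d": 3, "user_login_count_30d": 11}
bad = {"user_id": "u2", "user_login_count_7d": 1}

print(missing_columns(good, required))  # []
print(missing_columns(bad, required))   # ['timestamp', 'user_login_count_30d']
```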
summary()
Displays a human-readable summary.
with_join_key_map(...)
Rebind join keys for a Feature View or Feature Table used in a Feature Service.
The keys in join_key_map should be the join keys, and the values should be the feature service overrides.
Parameters
- join_key_map (Dict[str, str]) - Dictionary remapping the join key names. Dictionary keys are join keys, values are the feature service override values.
Returns
FeatureReference
Example
    from tecton import FeatureService

    # The join key for this feature service will be "feature_service_user_id".
    feature_service = FeatureService(
        name="feature_service",
        features=[
            my_feature_view.with_join_key_map({"user_id": "feature_service_user_id"}),
        ],
    )

    # Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
    # "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
    # "sender_features", and "recipient_features".
    transaction_fraud_service = FeatureService(
        name="transaction_fraud_service",
        features=[
            # Select a subset of features from a feature view.
            transaction_features[["amount"]],
            # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
            # transaction sender and recipient, so include the feature view twice and bind it to two different feature
            # service join keys.
            user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
            user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
        ],
    )
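The direction of the mapping (feature view join key to feature service override) is easy to get backwards. The plain-Python sketch below mimics how a request keyed by the service-level names could be translated back to the feature view's own join key. It is only an illustration of the mapping direction, not Tecton's implementation, and to_feature_view_keys is a hypothetical helper.

```python
from typing import Dict


def to_feature_view_keys(request_keys: Dict[str, object], join_key_map: Dict[str, str]) -> Dict[str, object]:
    """Translate feature-service join keys back to the feature view's own keys.

    `join_key_map` maps feature view join key -> feature service override,
    matching the direction described above.
    """
    return {fv_key: request_keys[service_key] for fv_key, service_key in join_key_map.items()}


request = {"transaction_id": "t1", "sender_id": 42, "recipient_id": 7}
print(to_feature_view_keys(request, {"user_id": "sender_id"}))     # {'user_id': 42}
print(to_feature_view_keys(request, {"user_id": "recipient_id"}))  # {'user_id': 7}
```

The same feature view is therefore looked up twice, once per bound key, which matches the sender and recipient usage in the example.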
with_name(...)
Rename a Feature View or Feature Table used in a Feature Service.
Parameters
- namespace (str) - The namespace used to prefix the features joined from this FeatureView. By default, namespace is set to the FeatureView name.
Returns
FeatureReference
Examples
    from tecton import FeatureService

    # The feature view in this feature service will be named "new_named_feature_view" in training data dataframe
    # columns and other metadata.
    feature_service = FeatureService(
        name="feature_service",
        features=[my_feature_view.with_name("new_named_feature_view")],
    )

    # Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
    # "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
    # "sender_features", and "recipient_features".
    transaction_fraud_service = FeatureService(
        name="transaction_fraud_service",
        features=[
            # Select a subset of features from a feature view.
            transaction_features[["amount"]],
            # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
            # transaction sender and recipient, so include the feature view twice and bind it to two different feature
            # service join keys.
            user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
            user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
        ],
    )
validate()
Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.
Returns
None