Version: Beta 🚧

FeatureTable

Summary

A Tecton Feature Table.

Feature Tables are used to batch push features into Tecton from external feature computation systems.

Example

from tecton import Attribute, Entity, FeatureTable
from tecton.types import Int64
import datetime

# Declare your user Entity instance here or import it if defined elsewhere in
# your Tecton repo.
user = ...

# Feature columns managed by this Feature Table.
features = [
    Attribute('user_login_count_7d', Int64),
    Attribute('user_login_count_30d', Int64),
]

user_login_counts = FeatureTable(
    name='user_login_counts',
    entities=[user],
    features=features,
    online=True,
    offline=True,
    ttl=datetime.timedelta(days=30),
    # Name of the timestamp column in the ingested records.
    timestamp_field='timestamp',
)

FeatureTable (Class)

Attributes

Name | Data Type | Description
alert_email | Optional[str] | Email that alerts for this Feature Table will be sent to.
created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects.
defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects.
description | Optional[str] | Returns the description of the Tecton object.
entities | |
feature_metadata | List[FeatureMetadata] |
id | str | Returns the unique id of the Tecton object.
info | |
join_keys | List[str] | The join key column names.
name | str | Returns the name of the Tecton object.
offline | bool | Whether the Feature Table is materialized to the offline feature store.
offline_store | Optional[Union[configs.DeltaConfig, configs.ParquetConfig]] | Configuration for the Offline Store of this Feature Table.
online | bool | Whether the Feature Table is materialized to the online feature store.
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys.
owner | Optional[str] | Returns the owner of the Tecton object.
prevent_destroy | bool | If set to True, Tecton will block destructive actions taken on this Feature View or Feature Table.
tags | Dict[str, str] | Returns the tags of the Tecton object.
tecton_materialization_runtime | Optional[str] | Version of tecton package used by your job cluster.
timestamp_field | Optional[str] | The column name that refers to the timestamp for records that are produced by the Feature Table. This parameter is optional if exactly one column is a Timestamp type.
ttl | Duration | TTL defines feature lifespan and look-back window for training sets.
url | str | Returns a link to the Tecton Web UI.
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; otherwise returns None.
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects.

Methods

Name | Description
__init__(...) | Instantiate a new Feature Table.
delete_keys(...) | Deletes any materialized data that matches the specified join keys from the Feature Table.
get_feature_columns() | Retrieves the list of feature columns produced by this Feature Table.
get_features_for_events(...) | Returns a TectonDataFrame of historical values for this feature table.
get_features_in_range(...) | Returns a TectonDataFrame of historical values for this feature table.
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature table.
get_online_features(...) | Returns a single Tecton FeatureVector from the Online Store.
get_timestamp_field() | Returns the name of the timestamp field of this Feature Table.
ingest(...) | Ingests a DataFrame into the Feature Table.
summary() | Displays a human-readable summary.
validate() | Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.
with_join_key_map(...) | Rebind join keys for a Feature View or Feature Table used in a Feature Service.
with_name(...) | Rename a Feature View or Feature Table used in a Feature Service.

__init__(...)

Instantiate a new Feature Table.

Parameters

  • name (str) - Unique, human-friendly name that identifies the Feature Table.

  • description (Optional[str]) - A human-readable description. Default: None

  • owner (Optional[str]) - Owner name (typically the email of the primary maintainer). Default: None

  • tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). Default: None

  • prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature Table that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreation of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Tables used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: false

  • entities (List[framework_entity.Entity]) - A list of Entity objects, used to organize features.

  • features (List[feature.Attribute]) - A list of features this Feature Table manages.

  • timestamp_field (str) - The column name that refers to the timestamp for records that are produced by the Feature Table.

  • ttl (Optional[datetime.timedelta]) - The TTL (or "look back window") for features defined by this feature table. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs. Default: None

  • online (bool) - Enable writing to online feature store. Default: false

  • offline (bool) - Enable writing to offline feature store. Default: false

  • offline_store (Optional[Union[configs.OfflineStoreConfig, configs.DeltaConfig]]) - Configuration for how data is written to the offline feature store. Default: None

  • online_store (Optional[configs.OnlineStoreTypes]) - Configuration for how data is written to the online feature store. Default: None

  • batch_compute (Optional[configs.ComputeConfigTypes]) - Configuration for batch materialization clusters. Should be one of: [EMRClusterConfig, DatabricksClusterConfig, EMRJsonClusterConfig, DatabricksJsonClusterConfig, DataprocJsonClusterConfig] Default: None

  • online_serving_index (Optional[List[str]]) - (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. Up to one join key may be omitted. If one key is omitted, online requests to a Feature Service will return all feature vectors that match the specified join keys. Default: None

  • alert_email (Optional[str]) - Email that alerts for this Feature Table will be sent to. Default: None

  • tecton_materialization_runtime (Optional[str]) - Version of tecton package used by your job cluster. Default: None

  • cache_config (Optional[configs.CacheConfig]) - Cache config for the Feature Table. Including this option enables the feature server to use the cache when retrieving features for this feature table. Will only be respected if the feature service containing this feature table has enable_online_caching set to True. Default: None

  • options (Optional[Dict[str, str]]) - Additional options to configure the Feature Table. Used for advanced use cases and beta features. Default: None
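The ttl description above can be read as a freshness rule: a feature value is usable for a given event only if it was written no later than the event time and more recently than the event time minus ttl. The sketch below illustrates that rule in plain Python. It is an illustration of the described semantics, not Tecton's implementation; the helper name is_within_ttl is hypothetical, and the exact boundary handling (inclusive at the event time, exclusive at the ttl edge) is an assumption.

```python
from datetime import datetime, timedelta


def is_within_ttl(feature_time: datetime, event_time: datetime, ttl: timedelta) -> bool:
    """Return True if a feature written at feature_time is still usable at event_time.

    Assumption: a value is valid when event_time - ttl < feature_time <= event_time.
    """
    return event_time - ttl < feature_time <= event_time


ttl = timedelta(days=30)
event_time = datetime(2024, 3, 1)

# Written 10 days before the event: inside the 30-day look-back window.
print(is_within_ttl(datetime(2024, 2, 20), event_time, ttl))

# Written 45 days before the event: older than the ttl, so it is not used.
print(is_within_ttl(datetime(2024, 1, 16), event_time, ttl))
```

A shorter ttl narrows this window, which is why the parameter description ties shorter TTLs to lower cost.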

delete_keys(...)

Deletes any materialized data that matches the specified join keys from the Feature Table.

This method kicks off a job to delete the data in the offline and online stores. If a Feature Table has multiple entities, the full set of join keys must be specified. Only the DynamoDB online store is supported. At most 500,000 keys can be deleted per request.

Parameters

  • keys (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - A DataFrame of the join keys to delete. Must conform to the Feature Table join keys.

  • online (bool) - Whether or not to delete from the online store. Default: true

  • offline (bool) - Whether or not to delete from the offline store. Default: true

Returns

List[str]: List of job ids for jobs created for entity deletion.
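Because a single request accepts at most 500,000 keys, a larger deletion needs to be split into several calls. The following is a minimal sketch of the batching step only. batch_keys is a hypothetical helper, not part of the Tecton SDK, and the join key column user_id is assumed for illustration.

```python
from typing import Dict, Iterator, List

MAX_KEYS_PER_REQUEST = 500_000  # limit stated in the delete_keys description


def batch_keys(
    rows: List[Dict[str, object]], limit: int = MAX_KEYS_PER_REQUEST
) -> Iterator[List[Dict[str, object]]]:
    """Yield consecutive slices of rows, each holding at most `limit` join-key rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(rows), limit):
        yield rows[start : start + limit]


# Hypothetical join keys to delete; a small limit keeps the demo readable.
keys_to_delete = [{"user_id": i} for i in range(5)]
batches = list(batch_keys(keys_to_delete, limit=2))
print([len(batch) for batch in batches])  # → [2, 2, 1]

# With a real Feature Table `ft` (not defined here), each batch could be submitted as:
#   job_ids = ft.delete_keys(pandas.DataFrame(batch))
```

Each call returns its own list of job ids, so a caller that batches would need to collect the ids from every request.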

get_feature_columns()

Retrieves the list of feature columns produced by this Feature Table.

Returns

List[str]: The features produced by this Feature Table.

get_features_for_events(...)

Info: This method is functionally equivalent to get_historical_features(spine) and has been renamed in Tecton 0.8 for clarity. get_historical_features() is planned to be deprecated in a future release.

Returns a TectonDataFrame of historical values for this feature table.

If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.

Note: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.

Examples: A FeatureView fv with join key user_id.
  1. fv.get_features_for_events(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}) Fetch features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
  2. fv.get_features_for_events(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}) Fetch features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.

Parameters

  • spine (Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range.

  • timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None

  • compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with feature values.

Examples

A FeatureTable ft with join key user_id.

  1. ft.get_features_for_events(events) where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe.

  2. ft.get_features_for_events(events, save_as='my_dataset') where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe. Save the DataFrame as a dataset with the name my_dataset.

  3. ft.get_features_for_events(events, timestamp_key='date_1') where events=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the events dataframe.
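The timestamp_key description says this method fetches the latest features computed before each timestamp in the events dataframe. The sketch below imitates that point-in-time lookup with plain Python lists so the behavior is easy to trace. It is not how Tecton computes the result; the function latest_value_as_of, the sample records, and the choice to treat a feature written exactly at the event time as visible are all assumptions made for illustration.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (user_id, feature timestamp, user_login_count_7d): hypothetical ingested records.
FeatureRow = Tuple[int, datetime, int]


def latest_value_as_of(
    rows: List[FeatureRow], user_id: int, event_time: datetime
) -> Optional[int]:
    """Return the most recent feature value for user_id written at or before event_time."""
    candidates = [
        (ts, value) for uid, ts, value in rows if uid == user_id and ts <= event_time
    ]
    if not candidates:
        return None  # no feature value existed yet for this event
    return max(candidates, key=lambda pair: pair[0])[1]


feature_rows = [
    (1, datetime(2024, 1, 1), 3),
    (1, datetime(2024, 1, 8), 5),
    (2, datetime(2024, 1, 5), 1),
]

# Events play the role of the spine: one (user_id, date) pair per row.
events = [(1, datetime(2024, 1, 7)), (1, datetime(2024, 1, 9)), (2, datetime(2024, 1, 2))]
for user_id, date in events:
    print(user_id, date.date(), latest_value_as_of(feature_rows, user_id, date))
```

The third event has no value because the only record for user 2 was written after the event time, which is the kind of leakage a point-in-time lookup is meant to prevent.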

get_features_in_range(...)

Returns a TectonDataFrame of historical values for this feature table.

If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.

Parameters

  • start_time (datetime.datetime) - The interval start time from when we want to retrieve features. If no timezone is specified, will default to using UTC.

  • end_time (datetime.datetime) - The interval end time until when we want to retrieve features. If no timezone is specified, will default to using UTC.

  • max_lookback (Optional[datetime.timedelta]) - [Non-Aggregate Feature Tables Only] A performance optimization that configures how far back before start_time to look for events in the raw data. If set, get_features_in_range() may not include all entities with valid feature values in the specified time range, but get_features_in_range() will never return invalid values. Default: None

  • entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None

  • compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with feature values.
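Both start_time and end_time are interpreted as UTC when they carry no timezone. To avoid relying on that default, the bounds can be made timezone-aware before the call. This is a small sketch using only the standard library; as_utc is a hypothetical helper name, and the commented call assumes a Feature Table object ft that is not defined here.

```python
from datetime import datetime, timedelta, timezone


def as_utc(value: datetime) -> datetime:
    """Attach UTC to a naive datetime; convert an aware datetime to UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


naive_start = datetime(2024, 1, 1)
aware_end = datetime(2024, 1, 8, 2, 0, tzinfo=timezone(timedelta(hours=2)))

start_time = as_utc(naive_start)
end_time = as_utc(aware_end)
print(start_time.isoformat())  # → 2024-01-01T00:00:00+00:00
print(end_time.isoformat())    # → 2024-01-08T00:00:00+00:00

# With a real Feature Table `ft` (not defined here):
#   df = ft.get_features_in_range(start_time=start_time, end_time=end_time)
```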

get_historical_features(...)

Info: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.

Info: The ability to run get_historical_features as part of a unit test was added in SDK 0.7. To utilize this, provide the mocked data sources in the mock_inputs parameter in a test that is run via tecton test or pytest.

Returns a TectonDataFrame of historical values for this feature table.

If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.

Parameters

  • spine (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range. Default: None

  • timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None

  • entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None

  • start_time (Optional[datetime.datetime]) - The interval start time from when we want to retrieve features. If no timezone is specified, will default to using UTC. Default: None

  • end_time (Optional[datetime.datetime]) - The interval end time until when we want to retrieve features. If no timezone is specified, will default to using UTC. Default: None

  • compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with feature values.

Examples

A FeatureTable ft with join key user_id.

  1. ft.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.

  2. ft.get_historical_features(spine, save_as='my_dataset') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine. Save the DataFrame as a dataset with the name my_dataset.

  3. ft.get_historical_features(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}) Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.

  4. ft.get_historical_features(start_time=datetime(...), end_time=datetime(...)) Fetch all historical features from the offline store in the time range specified by start_time and end_time.

get_online_features(...)

Returns a single Tecton FeatureVector from the Online Store.

Parameters

  • join_keys (Mapping[str, Union[int, numpy.int_, str, bytes]]) - Join keys of the enclosed Feature Table.

  • include_join_keys_in_response (bool) - Whether to include join keys as part of the response FeatureVector. Default: false

Returns

FeatureVector: A FeatureVector of the results.

Examples

A FeatureTable ft with join key user_id.

  1. ft.get_online_features(join_keys={'user_id': 1}) Fetch the latest features from the online store for user 1.

  2. ft.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True) Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.

get_timestamp_field()

Returns the name of the timestamp field of this Feature Table.

Returns

str

ingest(...)

Ingests a DataFrame into the Feature Table.

This method kicks off a materialization job to write the data into the offline and online stores, depending on the Feature Table configuration.

Parameters

summary()

Displays a human-readable summary.

with_join_key_map(...)

Rebind join keys for a Feature View or Feature Table used in a Feature Service.

The keys in join_key_map should be the join keys, and the values should be the feature service overrides.

Parameters

  • join_key_map (Dict[str, str]) - Dictionary remapping the join key names. Dictionary keys are join keys, values are the feature service override values.

Returns

FeatureReference

Example

from tecton import FeatureService

# The join key for this feature service will be "feature_service_user_id".
feature_service = FeatureService(
    name="feature_service",
    features=[
        my_feature_view.with_join_key_map({"user_id": "feature_service_user_id"}),
    ],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],
        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)

with_name(...)

Rename a Feature View or Feature Table used in a Feature Service.

Parameters

  • namespace (str) - The namespace used to prefix the features joined from this FeatureView. By default, namespace is set to the FeatureView name.

Returns

FeatureReference

Examples

from tecton import FeatureService

# The feature view in this feature service will be named "new_named_feature_view" in training data dataframe
# columns and other metadata.
feature_service = FeatureService(
    name="feature_service",
    features=[
        my_feature_view.with_name("new_named_feature_view"),
    ],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],
        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)

validate()

Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.

Returns

None
