Version: 1.1

FeatureTable

Summary

A Tecton Feature Table.

Feature Tables are used to batch push features into Tecton from external feature computation systems.

Example

from tecton import Entity, FeatureTable, Attribute
from tecton.types import Int64
import datetime

# Declare your user Entity instance here or import it if defined elsewhere in
# your Tecton repo.

user = ...

features = [
    Attribute('user_login_count_7d', Int64),
    Attribute('user_login_count_30d', Int64)
]

user_login_counts = FeatureTable(
    name='user_login_counts',
    entities=[user],
    features=features,
    online=True,
    offline=True,
    ttl=datetime.timedelta(days=30),
    timestamp_field='timestamp'
)

FeatureTable (Class)

Attributes

Name	Data Type	Description
`alert_email`	`Optional[str]`	Email that alerts for this Feature Table will be sent to.
`created_at`	`Optional[datetime.datetime]`	Returns the time that this Tecton object was created or last updated. `None` for locally defined objects.
`defined_in`	`Optional[str]`	The repo filename where this object was declared. `None` for locally defined objects.
`description`	`Optional[str]`	Returns the description of the Tecton object.
`entities`
`feature_metadata`	`List[FeatureMetadata]`
`id`	`str`	Returns the unique id of the Tecton object.
`info`
`join_keys`	`List[str]`	The join key column names.
`name`	`str`	Returns the name of the Tecton object.
`offline`	`bool`	Whether the Feature Table is materialized to the offline feature store.
`offline_store`	`Optional[Union[configs.DeltaConfig, configs.ParquetConfig]]`	Configuration for the Offline Store of this Feature Table.
`online`	`bool`	Whether the Feature Table is materialized to the online feature store.
`online_serving_index`	`List[str]`	The set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys.
`owner`	`Optional[str]`	Returns the owner of the Tecton object.
`prevent_destroy`	`bool`	If set to True, Tecton will block destructive actions taken on this Feature View or Feature Table.
`tags`	`Dict[str, str]`	Returns the tags of the Tecton object.
`tecton_materialization_runtime`	`Optional[str]`	Version of `tecton` package used by your job cluster.
`timestamp_field`	`Optional[str]`	The column name that refers to the timestamp for records that are produced by the Feature Table. This parameter is optional if exactly one column is a Timestamp type.
`ttl`	`Duration`	TTL defines feature lifespan and look-back window for training sets.
`url`	`str`	Returns a link to the Tecton Web UI.
`wildcard_join_key`	`Optional[set]`	Returns a wildcard join key column name if it exists; otherwise returns None.
`workspace`	`Optional[str]`	Returns the workspace that this Tecton object belongs to. `None` for locally defined objects.

Methods

Name	Description
`__init__(...)`	Instantiate a new Feature Table.
`delete_keys(...)`	Deletes any materialized data that matches the specified join keys from the Feature Table.
`get_feature_columns()`	Retrieves the list of feature columns produced by this Feature Table.
`get_features_for_events(...)`	Returns a `TectonDataFrame` of historical feature values for the given events.
`get_features_in_range(...)`	Returns a `TectonDataFrame` of historical feature values within a time range.
`get_historical_features(...)`	[Deprecated in SDK 0.9] Returns a `TectonDataFrame` of historical values for this feature table.
`get_online_features(...)`	Returns a single Tecton FeatureVector from the Online Store.
`get_timestamp_field()`	Returns the name of the timestamp field of this Feature Table.
`ingest(...)`	Ingests a DataFrame into the Feature Table.
`summary()`	Displays a human-readable summary.
`validate()`	[Deprecated in SDK 1.0] Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.
`with_join_key_map(...)`	Rebind join keys for a Feature View or Feature Table used in a Feature Service.
`with_name(...)`	Rename a Feature View or Feature Table used in a Feature Service.

__init__(...)

Instantiate a new Feature Table.

Parameters

name (str) - Unique, human-friendly name that identifies the Feature Table.

description (Optional[str]) - A human-readable description. Default: None

owner (Optional[str]) - Owner name (typically the email of the primary maintainer). Default: None

tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). Default: None

prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False, either in the same tecton apply or in a separate one. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature Table that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a re-creation of the tagged object; e.g., if prevent_destroy is set on a Feature Service, it will also prevent deletions or re-creations of Feature Tables used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: False

entities (List[framework_entity.Entity]) - A list of Entity objects, used to organize features.

features (List[feature.Attribute]) - A list of features this Feature Table manages.

timestamp_field (str) - The column name that refers to the timestamp for records that are produced by the Feature Table.

ttl (Optional[datetime.timedelta]) - The TTL (or "look back window") for features defined by this feature table. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs. Default: None

online (bool) - Enable writing to the online feature store. Default: False

offline (bool) - Enable writing to the offline feature store. Default: False

offline_store (Optional[Union[configs.OfflineStoreConfig, configs.DeltaConfig]]) - Configuration for how data is written to the offline feature store. Default: None

online_store (Optional[configs.OnlineStoreTypes]) - Configuration for how data is written to the online feature store. Default: None

batch_compute (Optional[configs.ComputeConfigTypes]) - Configuration for batch materialization clusters. Should be one of: [EMRClusterConfig, DatabricksClusterConfig, EMRJsonClusterConfig, DatabricksJsonClusterConfig, DataprocJsonClusterConfig]. Default: None

online_serving_index (Optional[List[str]]) - (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. Up to one join key may be omitted. If one key is omitted, online requests to a Feature Service will return all feature vectors that match the specified join keys. Default: None

alert_email (Optional[str]) - Email that alerts for this Feature Table will be sent to. Default: None

tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by your job cluster. Default: None

cache_config (Optional[configs.CacheConfig]) - Cache configuration for the Feature Table. Including this option enables the feature server to use the cache when retrieving features for this Feature Table. It is only respected if the Feature Service containing this Feature Table has enable_online_caching set to True. Default: None

options (Optional[Dict[str, str]]) - Additional options to configure the Feature Table. Used for advanced use cases and beta features. Default: None
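The ttl parameter above bounds how long a feature value stays usable, both online and when looking back from a training example's timestamp. The stdlib-only sketch below illustrates that look-back rule as we read it from the description: a row written at feature_ts can serve an event at event_ts only if it is not newer than the event and is less than ttl old. The helper is_within_ttl is hypothetical and not part of the Tecton SDK, and the exact boundary handling (inclusive vs. exclusive) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_within_ttl(feature_ts: datetime, event_ts: datetime, ttl: timedelta) -> bool:
    """Hypothetical helper: True if a feature row written at feature_ts
    may be used for an event at event_ts under the given TTL.

    Assumption: the row must be at or before the event and strictly
    less than `ttl` old at event time.
    """
    age = event_ts - feature_ts
    return timedelta(0) <= age < ttl


ttl = timedelta(days=30)  # same TTL as the user_login_counts example
event = datetime(2024, 3, 31, tzinfo=timezone.utc)

print(is_within_ttl(datetime(2024, 3, 20, tzinfo=timezone.utc), event, ttl))  # True: 11 days old
print(is_within_ttl(datetime(2024, 2, 1, tzinfo=timezone.utc), event, ttl))   # False: 59 days old
print(is_within_ttl(datetime(2024, 4, 1, tzinfo=timezone.utc), event, ttl))   # False: after the event
```

A shorter ttl shrinks this window, which is why it reduces both online storage and the amount of history scanned for training sets.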

delete_keys(...)

Deletes any materialized data that matches the specified join keys from the Feature Table.

This method kicks off a job to delete the data from the offline and online stores. If a Feature Table has multiple entities, the full set of join keys must be specified. Only the DynamoDB online store is supported. At most 500,000 keys can be deleted per request.

Parameters

keys (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - A DataFrame of the keys to delete. It must conform to the Feature Table join keys.

online (bool) - Whether or not to delete from the online store. Default: True

offline (bool) - Whether or not to delete from the offline store. Default: True

Returns

List[str]: List of job ids for jobs created for entity deletion.
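Because delete_keys() needs the full set of join keys and caps each request at 500,000 keys, a large deletion has to be split into several calls. The sketch below, which assumes pandas is installed, checks that a keys DataFrame contains exactly the Feature Table's join key columns and splits it into request-sized chunks. chunk_delete_keys is a hypothetical helper; the commented-out loop shows where ft.delete_keys() would be called, using the join_keys attribute and the keys/online/offline parameters documented above.

```python
import pandas as pd

MAX_KEYS_PER_REQUEST = 500_000  # documented per-request limit for delete_keys()


def chunk_delete_keys(keys: pd.DataFrame, join_keys: list, max_keys: int = MAX_KEYS_PER_REQUEST) -> list:
    """Hypothetical helper: validate a keys DataFrame and split it into
    chunks of at most max_keys rows."""
    # delete_keys() needs the full set of join keys and nothing else.
    if set(keys.columns) != set(join_keys):
        raise ValueError(f"expected columns {sorted(join_keys)}, got {sorted(keys.columns)}")
    # Duplicate keys would only use up request capacity.
    keys = keys.drop_duplicates().reset_index(drop=True)
    return [keys.iloc[start:start + max_keys] for start in range(0, len(keys), max_keys)]


keys = pd.DataFrame({"user_id": ["u1", "u2", "u2", "u3"]})
chunks = chunk_delete_keys(keys, join_keys=["user_id"], max_keys=2)
print([len(c) for c in chunks])  # [2, 1]

# With a real Feature Table `ft` (not defined here):
# job_ids = []
# for chunk in chunk_delete_keys(keys, join_keys=ft.join_keys):
#     job_ids += ft.delete_keys(chunk, online=True, offline=True)
```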

get_feature_columns(...)

Retrieves the list of feature columns produced by this Feature Table.

Returns

List[str]: The features produced by this Feature Table.

get_features_for_events(...)

Note: This method is functionally equivalent to get_historical_features(spine) and was renamed in Tecton 0.8 for clarity. get_historical_features() is planned to be deprecated in a future release.

Returns a TectonDataFrame of historical values for this feature table.

If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.

Note: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.

Examples: A Feature Table ft with join key user_id.

ft.get_features_for_events(spine), where spine = pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}): fetch features from the offline store for users 1, 2, and 3 at the timestamps in the spine.

ft.get_features_for_events(spine, timestamp_key='date_1'), where spine = pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}): fetch features from the offline store for users 1, 2, and 3 at the timestamps in the 'date_1' column of the spine.

Parameters

spine (Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range.

timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None

compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with features values.

Examples

A FeatureTable ft with join key user_id.

ft.get_features_for_events(events), where events = pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the events DataFrame.
ft.get_features_for_events(events, save_as='my_dataset'), where events = pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the events DataFrame, and save the result as a dataset named my_dataset.
ft.get_features_for_events(events, timestamp_key='date_1'), where events = pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the 'date_1' column of the events DataFrame.
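To make "fetch the latest features computed before the specified timestamps" concrete, the sketch below reproduces a point-in-time lookup locally with pandas.merge_asof. It only illustrates the semantics and is not how Tecton computes get_features_for_events(): it assumes a small in-memory table of feature rows, treats a row stamped exactly at the event time as eligible, uses the Feature Table's ttl as the tolerance, and leaves feature columns unprefixed for brevity. All column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical feature rows, as they might have been ingested into the Feature Table.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    "user_login_count_7d": [3, 5, 8],
})

# Events (the spine): which user, and at what time, we want features for.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "date": pd.to_datetime(["2024-01-05", "2024-01-12", "2024-03-01", "2024-01-12"]),
})

# merge_asof requires both frames to be sorted on their time columns.
joined = pd.merge_asof(
    events.sort_values("date"),
    features.sort_values("timestamp"),
    left_on="date",
    right_on="timestamp",
    by="user_id",                     # match rows on the join key
    direction="backward",             # latest feature row at or before each event
    tolerance=pd.Timedelta(days=30),  # ignore rows older than the ttl
)
print(joined.sort_values(["user_id", "date"])[["user_id", "date", "user_login_count_7d"]])
```

User 2's only feature row is 56 days older than the event, so it falls outside the 30-day tolerance and comes back empty, which is how we assume a TTL-expired value would behave; user 3 has no feature rows at all.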

get_features_in_range(...)

Returns a TectonDataFrame of historical values for this feature table.

If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.

Parameters

start_time (datetime.datetime) - The start of the time interval for which to retrieve features. If no timezone is specified, UTC is assumed.

end_time (datetime.datetime) - The end of the time interval for which to retrieve features. If no timezone is specified, UTC is assumed.

max_lookback (Optional[datetime.timedelta]) - [Non-Aggregate Feature Tables Only] A performance optimization that configures how far back before start_time to look for events in the raw data. If set, get_features_in_range() may not include all entities with valid feature values in the specified time range, but get_features_in_range() will never return invalid values. Default: None

entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None

compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with features values.
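start_time and end_time are interpreted as UTC when they carry no timezone. The stdlib-only sketch below mirrors that documented rule so you can check which absolute interval a call will cover. as_utc is a hypothetical helper, and converting timezone-aware values to UTC is our assumption of equivalent behavior, not Tecton's code. The final comment shows the documented call with an entities filter that contains only the join key column.

```python
from datetime import datetime, timedelta, timezone


def as_utc(dt: datetime) -> datetime:
    """Hypothetical helper mirroring the documented default:
    naive datetimes are treated as UTC, aware ones are converted to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


start_time = datetime(2024, 1, 1)  # naive: read as 2024-01-01 00:00 UTC
end_time = datetime(2024, 1, 31, 19, 0, tzinfo=timezone(timedelta(hours=-5)))  # 19:00 at UTC-5

print(as_utc(start_time).isoformat())  # 2024-01-01T00:00:00+00:00
print(as_utc(end_time).isoformat())    # 2024-02-01T00:00:00+00:00

# With a real Feature Table `ft` and pandas imported as pd (not defined here):
# df = ft.get_features_in_range(
#     start_time=start_time,
#     end_time=end_time,
#     entities=pd.DataFrame({"user_id": [1, 2, 3]}),  # join key columns only
# )
```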

get_historical_features(...)

Deprecation Warning

Deprecated in SDK 0.9. get_historical_features() is replaced by get_features_for_events() and get_features_in_range(). See Offline Retrieval Methods for details.

Parameters

spine (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range. Default: None

timestamp_key (Optional[str]) - Name of the time column in spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. Default: None

entities (Optional[Union[pyspark_dataframe.DataFrame, pandas.DataFrame, TectonDataFrame]]) - A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. Default: None

start_time (Optional[datetime.datetime]) - The start of the time interval for which to retrieve features. If no timezone is specified, UTC is assumed. Default: None

end_time (Optional[datetime.datetime]) - The end of the time interval for which to retrieve features. If no timezone is specified, UTC is assumed. Default: None

compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

TectonDataFrame: A TectonDataFrame with features values.

Examples

A FeatureTable ft with join key user_id.

ft.get_historical_features(spine), where spine = pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the spine.
ft.get_historical_features(spine, save_as='my_dataset'), where spine = pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the spine, and save the result as a dataset named my_dataset.
ft.get_historical_features(spine, timestamp_key='date_1'), where spine = pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]}): fetch historical features from the offline store for users 1, 2, and 3 at the timestamps in the 'date_1' column of the spine.
ft.get_historical_features(start_time=datetime(...), end_time=datetime(...)): fetch all historical features from the offline store in the time range specified by start_time and end_time.
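Since get_historical_features() is deprecated, existing calls can be migrated by splitting on whether a spine is passed, following the parameter notes on this page: spine and timestamp_key go to get_features_for_events(), while start_time, end_time, and entities go to get_features_in_range(), whose parameter list makes start_time and end_time required. The sketch below encodes that mapping as a hypothetical migration shim, fetch_historical; it is not part of the SDK and assumes the methods accept the keyword arguments documented here. The spine is passed positionally because this page names that argument both spine and events.

```python
def fetch_historical(ft, spine=None, timestamp_key=None, entities=None,
                     start_time=None, end_time=None, compute_mode=None):
    """Hypothetical shim mapping a get_historical_features() call onto
    the replacement methods, based on the parameter notes on this page."""
    if spine is not None:
        # Time-range filters only apply when no spine is passed in.
        if entities is not None or start_time is not None or end_time is not None:
            raise ValueError("entities/start_time/end_time cannot be combined with a spine")
        return ft.get_features_for_events(spine, timestamp_key=timestamp_key,
                                          compute_mode=compute_mode)
    # get_features_in_range() documents start_time and end_time as required.
    if start_time is None or end_time is None:
        raise ValueError("start_time and end_time are required when no spine is given")
    return ft.get_features_in_range(start_time=start_time, end_time=end_time,
                                    entities=entities, compute_mode=compute_mode)
```

For example, fetch_historical(ft, spine, timestamp_key='date_1') would replace ft.get_historical_features(spine, timestamp_key='date_1').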

get_online_features(...)

Returns a single Tecton FeatureVector from the Online Store.

Parameters

join_keys (Mapping[str, Union[int, numpy.int_, str, bytes]]) - Join keys of the enclosed Feature Table.

include_join_keys_in_response (bool) - Whether to include join keys as part of the response FeatureVector. Default: False

Returns

FeatureVector: A FeatureVector of the results.

Examples

A FeatureTable ft with join key user_id.

ft.get_online_features(join_keys={'user_id': 1}) Fetch the latest features from the online store for user 1.
ft.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True) Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.
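get_online_features() expects join_keys to map each join key name to an int, str, or bytes value (numpy.int_ is also accepted). The stdlib-only sketch below checks a request against the table's join keys before calling the method. check_join_keys is a hypothetical helper; numpy integers are left out to keep it dependency-free, and requiring every key in the serving index (all join keys unless online_serving_index omits one) is our reading of the online_serving_index description.

```python
def check_join_keys(join_keys: dict, table_join_keys: list, serving_index=None) -> dict:
    """Hypothetical pre-flight check for get_online_features(join_keys=...).

    Every requested key must be one of the table's join keys, and every key in
    the online serving index (by default, all join keys) must be present.
    """
    required = set(serving_index if serving_index is not None else table_join_keys)
    unknown = set(join_keys) - set(table_join_keys)
    missing = required - set(join_keys)
    if unknown or missing:
        raise ValueError(f"unknown keys: {sorted(unknown)}, missing keys: {sorted(missing)}")
    for name, value in join_keys.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str, bytes)):
            raise TypeError(f"join key {name!r} has unsupported type {type(value).__name__}")
    return join_keys


request = check_join_keys({"user_id": 1}, table_join_keys=["user_id"])

# With a real Feature Table `ft` (not defined here):
# vector = ft.get_online_features(join_keys=request, include_join_keys_in_response=True)
```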

get_timestamp_field()

Returns the name of the timestamp field of this Feature Table.

Returns

str

ingest(...)

Ingests a DataFrame into the Feature Table.

This method kicks off a materialization job to write the data into the offline and/or online store, depending on the Feature Table configuration.

Parameters

df (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - The DataFrame to ingest. It must conform to the Feature Table schema.
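For the user_login_counts table from the example at the top of this page, conforming to the schema is assumed here to mean one column per join key, the timestamp column named by timestamp_field, and one column per Attribute. The sketch below, which assumes pandas and a join key named user_id (the user Entity is elided in that example), builds such a DataFrame and runs a hypothetical column check before the commented-out ingest() call.

```python
import pandas as pd

# Assumed schema for user_login_counts: join key, timestamp field, features.
JOIN_KEYS = ["user_id"]  # assumption: the user Entity's join key
TIMESTAMP_FIELD = "timestamp"
FEATURES = ["user_login_count_7d", "user_login_count_30d"]

df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:00"], utc=True),
    "user_login_count_7d": [3, 0],
    "user_login_count_30d": [12, 4],
}).astype({"user_login_count_7d": "int64", "user_login_count_30d": "int64"})  # match Int64


def missing_columns(df: pd.DataFrame) -> list:
    """Hypothetical check: columns the assumed schema needs but df lacks."""
    expected = JOIN_KEYS + [TIMESTAMP_FIELD] + FEATURES
    return [col for col in expected if col not in df.columns]


print(missing_columns(df))  # []

# With the user_login_counts Feature Table applied to a workspace:
# user_login_counts.ingest(df)
```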

summary()

Displays a human-readable summary.

with_join_key_map()

Rebind join keys for a Feature View or Feature Table used in a Feature Service.

The keys in join_key_map are the Feature View's or Feature Table's own join keys, and the values are the join key names the Feature Service uses instead.

Parameters

join_key_map (Dict[str, str]) - Dictionary remapping join key names. Keys are the original join keys; values are the Feature Service override names.

Returns

FeatureReference

Example

from tecton import FeatureService

# The join key for this feature service will be "feature_service_user_id".
feature_service = FeatureService(
    name="feature_service",
    features=[
        my_feature_view.with_join_key_map({"user_id" : "feature_service_user_id"}),
    ],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],

        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id" : "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id" : "recipient_id"}),
    ],
)
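In the examples above, each join_key_map entry maps the view's own join key to the name the Feature Service exposes, so a request to transaction_fraud_service supplies sender_id and recipient_id rather than user_id. The stdlib-only sketch below shows that translation from a service-level request back to each view's join keys; to_view_join_keys is a hypothetical helper that illustrates the mapping, not how Tecton resolves requests internally.

```python
def to_view_join_keys(service_request: dict, join_key_map: dict) -> dict:
    """Hypothetical helper: recover a feature view's join keys from a
    feature service request, given the view's join_key_map
    ({view_join_key: service_join_key})."""
    return {view_key: service_request[service_key]
            for view_key, service_key in join_key_map.items()}


request = {"transaction_id": "t42", "sender_id": "u1", "recipient_id": "u2"}

sender_keys = to_view_join_keys(request, {"user_id": "sender_id"})
recipient_keys = to_view_join_keys(request, {"user_id": "recipient_id"})
print(sender_keys)     # {'user_id': 'u1'}
print(recipient_keys)  # {'user_id': 'u2'}
```

The same user_features view is thus looked up twice, once per bound key, which is what lets one view serve both sides of a transaction.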

with_name()

Rename a Feature View or Feature Table used in a Feature Service.

Parameters

namespace (str) - The namespace used to prefix the features joined from this Feature View or Feature Table. By default, the namespace is the Feature View or Feature Table name.

Returns

FeatureReference

Examples

from tecton import FeatureService

# The feature view in this feature service will be named "new_named_feature_view" in training data dataframe
# columns and other metadata.

feature_service = FeatureService(
    name="feature_service",
    features=[
        my_feature_view.with_name("new_named_feature_view")
    ],
)

# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],

        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so include the feature view twice and bind it to two different feature
        # service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id" : "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id" : "recipient_id"}),
    ],
)
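The namespace set by with_name() is the prefix on each joined feature's column name. Assuming the feature_view_name.feature_name labeling described under get_features_for_events() (the separator in actual output may differ), the stdlib-only sketch below shows why the two renamed copies of user_features in the example no longer collide. output_columns is a hypothetical helper, and the feature names are invented for illustration.

```python
def output_columns(namespace: str, feature_names: list) -> list:
    """Hypothetical helper: prefix each feature name with its namespace."""
    return [f"{namespace}.{name}" for name in feature_names]


user_feature_names = ["user_age", "user_login_count_7d"]  # invented for illustration

sender_cols = output_columns("sender_features", user_feature_names)
recipient_cols = output_columns("recipient_features", user_feature_names)
print(sender_cols)  # ['sender_features.user_age', 'sender_features.user_login_count_7d']

# Without with_name(), both copies would share the namespace "user_features",
# so their columns would have identical names.
print(set(sender_cols).isdisjoint(recipient_cols))  # True
```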

validate()

Deprecation Warning

Deprecated in SDK 1.0 and will be removed in a future version. As of Tecton 1.0, objects are validated upon creation, so calling validate() is unnecessary.

Returns

None
