FeatureTable
tecton.FeatureTable
A Tecton Feature Table.
Feature Tables are used to batch push features into Tecton from external feature computation systems.
Attributes
Name | Data Type | Description |
---|---|---|
description | str | Returns the description of the Tecton object. |
entities | List[specs.EntitySpec] | The Entities for this Feature View. |
id | str | Returns the unique id of the Tecton object. |
info | | A dataclass containing basic info about this Tecton object. |
join_keys | List[str] | The join key column names. |
name | str | Returns the name of the Tecton object. |
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving. |
owner | Optional[str] | Returns the owner of the Tecton object. |
tags | Dict[str, str] | Returns the tags of the Tecton object. |
url | str | Returns a link to the Tecton Web UI. |
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; otherwise returns None. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
Methods
Name | Description |
---|---|
__init__(...) | Instantiate a new FeatureTable. |
cancel_materialization_job(...) | Cancels the scheduled or running batch materialization job for this Feature View specified by the job identifier. |
delete_keys(...) | Deletes any materialized data that matches the specified join keys from the FeatureTable. |
deletion_status(...) | Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures. |
get_feature_columns() | The features produced by this FeatureView. |
get_features_for_events(...) | Returns a TectonDataFrame of historical values for this feature view. |
get_features_in_range(...) | Returns a TectonDataFrame with historical feature values for this Feature View within the input time range. |
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature table. |
get_materialization_job(...) | Retrieves data about the specified materialization job for this Feature View. |
get_online_features(...) | Returns a single Tecton FeatureVector from the Online Store. |
get_timestamp_field() | Returns the name of the timestamp field of this Feature Table. |
ingest() | Ingests a DataFrame into the FeatureTable. |
list_materialization_jobs() | Retrieves the list of all materialization jobs for this Feature View. |
materialization_status(...) | Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures. |
summary() | Displays a human readable summary. |
validate() | Validate this Tecton object and its dependencies (if any). |
with_join_key_map(...) | Rebind join keys for a Feature View used in a Feature Service. |
with_name(...) | Rename a Feature View used in a Feature Service. |
__init__(...)
Instantiate a new FeatureTable.
Parameters
- name (str) – Unique, human-friendly name that identifies the FeatureTable.
- description (Optional[str]) – A human-readable description. (Default: None)
- owner (Optional[str]) – Owner name (typically the email of the primary maintainer). (Default: None)
- tags (Optional[Dict[str, str]]) – Tags associated with this Tecton object (key-value pairs of arbitrary metadata). (Default: None)
- prevent_destroy (bool) – If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a re-create of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)
- entities (List[Entity]) – A list of Entity objects, used to organize features.
- schema (List[Field]) – A schema for the FeatureTable. Supported types are: Int64, Float64, String, Bool, and Array with Int64, Float32, Float64, and String typed elements. Additionally, you must have exactly one Timestamp typed column for the feature timestamp.
- ttl (timedelta) – The TTL (or "look back window") for features defined by this Feature Table. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs.
- online (bool) – Enable writing to the online feature store. (Default: False)
- offline (bool) – Enable writing to the offline feature store. (Default: False)
- offline_store (Union[OfflineStoreConfig, DeltaConfig, None]) – Configuration for Tecton's Offline Store. Note that Feature Tables only support the Delta format and do not support publish_full_features=True. (Default: OfflineStoreConfig(staging_table_format=DeltaConfig(datetime.timedelta(days=1), subdirectory_override=None)))
- online_store (Union[DynamoConfig, RedisConfig, None]) – Configuration for how data is written to the online feature store. (Default: None)
- batch_compute (Union[DatabricksClusterConfig, EMRClusterConfig, DatabricksJsonClusterConfig, EMRJsonClusterConfig, None]) – Configuration for batch materialization clusters. Should be one of: [EMRClusterConfig, DatabricksClusterConfig, EMRJsonClusterConfig, DatabricksJsonClusterConfig]. (Default: None)
- online_serving_index (Optional[List[str]]) – (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. Up to one join key may be omitted. If one key is omitted, online requests to a Feature Service will return all feature vectors that match the specified join keys. (Default: None)
- alert_email (Optional[str]) – Email that alerts for this FeatureTable will be sent to. (Default: None)
- tecton_materialization_runtime (Optional[str]) – Version of the Tecton Materialization Runtime used for materialization jobs. Required on 0.8+ when materialization is enabled (with online=True or offline=True). Recommended to set to the SDK version the Feature Table was applied with (e.g. "0.8.0"). (Default: None)
Example
from tecton import Entity, FeatureTable
from tecton.types import Field, String, Timestamp, Int64
import datetime
# Declare your user Entity instance here or import it if defined elsewhere in
# your Tecton repo.
user = ...
schema = [
Field("user_id", String),
Field("timestamp", Timestamp),
Field("user_login_count_7d", Int64),
Field("user_login_count_30d", Int64),
]
user_login_counts = FeatureTable(
name="user_login_counts",
entities=[user],
schema=schema,
online=True,
offline=True,
ttl=datetime.timedelta(days=30),
alert_email="xxx@yyy.com",
)
cancel_materialization_job(...)
Cancels the scheduled or running batch materialization job for this Feature View specified by the job identifier. Once cancelled, a job will not be retried further.
The job run state will be set to MANUAL_CANCELLATION_REQUESTED. Note that cancellation is asynchronous, so it may take some time for the cancellation to complete. If the job run is already in MANUAL_CANCELLATION_REQUESTED or in a terminal state, then it'll return the job.
Parameters
- job_id (str) – ID string of the materialization job.
Returns
MaterializationJobData object for the cancelled job.
delete_keys(...)
Deletes any materialized data that matches the specified join keys from the FeatureTable.
This method kicks off a job to delete the data in the offline and online stores. If a FeatureTable has multiple entities, the full set of join keys must be specified. A maximum of 500,000 keys can be deleted per request.
Parameters
- keys (Union[pyspark.sql.DataFrame, pandas.DataFrame]) – The DataFrame of keys to be deleted. Must conform to the FeatureTable join keys.
- online (bool) – Whether or not to delete from the online store. (Default: True)
- offline (bool) – Whether or not to delete from the offline store. (Default: True)
Returns
List of ID strings of the created Entity Deletion jobs.
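Concretely, a deletion request can be driven by a small dataframe containing exactly the join key columns. The following pandas sketch assumes a FeatureTable handle ft with a single user_id join key, as in the example earlier; the Tecton calls are shown as comments:

```python
import pandas as pd

# Build a DataFrame that conforms to the FeatureTable join keys.
# For a table with the single join key "user_id", one column suffices.
keys = pd.DataFrame({"user_id": ["user_1", "user_2", "user_3"]})

# Sanity-check the frame before submitting the deletion request:
# exactly the join key columns, and well under the 500,000-key limit.
assert list(keys.columns) == ["user_id"]
assert len(keys) <= 500_000

# Kick off deletion from both stores (hypothetical handle `ft`):
# job_ids = ft.delete_keys(keys, online=True, offline=True)
# for job_id in job_ids:
#     print(ft.get_materialization_job(job_id))
```

The returned job IDs can then be passed to get_materialization_job to track the entity deletion jobs, as described under deletion_status below.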
deletion_status(...)
deletion_status is deprecated starting in Tecton 0.8. Instead, the call to delete_keys will return a list of job IDs that can be passed into get_materialization_job to see the status of your entity deletion jobs.
Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures.
Parameters
- verbose – If set to true, the method will display additional low-level deletion information, useful for debugging. (Default: False)
- limit – Maximum number of jobs to return. (Default: 1000)
- sort_columns – A comma-separated list of column names by which to sort the rows. (Default: None)
- errors_only – If set to true, the method will only return jobs that failed with an error. (Default: False)
get_feature_columns()
The features produced by this FeatureView.
get_features_for_events(...)
Returns a TectonDataFrame of historical values for this feature table.
By default (i.e. from_source=None), this method fetches feature values from the Offline Store for Feature Views that have offline materialization enabled and otherwise computes feature values on the fly from raw data.
If no arguments are passed in, all feature values for this feature view will be returned in a Tecton DataFrame.
This method is functionally equivalent to get_historical_features(spine) and has been renamed in Tecton 0.8 for clarity. get_historical_features() is planned to be deprecated in a future release.
Parameters
- events (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – A dataframe of all possible join keys and timestamps that specify which feature values to fetch. To distinguish between columns in the events dataframe and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame.
- timestamp_key (str) – Name of the time column in the events dataframe. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, defaults to the time column of the events dataframe if there is only one present. If more than one time column is present in the events dataframe, you must specify which column you'd like to use. (Default: None)
- from_source (bool) – Whether feature values should be recomputed from the original data source. If None, feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to error if any Feature Views are not materialized. (Default: None)
- save (bool) – Whether to persist the DataFrame as a Dataset object. (Default: False)
- save_as (str) – Name to save the DataFrame as. If unspecified and save=True, a name will be generated. (Default: None)
- mock_inputs (Optional[Dict[str, Union[pandas.DataFrame, pyspark_dataframe.DataFrame]]]) – Dictionary of mock inputs that should be used instead of fetching directly from raw data sources. The keys should match the feature view's function parameters. For feature views with multiple sources, mocking some data sources and using raw data for others is supported. Using mock_inputs is incompatible with from_source=False and save/save_as.
- compute_mode (Union[str, tecton.ComputeMode, None]) – Compute mode to use to produce the data frame. Valid string values are "spark", "snowflake", "athena", and "rift".
Returns
A TectonDataFrame with historical feature values.
Examples
A FeatureTable ft with join key user_id.
- ft.get_features_for_events(events) where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe.
- ft.get_features_for_events(events, save_as='my_dataset') where events=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the events dataframe. Save the DataFrame as a dataset with the name my_dataset.
- ft.get_features_for_events(events, timestamp_key='date_1') where events=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the events dataframe.
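Concretely, the events dataframe in the calls above can be assembled with pandas as follows. This is a sketch; the handle ft and the concrete timestamps are assumptions, and the Tecton call is shown as a comment:

```python
from datetime import datetime

import pandas as pd

# One row per (join key, timestamp) pair for which features are wanted.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3"],
        "date": [
            datetime(2023, 5, 1),
            datetime(2023, 5, 2),
            datetime(2023, 5, 3),
        ],
    }
)

# The timestamp column must be datetime-typed; with only one time column
# present, timestamp_key can be omitted.
assert pd.api.types.is_datetime64_any_dtype(events["date"])

# Fetch point-in-time correct features (hypothetical handle `ft`):
# training_df = ft.get_features_for_events(events).to_pandas()
```

Feature columns in the result are namespaced as feature_view_name.feature_name, e.g. user_login_counts.user_login_count_7d for the FeatureTable defined earlier.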
get_features_in_range(...)
Returns a TectonDataFrame of historical values for this Feature View which were valid within the input time range. A feature value is considered to be valid at a specific point in time if the Online Store would have returned that value if queried at that moment in time.
The DataFrame returned by this method contains the following:
- Entity join key columns
- Feature value columns
- The columns _valid_from and _valid_to that specify the time range for which the row of feature values is valid. The time range defined by [_valid_from, _valid_to) will never intersect with any other rows for the same join keys.
  - _valid_from (inclusive) – The timestamp from which feature values were valid and returned from the Online Feature Store for the corresponding set of join keys. _valid_from will never be less than start_time. Values for which _valid_from is equal to start_time may have been valid prior to start_time.
  - _valid_to (exclusive) – The timestamp from which feature values are invalid and no longer returned from the Online Feature Store for the corresponding set of join keys. _valid_to will never be greater than end_time. Values for which _valid_to is equal to end_time may be valid beyond end_time.
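The half-open [_valid_from, _valid_to) semantics can be illustrated with a small point-in-time lookup over plain Python tuples. This is an illustrative sketch, not Tecton code; the rows and timestamps are made up:

```python
from datetime import datetime

# Validity intervals for one set of join keys, as (valid_from, valid_to, value).
# Intervals are half-open [_valid_from, _valid_to) and never overlap
# for the same join keys.
rows = [
    (datetime(2023, 5, 1), datetime(2023, 5, 8), 3),
    (datetime(2023, 5, 8), datetime(2023, 5, 20), 7),
]

def value_at(ts):
    """Return the feature value the Online Store would have served at `ts`."""
    for valid_from, valid_to, value in rows:
        if valid_from <= ts < valid_to:  # inclusive start, exclusive end
            return value
    return None

# At exactly the first row's _valid_to, the second row's value is served.
print(value_at(datetime(2023, 5, 8)))  # -> 7
```

Because the end bound is exclusive, a timestamp equal to one row's _valid_to falls into the next row's interval, so exactly one row matches any point in time.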
Parameters
- start_time (datetime.datetime) – The inclusive start time of the time range to compute features for.
- end_time (datetime.datetime) – The exclusive end time of the time range to compute features for.
- max_lookback (datetime.timedelta) – [Non-Aggregate Feature Views Only] A performance optimization that configures how far back before start_time to look for events in the raw data. If set, get_features_in_range() may not include all entities with valid feature values in the specified time range, but get_features_in_range() will never return invalid values.
- entities (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – Filter feature data returned to a set of entity IDs. If specified, this DataFrame should only contain join key columns. (Default: None)
- mock_inputs (Optional[Dict[str, Union[pandas.DataFrame, pyspark_dataframe.DataFrame]]]) – Dictionary of mock inputs that should be used instead of fetching directly from raw data sources. The keys should match the Feature View's function parameters. For Feature Views with multiple sources, mocking some data sources and using raw data for others is supported. Using mock_inputs is incompatible with from_source=False.
- compute_mode (Union[str, tecton.ComputeMode, None]) – Compute mode to use to produce the DataFrame. Valid string values are "spark", "snowflake", "athena", and "rift".
Returns
A TectonDataFrame with feature values for the requested time range in the format specified above.
get_historical_features(...)
Returns a TectonDataFrame of historical values for this feature table.
If no arguments are passed in, all feature values for this feature table will be returned in a TectonDataFrame.
The timestamp_key parameter is only applicable when a spine is passed in. The start_time, end_time, and entities parameters are only applicable when a spine is not passed in.
Parameters
- spine (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it'll return a DataFrame of feature values in the specified time range. (Default: None)
- timestamp_key (str) – Name of the time column in the spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, defaults to the time column of the spine if there is only one present. (Default: None)
- entities (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]) – A DataFrame that is used to filter down feature values. If specified, this DataFrame should only contain join key columns. (Default: None)
- start_time (Union[pendulum.DateTime, datetime.datetime]) – The interval start time from when we want to retrieve features. If no timezone is specified, defaults to UTC. (Default: None)
- end_time (Union[pendulum.DateTime, datetime.datetime]) – The interval end time until when we want to retrieve features. If no timezone is specified, defaults to UTC. (Default: None)
- save (bool) – Whether to persist the DataFrame as a Dataset object. (Default: False)
- save_as (str) – Name to save the DataFrame as. If unspecified and save=True, a name will be generated. (Default: None)
- compute_mode (Union[str, tecton.ComputeMode, None]) – Compute mode to use to produce the data frame. Valid string values are "spark", "athena", and "rift".
Returns
A TectonDataFrame with feature values.
Examples
A FeatureTable ft with join key user_id.
- ft.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
- ft.get_historical_features(spine, save_as='my_dataset') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine. Save the DataFrame as a dataset with the name my_dataset.
- ft.get_historical_features(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.
- ft.get_historical_features(start_time=datetime(...), end_time=datetime(...))
  Fetch all historical features from the offline store in the time range specified by start_time and end_time.
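Because timezone-naive start_time and end_time values are interpreted as UTC, it can be safer to pass timezone-aware datetimes explicitly. A plain-Python sketch of building a 30-day window; the Tecton call is shown as a comment and the handle ft is an assumption:

```python
from datetime import datetime, timedelta, timezone

# Explicitly timezone-aware bounds avoid relying on the UTC default
# for naive datetimes.
end_time = datetime(2023, 6, 1, tzinfo=timezone.utc)
start_time = end_time - timedelta(days=30)

assert start_time.tzinfo is timezone.utc
assert end_time - start_time == timedelta(days=30)

# Fetch a 30-day window of feature values (hypothetical handle `ft`):
# df = ft.get_historical_features(start_time=start_time, end_time=end_time).to_pandas()
```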
get_materialization_job(...)
Retrieves data about the specified materialization job for this Feature View. This data includes information about job attempts.
Parameters
- job_id (str) – ID string of the materialization job.
Returns
MaterializationJobData object for the job.
get_online_features(...)
Returns a single Tecton FeatureVector from the Online Store.
Parameters
- join_keys (Mapping[str, Union[int, int64, str, bytes]]) – Join keys of the enclosed FeatureTable.
- include_join_keys_in_response (bool) – Whether to include join keys as part of the response FeatureVector. (Default: False)
Returns
A FeatureVector of the results.
Examples
A FeatureTable ft with join key user_id.
- ft.get_online_features(join_keys={'user_id': 1})
  Fetch the latest features from the online store for user 1.
- ft.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True)
  Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.
get_timestamp_field()
Returns the name of the timestamp field of this Feature Table.
ingest()
Ingests a DataFrame into the FeatureTable.
This method kicks off a materialization job to write the data into the offline and online store, depending on the Feature Table configuration.
Parameters
- df (Union[pyspark.sql.DataFrame, pandas.DataFrame]) – The DataFrame to be ingested. Has to conform to the FeatureTable schema.
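For example, a DataFrame conforming to the user_login_counts schema from the __init__ example above might be built and ingested like this. A pandas sketch; the handle ft is an assumption and the ingest call is shown as a comment:

```python
from datetime import datetime, timezone

import pandas as pd

# Columns must match the FeatureTable schema: the join key column,
# exactly one timestamp column, and the feature value columns.
df = pd.DataFrame(
    {
        "user_id": ["u1", "u2"],
        "timestamp": [
            datetime(2023, 5, 1, tzinfo=timezone.utc),
            datetime(2023, 5, 1, tzinfo=timezone.utc),
        ],
        "user_login_count_7d": [4, 1],
        "user_login_count_30d": [11, 2],
    }
)

# Sanity-check that the frame matches the declared schema columns.
expected_columns = {"user_id", "timestamp", "user_login_count_7d", "user_login_count_30d"}
assert set(df.columns) == expected_columns

# Kick off the materialization job (hypothetical handle `ft`):
# ft.ingest(df)
```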
list_materialization_jobs()
Retrieves the list of all materialization jobs for this Feature View.
Returns
List of MaterializationJobData objects.
materialization_status(...)
Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures.
This method returns different information depending on the type of FeatureView.
Parameters
- verbose – If set to true, the method will display additional low-level materialization information, useful for debugging. (Default: False)
- limit – Maximum number of jobs to return. (Default: 1000)
- sort_columns – A comma-separated list of column names by which to sort the rows. (Default: None)
- errors_only – If set to true, the method will only return jobs that failed with an error. (Default: False)
summary()
Displays a human-readable summary of this Feature Table.
validate()
Validate this Tecton object and its dependencies (if any).
Validation performs most of the same checks and operations as tecton plan:
- Check for invalid object configurations, e.g. setting conflicting fields.
- For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source's specified S3 path exists or that a Feature View's SQL code executes and produces supported feature data types.
Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. fv = tecton.get_workspace('prod').get_feature_view('my_fv')) since they have already been validated during tecton plan. Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called, e.g. my_feature_view.get_features_for_events().
with_join_key_map(...)
Rebind join keys for a Feature View used in a Feature Service.
The keys in join_key_map should be the feature view join keys, and the values should be the feature service overrides.
Parameters
- join_key_map – A map from feature view join key names to the feature service join key names that override them.
Example
from tecton import FeatureService
# The join key for this feature service will be "feature_service_user_id".
feature_service = FeatureService(
name="feature_service",
features=[
my_feature_view.with_join_key_map({"user_id": "feature_service_user_id"}),
],
)
# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
name="transaction_fraud_service",
features=[
# Select a subset of features from a feature view.
transaction_features[["amount"]],
# Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
# transaction sender and recipient, so include the feature view twice and bind it to two different feature
# service join keys.
user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
],
)
with_name(...)
Rename a Feature View used in a Feature Service.
Parameters
- namespace – The namespace (name) to use for this Feature View in the Feature Service.
Example
from tecton import FeatureService
# The feature view in this feature service will be named "new_named_feature_view" in training data dataframe
# columns and other metadata.
feature_service = FeatureService(
name="feature_service",
features=[my_feature_view.with_name("new_named_feature_view")],
)
# Here is a more sophisticated example. The join keys for this feature service will be "transaction_id",
# "sender_id", and "recipient_id" and will contain three feature views named "transaction_features",
# "sender_features", and "recipient_features".
transaction_fraud_service = FeatureService(
name="transaction_fraud_service",
features=[
# Select a subset of features from a feature view.
transaction_features[["amount"]],
# Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
# transaction sender and recipient, so include the feature view twice and bind it to two different feature
# service join keys.
user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
],
)