tecton.StreamFeatureView
Summary
A Tecton Stream Feature View, used for transforming and materializing features from a StreamSource.
Do not instantiate StreamFeatureView directly; use the tecton.stream_feature_view() decorator instead.
Attributes
Name | Data Type | Description |
---|---|---|
aggregations | Optional[configs.Aggregation] | List of Aggregation configs used by this Feature View. |
batch_schedule | Optional[datetime.timedelta] | The batch schedule of this Feature View. |
batch_trigger | BatchTriggerType | The BatchTriggerType for this FeatureView. |
description | str | Returns the description of the Tecton object. |
entities | List[specs.EntitySpec] | The Entities for this Feature View. |
feature_start_time | Optional[datetime.datetime] | |
id | str | Returns the unique id of the Tecton object. |
info | | A dataclass containing basic info about this Tecton Object. |
is_batch_trigger_manual | bool | Whether this Feature View's batch trigger is BatchTriggerType.Manual. |
join_keys | List[str] | The join key column names. |
max_data_delay | | Deprecated. |
max_source_data_delay | datetime.timedelta | Returns the maximum data delay of input sources for this feature view. |
name | str | Returns the name of the Tecton object. |
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving. |
owner | Optional[str] | Returns the owner of the Tecton object. |
sources | | The Source inputs for this Feature View. |
tags | Dict[str, str] | Returns the tags of the Tecton object. |
transformations | List[specs.TransformationSpec] | The Transformations used by this Feature View. |
url | str | Returns a link to the Tecton Web UI. |
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; otherwise returns None. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
Methods
Name | Description |
---|---|
__init__(...) | Construct a StreamFeatureView. |
cancel_materialization_job(...) | Cancels the scheduled or running batch materialization job for this Feature View specified by the job identifier. |
delete_keys(...) | Deletes any materialized data that matches the specified join keys from the FeatureView. |
deletion_status(...) | Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures. |
get_feature_columns() | The features produced by this FeatureView. |
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature view. |
get_materialization_job(...) | Retrieves data about the specified materialization job for this Feature View. |
get_online_features(...) | Returns a single tecton.FeatureVector from the Online Store. |
get_timestamp_field() | Returns the name of the timestamp field for this Feature View. |
list_materialization_jobs() | Retrieves the list of all materialization jobs for this Feature View. |
materialization_status(...) | Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures. |
run(...) | Run the FeatureView. |
run_stream(...) | Starts a streaming job that keeps writing the output records of this FeatureView to a temporary table. |
summary() | Displays a human-readable summary of this Feature View. |
test_run(...) | Run the FeatureView using mock data sources. |
trigger_materialization_job(...) | Starts a batch materialization job for this Feature View. |
validate() | Validate this Tecton object and its dependencies (if any). |
wait_for_materialization_job(...) | Blocks until the specified job has been completed. |
with_join_key_map(...) | Rebind join keys for a Feature View used in a Feature Service. |
with_name(...) | Rename a Feature View used in a Feature Service. |
__init__(...)
Construct a StreamFeatureView.
Do not call __init__ directly; use the tecton.stream_feature_view() decorator instead.
cancel_materialization_job(...)
Cancels the scheduled or running batch materialization job for this Feature View specified by the job identifier. Once cancelled, a job will not be retried further.
The job run state will be set to MANUAL_CANCELLATION_REQUESTED. Note that cancellation is asynchronous, so it may take some time for the cancellation to complete. If the job run is already in MANUAL_CANCELLATION_REQUESTED or in a terminal state, the method simply returns the job.
Parameters
job_id (str): ID string of the materialization job.
Returns
MaterializationJobData object for the cancelled job.
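A minimal sketch of requesting a cancellation, assuming fv is a StreamFeatureView already fetched from a workspace. The helper name request_cancellation is hypothetical, and reading a state attribute from the returned MaterializationJobData is an assumption that this page does not confirm.

```python
def request_cancellation(fv, job_id: str):
    """Ask Tecton to cancel a materialization job and report what came back.

    `fv` is expected to expose cancel_materialization_job(job_id), as
    documented above. Cancellation is asynchronous, so the job may still be
    winding down when this function returns.
    """
    job = fv.cancel_materialization_job(job_id)
    # Assumption: MaterializationJobData carries a `state` attribute.
    # getattr keeps the sketch from failing if it does not.
    state = getattr(job, "state", None)
    return job, state
```

Because the call returns before the cancellation completes, a caller that needs the final outcome would have to look the job up again later with get_materialization_job(...).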
delete_keys(...)
Deletes any materialized data that matches the specified join keys from the FeatureView.
This method kicks off a job to delete the data in the offline and online stores. If a FeatureView has multiple entities, the full set of join keys must be specified. Only Delta is supported as the offline store (offline_store=DeltaConfig()). A maximum of 500,000 keys can be deleted per request.
Parameters
- keys (Union[DataFrame, DataFrame]): The DataFrame of keys to be deleted. Must conform to the FeatureView join keys.
- online (bool): (Optional, default=True) Whether or not to delete from the online store.
- offline (bool): (Optional, default=True) Whether or not to delete from the offline store.
Returns
None if the deletion job was created successfully.
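The 500,000-key limit means a larger key set has to be split across several requests. The sketch below shows one way to do that, assuming the keys are held in a pandas DataFrame (anything supporting len() and .iloc slicing would work). The helper names batch_bounds and delete_keys_in_batches are hypothetical and not part of the SDK.

```python
MAX_KEYS_PER_REQUEST = 500_000  # limit stated in the text above


def batch_bounds(n_rows: int, batch_size: int = MAX_KEYS_PER_REQUEST):
    """Return (start, stop) row ranges that each cover at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]


def delete_keys_in_batches(fv, keys_df, online=True, offline=True,
                           batch_size=MAX_KEYS_PER_REQUEST):
    """Call fv.delete_keys once per batch of rows; return the number of calls.

    Assumption: keys_df is a pandas DataFrame whose columns are exactly the
    Feature View's join keys.
    """
    bounds = batch_bounds(len(keys_df), batch_size)
    for start, stop in bounds:
        fv.delete_keys(keys=keys_df.iloc[start:stop],
                       online=online, offline=offline)
    return len(bounds)
```

Each call starts its own deletion job. This page does not say whether several deletion jobs may run at the same time, so checking deletion_status() between batches is a cautious choice.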
deletion_status(...)
Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures.
Parameters
- verbose: If set to true, the method will display additional low-level deletion information, useful for debugging. (Default: False)
- limit: Maximum number of jobs to return. (Default: 1000)
- sort_columns: A comma-separated list of column names by which to sort the rows. (Default: None)
- errors_only: If set to true, the method will only return jobs that failed with an error. (Default: False)
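As an illustration of combining these parameters, the following sketch narrows the output to failed deletion jobs with extra detail. The wrapper name show_failed_deletions is hypothetical; the keyword names are the ones listed above.

```python
def show_failed_deletions(fv, limit: int = 20):
    """Display only failed deletion jobs, with extra debugging detail."""
    # verbose, limit, and errors_only are the parameters documented above.
    # Whatever the SDK returns (possibly nothing) is passed straight through.
    return fv.deletion_status(verbose=True, limit=limit, errors_only=True)
```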
get_feature_columns()
The features produced by this FeatureView.
get_historical_features(...)
Returns a TectonDataFrame of historical values for this feature view.
By default (i.e. from_source=None), this method fetches feature values from the Offline Store for Feature Views that have offline materialization enabled and otherwise computes feature values on the fly from raw data.
If no arguments are passed in, all feature values for this feature view will be returned in a Tecton DataFrame.
The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.
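The split between the two query modes can be written down as a small pre-flight check. This is a hypothetical helper, not SDK behavior: the page does not say how the SDK reacts to mixed arguments, so the sketch simply rejects them before any call is made.

```python
def check_query_args(spine=None, timestamp_key=None, start_time=None,
                     end_time=None, entities=None):
    """Return the query mode, or raise ValueError if the two modes are mixed."""
    if spine is None:
        # Without a spine, timestamp_key has nothing to refer to.
        if timestamp_key is not None:
            raise ValueError("timestamp_key requires a spine")
        return "time_range"
    # With a spine, the time-range and entity filters do not apply.
    mixed = [name for name, value in (("start_time", start_time),
                                      ("end_time", end_time),
                                      ("entities", entities))
             if value is not None]
    if mixed:
        raise ValueError(f"{', '.join(mixed)} cannot be combined with a spine")
    return "spine"
```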
Parameters
- spine (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]): The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, the method returns a DataFrame of feature values in the specified time range. (Default: None)
- timestamp_key (str): Name of the time column in the spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, it defaults to the time column of the spine if there is only one present. If more than one time column is present in the spine, you must specify which column to use. (Default: None)
- start_time (datetime.datetime): The start of the time interval for which to retrieve features. If no timezone is specified, UTC is used. (Default: None)
- end_time (datetime.datetime): The end of the time interval for which to retrieve features. If no timezone is specified, UTC is used. (Default: None)
- entities (Union[pyspark.sql.DataFrame, pandas.DataFrame, TectonDataFrame]): Filter feature data returned to a set of entity IDs. If specified, this DataFrame should only contain join key columns. (Default: None)
- from_source (bool): Whether feature values should be recomputed from the original data source. If None, feature values will be fetched from the Offline Store for Feature Views that have offline materialization enabled and otherwise computed on the fly from raw data. Use from_source=True to force computing from raw data and from_source=False to error if any Feature Views are not materialized. (Default: None)
- save (bool): Whether to persist the DataFrame as a Dataset object. (Default: False)
- save_as (str): Name to save the DataFrame as. If unspecified and save=True, a name will be generated. (Default: None)
Returns
A TectonDataFrame.
Examples
A FeatureView fv with join key user_id.
- fv.get_historical_features(spine)
  where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
- fv.get_historical_features(spine, save_as='my_dataset')
  where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine. Save the DataFrame as a dataset with the name my_dataset.
- fv.get_historical_features(spine, timestamp_key='date_1')
  where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
  Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the 'date_1' column in the spine.
- fv.get_historical_features(start_time=datetime(...), end_time=datetime(...))
  Fetch all historical features from the offline store in the time range specified by start_time and end_time.
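Since start_time and end_time fall back to UTC when no timezone is given, making the timezone explicit avoids surprises with naive local datetimes. The sketch below shows one way to do that for the time-range form of the call. The helper names as_utc and features_between are hypothetical, and the start-before-end check is a choice made for this sketch, not documented SDK behavior.

```python
from datetime import datetime, timezone


def as_utc(ts: datetime) -> datetime:
    """Attach UTC to a naive datetime; leave an aware datetime untouched."""
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def features_between(fv, start: datetime, end: datetime):
    """Fetch all feature values in the requested time range, without a spine."""
    start, end = as_utc(start), as_utc(end)
    if start >= end:
        raise ValueError("start must be earlier than end")
    # Only start_time and end_time are passed, so no spine or timestamp_key.
    return fv.get_historical_features(start_time=start, end_time=end)
```

A call such as features_between(fv, datetime(2023, 1, 1), datetime(2023, 2, 1)) would then send two UTC-aware datetimes to the SDK.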
get_materialization_job(...)
Retrieves data about the specified materialization job for this Feature View.
This data includes information about job attempts.
Parameters
job_id (str): ID string of the materialization job.
Returns
MaterializationJobData object for the job.
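One possible use of this method is to poll a job until it finishes. The following is a sketch under stated assumptions: the helper poll_job is hypothetical, the state attribute on MaterializationJobData is assumed, and the caller must supply the terminal state names because this page does not list them. The methods table also lists wait_for_materialization_job(...), which blocks until completion and is likely the simpler choice when no custom logic is needed.

```python
import time


def poll_job(fv, job_id, terminal_states, interval_s=30.0, max_polls=20,
             sleep=time.sleep):
    """Poll get_materialization_job until the job reaches a terminal state.

    Returns the last MaterializationJobData seen, after at most max_polls
    lookups. `sleep` is injectable so the loop can be tested without waiting.
    """
    job = fv.get_materialization_job(job_id)
    for _ in range(max_polls - 1):
        # Assumption: the job data exposes its run state as `job.state`.
        if job.state in terminal_states:
            break
        sleep(interval_s)
        job = fv.get_materialization_job(job_id)
    return job
```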
get_online_features(...)
Returns a single tecton.FeatureVector from the Online Store.
Parameters
- join_keys (Mapping[str, Union[int, int64, str, bytes]]): The join keys to fetch from the online store.
- include_join_keys_in_response (bool): Whether to include join keys as part of the response FeatureVector. (Default: False)
Returns
A tecton.FeatureVector of the results.
Examples
A FeatureView fv with join key user_id.
- fv.get_online_features(join_keys={'user_id': 1})
  Fetch the latest features from the online store for user 1.
- fv.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True)
  Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.
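The calls above can be wrapped in a small function that also checks the join key type before reaching the online store. This is a sketch: latest_features_for_user is a hypothetical name, and the type check is this sketch's own guard based on the types listed for join_keys, not something the SDK is known to require from callers.

```python
import numbers


def latest_features_for_user(fv, user_id, with_keys=False):
    """Fetch the latest online feature vector for one user_id."""
    # The parameter list names int, int64, str, and bytes as key types.
    # numbers.Integral covers int as well as NumPy integer types.
    if not isinstance(user_id, (numbers.Integral, str, bytes)):
        raise TypeError(f"unsupported join key type: {type(user_id).__name__}")
    return fv.get_online_features(
        join_keys={"user_id": user_id},
        include_join_keys_in_response=with_keys,
    )
```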
get_timestamp_field()
Returns the name of the timestamp field for this Feature View.
list_materialization_jobs()
Retrieves the list of all materialization jobs for this Feature View.
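A quick way to get an overview of the returned jobs is to tally them by run state. The sketch below assumes each returned item is a MaterializationJobData with a state attribute, which this page does not confirm; count_jobs_by_state is a hypothetical helper name.

```python
from collections import Counter


def count_jobs_by_state(fv):
    """Tally this Feature View's materialization jobs by run state."""
    jobs = fv.list_materialization_jobs()
    # Assumption: each job object has a `state` attribute. str() keeps the
    # tally keys printable whether the state is an enum or a plain string.
    return Counter(str(job.state) for job in jobs)
```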