tecton.interactive.BatchFeatureView
-
class tecton.interactive.BatchFeatureView(proto, fco_container)
BatchFeatureView class.
To get a FeatureView instance, call tecton.get_feature_view().
Methods
delete_keys – Deletes any materialized data that matches the specified join keys from the FeatureView.
deletion_status – Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures.
get_feature_dataframe – Deprecated.
get_feature_vector – Deprecated.
get_features – Deprecated.
get_historical_features – Returns a Tecton DataFrame of historical values for this feature view.
get_online_features – Returns a single Tecton FeatureVector from the Online Store.
materialization_status – Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures.
preview – Deprecated.
run – Run the FeatureView on the fly.
summary – Returns various information about this feature definition, including the most critical metadata such as the name, owner, features, etc.
-
delete_keys(keys, online=True, offline=True)
Deletes any materialized data that matches the specified join keys from the FeatureView. This method kicks off a job to delete the data in the offline and online stores. If a FeatureView has multiple entities, the full set of join keys must be specified. Only the Delta offline store and the Dynamo online store are supported (offline_config=DeltaConfig() and online_config left as default). A maximum of 10000 keys can be deleted per request.
- Parameters
- Returns
None if deletion job was created successfully.
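Because a single request accepts at most 10000 keys, a larger deletion has to be split across several delete_keys() calls. The batching step might be sketched as follows in plain Python. The names chunk_keys and MAX_KEYS_PER_REQUEST are hypothetical, the list-of-dicts representation is used only for illustration, and each batch would still need to be converted into whatever type delete_keys() expects for keys.

```python
from typing import Dict, Iterator, List

# Limit stated in the delete_keys() description.
MAX_KEYS_PER_REQUEST = 10000


def chunk_keys(rows: List[Dict[str, object]],
               size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[Dict[str, object]]]:
    """Yield consecutive batches of at most `size` join-key rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Illustrative data: 25,000 hypothetical user_id join keys.
rows = [{"user_id": i} for i in range(25000)]
batches = list(chunk_keys(rows))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [10000, 10000, 5000]
```

Each batch can then be submitted as its own deletion job, and deletion_status() used to follow the resulting jobs.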
-
deletion_status(verbose=False, limit=1000, sort_columns=None, errors_only=False)
Displays information for deletion jobs created with the delete_keys() method, which may include past jobs, scheduled jobs, and job failures.
- Parameters
verbose – If set to True, the method displays additional low-level deletion information, useful for debugging.
limit – Maximum number of jobs to return.
sort_columns – A comma-separated list of column names by which to sort the rows.
errors_only – If set to True, the method only returns jobs that failed with an error.
-
get_feature_dataframe(spine=None, spine_time_key=None, use_materialized_data=True, save=None, save_as=None)
Deprecated. Returns a Tecton DataFrame that contains the output of the Feature Transformation of the Feature View.
- Parameters
spine (Union[DataFrame, DataFrame, None]) – (Optional) The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. If spine is not specified, a DataFrame with sample feature vectors is returned.
spine_time_key (Optional[str]) – (Optional) Name of the time column in spine. If unspecified, defaults to the time column of the spine if there is only one present.
use_materialized_data (bool) – (Optional) Use materialized data if materialization is enabled.
save (Optional[bool]) – (Optional) Set to True to persist the DataFrame as a Dataset object.
save_as (Optional[str]) – (Optional) Name to save the DataFrame as. Not applicable when save=False. If unspecified and save=True, a name will be generated.
- Returns
A Tecton DataFrame.
-
get_feature_vector(join_keys=None, include_join_keys_in_response=False, request_context_map=None)
Deprecated. Returns a single Tecton FeatureVector from the Online Store. At least one of join_keys or request_context_map is required.
- Parameters
join_keys (Optional[Mapping[str, Union[int, int64, str, bytes]]]) – Join keys of the enclosed FeatureViews.
include_join_keys_in_response (bool) – Whether to include join keys as part of the response FeatureVector.
request_context_map (Optional[Mapping[str, Union[int, int64, str, bytes, float]]]) – Dictionary of request context values.
- Returns
A FeatureVector of the results.
-
get_features(entities=None, start_time=None, end_time=None, from_source=False)
Deprecated. Returns all the feature values that are defined by this Feature View in the specified time range.
- Parameters
entities (Union[DataFrame, DataFrame, None]) – (Optional) Filter feature data to a set of entity IDs. If specified, this DataFrame should only contain join key columns.
start_time (Union[DateTime, datetime, None]) – (Optional) The interval start time from when we want to retrieve features.
end_time (Union[DateTime, datetime, None]) – (Optional) The interval end time until when we want to retrieve features.
from_source (bool) – Whether feature values should be recomputed from the original data source. If False, we will attempt to read the values from the materialized store.
- Returns
A Tecton DataFrame with feature values.
-
get_historical_features(spine=None, timestamp_key=None, start_time=None, end_time=None, entities=None, from_source=False, save=False, save_as=None)
Returns a Tecton DataFrame of historical values for this feature view. If no arguments are passed in, all feature values for this feature view will be returned in a Tecton DataFrame.
Note: The timestamp_key parameter is only applicable when a spine is passed in. Parameters start_time, end_time, and entities are only applicable when a spine is not passed in.
- Parameters
spine (Union[pyspark.sql.DataFrame, pandas.DataFrame, tecton.DataFrame]) – (Optional) The spine to join against, as a dataframe. If present, the returned DataFrame will contain rollups for all (join key, temporal key) combinations that are required to compute a full frame from the spine. To distinguish between spine columns and feature columns, feature columns are labeled as feature_view_name.feature_name in the returned DataFrame. If spine is not specified, it’ll return a DataFrame of feature values in the specified time range.
timestamp_key (str) – (Optional) Name of the time column in the spine. This method will fetch the latest features computed before the specified timestamps in this column. If unspecified, will default to the time column of the spine if there is only one present. If more than one time column is present in the spine, you must specify which column you’d like to use.
start_time (Union[pendulum.DateTime, datetime.datetime]) – (Optional) The interval start time from when we want to retrieve features. If no timezone is specified, will default to using UTC.
end_time (Union[pendulum.DateTime, datetime.datetime]) – (Optional) The interval end time until when we want to retrieve features. If no timezone is specified, will default to using UTC.
entities (Union[pyspark.sql.DataFrame, pandas.DataFrame, tecton.DataFrame]) – (Optional) Filter feature data returned to a set of entity IDs. If specified, this DataFrame should only contain join key columns.
from_source (bool) – (Optional) Whether feature values should be recomputed from the original data source. If False, we will read the materialized values from the offline store.
save (bool) – (Optional) Whether to persist the DataFrame as a Dataset object. Default is False.
save_as (str) – (Optional) Name to save the DataFrame as. If unspecified and save=True, a name will be generated.
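start_time and end_time are interpreted as UTC when they carry no timezone. A caller who prefers not to rely on that default can attach the timezone explicitly before passing the values in. The following is a minimal sketch using only the standard library; as_utc is a hypothetical helper, not part of the Tecton SDK.

```python
from datetime import datetime, timedelta, timezone


def as_utc(ts: datetime) -> datetime:
    """Return `ts` as a timezone-aware UTC datetime.

    Naive datetimes are assumed to already be in UTC, mirroring the
    default described for start_time and end_time.
    """
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


# A naive value and a value with a +02:00 offset (illustrative data).
naive = datetime(2021, 6, 21, 12, 0)
aware = datetime(2021, 6, 21, 15, 30, tzinfo=timezone(timedelta(hours=2)))

start_time = as_utc(naive)  # 2021-06-21 12:00 UTC
end_time = as_utc(aware)    # 2021-06-21 13:30 UTC
```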
Examples
A FeatureView fv with join key user_id.
1) fv.get_historical_features(spine) where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine.
2) fv.get_historical_features(spine, save_as='my_dataset') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date': [datetime(...), datetime(...), datetime(...)]})
Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the spine. Save the DataFrame as a dataset with the name my_dataset.
3) fv.get_historical_features(spine, timestamp_key='date_1') where spine=pandas.DataFrame({'user_id': [1,2,3], 'date_1': [datetime(...), datetime(...), datetime(...)], 'date_2': [datetime(...), datetime(...), datetime(...)]})
Fetch historical features from the offline store for users 1, 2, and 3 for the specified timestamps in the ‘date_1’ column in the spine.
4) fv.get_historical_features(start_time=datetime(...), end_time=datetime(...))
Fetch all historical features from the offline store in the time range specified by start_time and end_time.
- Returns
A Tecton DataFrame.
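The timestamp_key description states that each spine row receives the latest feature values computed before its timestamp. The sketch below illustrates that point-in-time lookup in plain Python for a single join key. It is a conceptual model only, not Tecton's implementation: the function name and the data are hypothetical, and excluding a feature row whose timestamp equals the spine timestamp (strictly before) is an assumption.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Optional, Tuple

FeatureRow = Tuple[datetime, float]  # (feature timestamp, feature value)


def latest_before(history: List[FeatureRow], ts: datetime) -> Optional[float]:
    """Return the most recent value with a timestamp strictly before `ts`.

    `history` must be sorted by timestamp in ascending order.
    Returns None when no feature value exists before `ts`.
    """
    timestamps = [row[0] for row in history]
    idx = bisect_left(timestamps, ts)  # count of rows with timestamp < ts
    if idx == 0:
        return None
    return history[idx - 1][1]


# Hypothetical feature history for one user, sorted by time.
history = [
    (datetime(2021, 6, 1), 10.0),
    (datetime(2021, 6, 2), 20.0),
    (datetime(2021, 6, 3), 30.0),
]
# Hypothetical spine timestamps for the same user.
spine_timestamps = [
    datetime(2021, 5, 31),
    datetime(2021, 6, 2, 12),
    datetime(2021, 6, 3),
]
values = [latest_before(history, ts) for ts in spine_timestamps]
print(values)  # [None, 20.0, 20.0]
```

The first spine row predates all feature data and gets no value; the other two rows never see a value computed at or after their own timestamp, which is what prevents future data from leaking into a training set.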
-
get_online_features(join_keys, include_join_keys_in_response=False)
Returns a single Tecton FeatureVector from the Online Store.
- Parameters
Examples
A FeatureView fv with join key user_id.
1) fv.get_online_features(join_keys={'user_id': 1})
Fetch the latest features from the online store for user 1.
2) fv.get_online_features(join_keys={'user_id': 1}, include_join_keys_in_response=True)
Fetch the latest features from the online store for user 1 and include the join key information (user_id=1) in the returned FeatureVector.
- Returns
A FeatureVector of the results.
-
materialization_status(verbose=False, limit=1000, sort_columns=None, errors_only=False)
Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures. This method returns different information depending on the type of FeatureView.
- Parameters
verbose – If set to True, the method displays additional low-level materialization information, useful for debugging.
limit – Maximum number of jobs to return.
sort_columns – A comma-separated list of column names by which to sort the rows.
errors_only – If set to True, the method only returns jobs that failed with an error.
-
preview(limit=10, time_range=None, use_materialized_data=True)
Deprecated. Shows a preview of the FeatureView’s features. Random, unique join_keys are chosen to showcase the features.
- Parameters
- Returns
A Tecton DataFrame.
-
run(feature_start_time=None, feature_end_time=None, **mock_inputs)
Run the FeatureView on the fly. Mock input data is supported; any input not supplied through mock_inputs is retrieved from the linked DataSources, in which case the run may take several minutes.
- Parameters
feature_start_time (Union[DateTime, datetime, None]) – Start time for the feature. mock_inputs and linked DataSources are filtered to provide the inputs necessary for this feature time. Output values with timestamps earlier than this are dropped. If unset, defaults to feature_end_time minus the materialization schedule interval.
feature_end_time (Union[DateTime, datetime, None]) – End time for input data (both data sources and mock_inputs). mock_inputs and linked DataSources are filtered to provide the inputs necessary for this feature time. Output values with timestamps later than this are dropped. If unset, defaults to the current time at the moment of the run.
**mock_inputs – If provided, mock_inputs are used as the FeatureView inputs for the run instead of the data from linked DataSources. Each parameter name must be a valid FeatureView input name.
Examples
A FeatureView ‘fv’ with inputs ‘ds1’ and ‘ds2’.
fv.run()
Use inputs from the linked DataSources for both ds1 and ds2. feature_end_time defaults to ‘now’, and feature_start_time is set to feature_end_time - batch_schedule.
fv.run(feature_start_time=datetime(2021, 6, 21), feature_end_time=datetime(2021, 6, 22))
feature_start_time and feature_end_time are set by the user.
fv.run(ds1=mock_dataframe1)
Use mock_dataframe1 as input ds1; ds2 is read from a linked DataSource.
fv.run(ds1=mock_dataframe1, ds2=mock_dataframe2)
Use the mock dataframes for both ds1 and ds2.
- Returns
A Tecton DataFrame of the results.
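The default time window described for run() can be sketched as a small standalone function. This is an illustration of the documented defaults only, not Tecton's code: resolve_run_window is a hypothetical name, batch_schedule is passed in explicitly as a timedelta (in practice it would come from the FeatureView's configuration), and the now argument exists only to make the example deterministic.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_run_window(
    batch_schedule: timedelta,
    feature_start_time: Optional[datetime] = None,
    feature_end_time: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Apply the documented defaults for the run() time window."""
    # feature_end_time defaults to the current time at the moment of the run.
    if feature_end_time is None:
        feature_end_time = now if now is not None else datetime.now()
    # feature_start_time defaults to feature_end_time minus the schedule interval.
    if feature_start_time is None:
        feature_start_time = feature_end_time - batch_schedule
    return feature_start_time, feature_end_time


one_day = timedelta(days=1)
fixed_now = datetime(2021, 6, 22)

# Equivalent of fv.run(): both bounds are derived from the defaults.
default_window = resolve_run_window(one_day, now=fixed_now)

# Equivalent of fv.run(feature_start_time=..., feature_end_time=...).
explicit_window = resolve_run_window(
    one_day,
    feature_start_time=datetime(2021, 6, 20),
    feature_end_time=datetime(2021, 6, 21),
)
```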
-
summary()
Returns various information about this feature definition, including the most critical metadata such as the name, owner, features, etc.
Attributes
batch_materialization_schedule
This represents how often we schedule batch materialization jobs.
created_at
Returns the creation date of this Tecton Object.
defined_in
Returns filename where this Tecton Object has been declared.
description
The description of this Tecton Object, set by user.
entity_names
Returns a list of entity names.
family
The family of this Tecton Object, used to group Objects.
feature_start_time
This represents the time at which features are first available.
features
Returns the names of the (output) features.
id
Returns the id of this object.
is_on_demand
Deprecated.
is_temporal
Deprecated.
is_temporal_aggregate
Deprecated.
join_keys
Returns the join key column names.
name
The name of this Tecton Object.
online_serving_index
Defines the set of join keys that will be indexed and queryable during online serving.
owner
The owner of this Tecton Object (typically the email of the primary maintainer).
schedule_offset
If this attribute is non-empty, Tecton will schedule materialization jobs at an offset equal to this.
tags
Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).
timestamp_key
Returns the timestamp_key column name of this FeatureView.
type
‘Temporal’.
url
Returns a link to the Tecton Web UI.
wildcard_join_key
Returns a wildcard join key column name if it exists; Otherwise returns None.
workspace
Returns the workspace this Tecton Object was created in.