BatchFeatureView
Summary​
A Tecton Batch Feature View, used for materializing features on a batch schedule from a BatchSource.
BatchFeatureView should not be instantiated directly; the @batch_feature_view decorator is recommended instead.
Example​
from tecton import batch_feature_view, Attribute
from tecton.types import Int64
from fraud.entities import user
from fraud.data_sources.fraud_users import fraud_users_batch
from fraud.data_sources.transactions import transactions_batch
from datetime import datetime, timedelta

# For every transaction, the following Feature View precomputes a feature that indicates
# whether a user was an adult as of the time of the transaction.
@batch_feature_view(
    sources=[transactions_batch, fraud_users_batch],
    entities=[user],
    mode="spark_sql",
    online=False,
    offline=False,
    feature_start_time=datetime(2022, 5, 1),
    timestamp_field="timestamp",
    features=[Attribute("user_is_adult", Int64)],
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=100),
    description="Whether the user performing the transaction is over 18 years old.",
)
def transaction_user_is_adult(transactions_batch, fraud_users_batch):
    return f"""
        select
            timestamp,
            t.user_id,
            IF (datediff(timestamp, to_date(dob)) > (18*365), 1, 0) as user_is_adult
        from {transactions_batch} t join {fraud_users_batch} u on t.user_id=u.user_id
        """
@batch_feature_view (Decorator)​
Declare a Batch Feature View.

Parameters

mode (str) - (Required) Either the compute mode for the Transformation function or else "pipeline" mode.
sources (Sequence[Union[framework_data_source.BatchSource, FilteredSource]]) - (Required) The Data Source inputs to the Feature View.
entities (Sequence[framework_entity.Entity]) - (Required) The entities this Feature View is associated with.
timestamp_field (str) - (Required) The column name that refers to the timestamp for records that are produced by the Feature View. This parameter is optional if exactly one column is a Timestamp type.
features (Union[Sequence[feature.Aggregate], Sequence[Union[feature.Attribute, feature.Embedding, feature.Inference]]]) - (Required) A list of Attribute, Aggregate, and Embedding feature values managed by this Feature View.
name (Optional[str]) - Unique, human-friendly name that identifies the FeatureView. Defaults to the function name. Default: None
description (Optional[str]) - A human-readable description. Default: None
owner (Optional[str]) - Typically the name or email of the Feature View's primary maintainer. Default: None
tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). Default: None
prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreation of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: false
aggregation_interval (Optional[datetime.timedelta]) - The size of the tiles for a particular feature view (for example, "1h" or "6h"). Only valid when using aggregations. Default: None
aggregation_secondary_key (Optional[str]) - Configures secondary key aggregates using the set column. Only valid when using aggregations. Default: None
online (bool) - Whether the feature view should be materialized to the online feature store. Default: false
offline (bool) - Whether the feature view should be materialized to the offline feature store. Default: false
ttl (Optional[datetime.timedelta]) - The TTL (or "look back window") for features defined by this feature view. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs. Default: None
feature_start_time (Optional[datetime.datetime]) - When materialization for this feature view should start from. (Required if offline=true or online=true) Default: None
lifetime_start_time (Optional[datetime.datetime]) - The start time for what data should be included in a lifetime aggregate. (Required if using lifetime windows) Default: None
manual_trigger_backfill_end_time (Optional[datetime.datetime]) - If set, Tecton will schedule backfill materialization jobs for this feature view up to this time. Materialization jobs after this point must be triggered manually. (This parameter is only valid to set if BatchTriggerType is MANUAL.) Default: None
batch_trigger (BatchTriggerType) - BatchTriggerType.SCHEDULED (default) or BatchTriggerType.MANUAL. Default: BatchTriggerType.SCHEDULED
batch_schedule (Optional[datetime.timedelta]) - The interval at which batch materialization should be scheduled. Default: None
online_serving_index (Optional[Sequence[str]]) - (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Default: None
batch_compute (Optional[configs.ComputeConfigTypes]) - Configuration for the batch materialization cluster. Default: None
offline_store (Optional[Union[configs.OfflineStoreConfig, configs.ParquetConfig, configs.DeltaConfig]]) - Configuration for how data is written to the offline feature store. Default: None
online_store (Optional[configs.OnlineStoreTypes]) - Configuration for how data is written to the online feature store. Default: None
monitor_freshness (bool) - If true, enables monitoring when feature data is materialized to the online feature store. Default: false
data_quality_enabled (Optional[bool]) - If false, disables data quality metric computation and the data quality dashboard. Default: None
skip_default_expectations (Optional[bool]) - If true, skips validating default expectations on the feature data. Default: None
expected_feature_freshness (Optional[datetime.timedelta]) - Threshold used to determine if recently materialized feature data is stale. Data is stale if now - most_recent_feature_value_timestamp > expected_feature_freshness. For feature views using Tecton aggregations, data is stale if now - round_up_to_aggregation_interval(most_recent_feature_value_timestamp) > expected_feature_freshness, where round_up_to_aggregation_interval() rounds up the feature timestamp to the end of the aggregation_interval. Value must be at least 2 times aggregation_interval. If not specified, a value determined by the Tecton backend is used. Default: None
alert_email (Optional[str]) - Email that alerts for this FeatureView will be sent to. Default: None
max_backfill_interval (Optional[datetime.timedelta]) - (Advanced) The time interval for which each backfill job will run to materialize feature data. This affects the number of backfill jobs that will run, which is (<feature registration time> - feature_start_time) / max_backfill_interval. Configuring the max_backfill_interval parameter appropriately will help to optimize large backfill jobs. If this parameter is not specified, then 10 backfill jobs will run (the default). Default: None
max_batch_aggregation_interval (Optional[datetime.timedelta]) - Deprecated. Use max_backfill_interval instead, which has the exact same usage. Default: None
incremental_backfills (bool) - If set to True, the feature view will be backfilled one interval at a time as if it had been updated "incrementally" since its feature_start_time. For example, if batch_schedule is 1 day and feature_start_time is 1 year prior to the current time, then the backfill will run 365 separate backfill queries to fill the historical feature data. Default: false
run_transformation_validation (Optional[bool]) - If True, Tecton will execute the Feature View transformations during tecton plan/apply validation. If False, Tecton will not execute the transformations during validation and schema must be set. Skipping query validation can be useful to speed up tecton plan/apply or for Feature Views that have issues with Tecton's validation (e.g. some pip dependencies). Default is True for Spark and Snowflake Feature Views and False for Python and Pandas Feature Views. Default: None
options (Optional[Dict[str, str]]) - Additional options to configure the Feature View. Used for advanced use cases and beta features. Default: None
tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by your job cluster. Default: None
cache_config (Optional[configs.CacheConfig]) - Cache config for the Feature View. Including this option enables the feature server to use the cache when retrieving features for this feature view. Will only be respected if the feature service containing this feature view has enable_online_caching set to True. Default: None
batch_compaction_enabled (Optional[bool]) - Deprecated. Please use compaction_enabled instead, which has the exact same usage. Default: None
compaction_enabled (Optional[bool]) - (Private preview) If True, Tecton will run a compaction job after each batch materialization job to write to the online store. This requires the use of Dynamo and uses the ImportTable API. Because each batch job overwrites the online store, a larger compute cluster may be required. Default: None
environment (Optional[str]) - The custom environment in which materialization jobs will be run. Defaults to None, which means jobs will execute in the default Tecton environment. Default: None
context_parameter_name (Optional[str]) - Name of the function parameter into which Tecton injects the MaterializationContext object. Default: None
secrets (Optional[Dict[str, Union[Secret, str]]]) - A dictionary of Secret references that will be resolved and provided to the transformation function at runtime. During local development and testing, strings may be used instead of Secret references. Default: None
resource_providers (Optional[Dict[str, resource_provider.ResourceProvider]]) - A dictionary of resource providers that will be evaluated, and whose resources will be provided to the transformation function at runtime. Default: None

Returns

An object of type BatchFeatureView.
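In addition to the simple Attribute example at the top of this page, the decorator parameters above can be combined to define time-windowed aggregate features. The following is a minimal sketch (not part of the original reference), assuming the same transactions_batch source and user entity as the earlier example; the "amt" column, window size, and feature view name are illustrative assumptions.

from datetime import datetime, timedelta
from tecton import batch_feature_view, Aggregate
from tecton.types import Field, Float64
from fraud.entities import user
from fraud.data_sources.transactions import transactions_batch

# Sketch: mean transaction amount per user over a trailing 7-day window,
# computed in daily tiles. The "amt" column is an assumption about the source schema.
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(input_column=Field("amt", Float64), function="mean", time_window=timedelta(days=7)),
    ],
    online=False,
    offline=False,
    feature_start_time=datetime(2022, 5, 1),
    description="Mean transaction amount over the last 7 days.",
)
def user_mean_transaction_amount(transactions_batch):
    return f"select user_id, amt, timestamp from {transactions_batch}"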
BatchFeatureView (Class)​
Attributes​
Name | Data Type | Description |
---|---|---|
aggregation_interval | Optional[datetime.timedelta] | How frequently the feature values are updated |
aggregation_secondary_key | Optional[str] | Configures secondary key aggregates using the set column. |
aggregations | List[configs.Aggregation] | List of Aggregation configs used by this Feature View. |
alert_email | Optional[str] | Email that alerts for this FeatureView will be sent to. |
batch_schedule | Optional[datetime.timedelta] | The batch schedule of this Feature View. |
batch_trigger | BatchTriggerType | The BatchTriggerType for this FeatureView. |
cache_config | Optional[configs.CacheConfig] | Uses cache for Feature View if online caching is enabled. |
compaction_enabled | bool | (Private preview) Runs compaction job post-materialization; requires Dynamo and ImportTable API. |
context_parameter_name | Optional[str] | Name of the function parameter into which Tecton injects the MaterializationContext object. |
created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
description | Optional[str] | Returns the description of the Tecton object. |
entities | ||
environment | Optional[str] | The custom environment in which materialization jobs will be run. Defaults to None , which means jobs will execute in the default Tecton environment. |
expected_feature_freshness | Optional[pendulum.Duration] | Threshold used to determine if recently materialized feature data is stale. |
feature_metadata | List[FeatureMetadata] | |
feature_start_time | Optional[datetime.datetime] | |
id | str | Returns the unique id of the Tecton object. |
incremental_backfills | bool | Backfills incrementally from feature_start_time to current time, one interval at a time |
info | ||
is_batch_trigger_manual | bool | Whether this Feature View's batch trigger is BatchTriggerType.MANUAL. |
join_keys | List[str] | The join key column names. |
manual_trigger_backfill_end_time | Optional[pendulum.datetime] | If set, Tecton will schedule backfill materialization jobs for this Feature View up to this time. |
max_backfill_interval | Optional[pendulum.Duration] | (Advanced) The time interval for which each backfill job will run to materialize feature data. This affects the number of backfill jobs that will run, which is (<feature registration time> - feature_start_time) / max_backfill_interval. Configuring the max_backfill_interval parameter appropriately will help to optimize large backfill jobs. If this parameter is not specified, then 10 backfill jobs will run (the default). |
max_source_data_delay | datetime.timedelta | Returns the maximum data delay of input sources for this feature view. |
monitor_freshness | bool | If true, enables monitoring when feature data is materialized to the online feature store. |
name | str | Returns the name of the Tecton object. |
offline | bool | Whether the Feature View is materialized to the offline feature store. |
offline_store | Optional[Union[configs.DeltaConfig, configs.ParquetConfig]] | Configuration for the Offline Store of this Feature View. |
online | bool | Whether the Feature View is materialized to the online feature store. |
online_serving_index | List[str] | The set of join keys that will be indexed and queryable during online serving. Defaults to the complete set of join keys. |
owner | Optional[str] | Returns the owner of the Tecton object. |
prevent_destroy | bool | If set to True, Tecton will block destructive actions taken on this Feature View or Feature Table. |
published_features_path | Optional[str] | The location of published features in the offline store. |
resource_providers | ||
sources | ||
tags | Dict[str, str] | Returns the tags of the Tecton object. |
tecton_materialization_runtime | Optional[str] | Version of tecton package used by your job cluster. |
timestamp_field | Optional[str] | timestamp column name for records from the feature view. |
transformations | ||
ttl | Duration | The TTL (or look back window) for features defined by this Feature View. This parameter determines how long features will live in the online store and how far to look back relative to a training example's timestamp when generating offline training sets. |
url | str | Returns a link to the Tecton Web UI. |
wildcard_join_key | Optional[set] | Returns a wildcard join key column name if it exists; Otherwise returns None. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |
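The attributes above can be read directly from a BatchFeatureView fetched from a workspace. A minimal sketch (not part of the original reference), assuming a live workspace named "prod" that contains the transaction_user_is_adult Feature View from the example at the top of this page:

import tecton

# "prod" and the Feature View name are placeholders for your own workspace and object.
ws = tecton.get_workspace("prod")
fv = ws.get_feature_view("transaction_user_is_adult")

print(fv.name)                  # "transaction_user_is_adult"
print(fv.join_keys)             # e.g. ["user_id"]
print(fv.batch_schedule)        # e.g. timedelta(days=1)
print(fv.online, fv.offline)    # whether online/offline materialization is enabled
print(fv.url)                   # link to this object in the Tecton Web UI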
Methods​
Name | Description |
---|---|
cancel_materialization_job(...) | Cancels the scheduled or running job by the job identifier. |
delete_keys(...) | Deletes any materialized data that matches the specified join keys from the FeatureView. |
get_feature_columns() | Retrieves the list of feature columns produced by this FeatureView. |
get_features_for_events(...) | Returns a TectonDataFrame of historical values for this feature view. |
get_features_in_range(...) | Returns a TectonDataFrame of historical values for this feature view which were valid within the input time range. |
get_historical_features(...) | Returns a TectonDataFrame of historical values for this feature view. |
get_job(...) | Retrieves data about the specified job (materialization or dataset generation). |
get_materialization_job(...) | Retrieves data about the specified materialization job. |
get_online_features(...) | Returns a single Tecton tecton.FeatureVector from the Online Store. |
get_partial_aggregates(...) | Returns the partially aggregated tiles in between start_time and end_time for a Feature View that uses the Tecton Aggregation Engine. |
get_timestamp_field() | Returns the name of the timestamp field for this Feature View. |
list_jobs() | Retrieves the list of all jobs (materialization and dataset generation) for this Feature View or Feature Table. |
list_materialization_jobs() | Retrieves the list of all materialization jobs for this feature view. |
materialization_status(...) | Displays materialization information for the FeatureView, which may include past jobs, scheduled jobs, and job failures. |
print_transformation_schema() | Prints the schema of the output of the transformation. |
run(...) | Run the FeatureView. Supports transforming data directly from raw data sources or using mock data. |
run_transformation(...) | Run the FeatureView Transformation as is without any aggregations or joins. Supports transforming data directly from raw data sources or using mock data. |
summary() | Displays a human-readable summary. |
test_run(...) | Run the FeatureView using mock data sources. This requires a local spark session. |
transformation_schema() | Returns the schema of the output of the transformation. |
trigger_materialization_job(...) | Starts a batch materialization job for this Feature View. |
validate() | Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary. |
wait_for_materialization_job(...) | Blocks until the specified job has been completed. |
with_join_key_map(...) | Rebind join keys for a Feature View or Feature Table used in a Feature Service. |
with_name(...) | Rename a Feature View or Feature Table used in a Feature Service. |
cancel_materialization_job(...)​
Cancels the scheduled or running job by the job identifier. Once cancelled, a job will not be retried further.
The job run state will be set to MANUAL_CANCELLATION_REQUESTED. Note that cancellation is asynchronous, so it may take some time to complete. If the job run is already in MANUAL_CANCELLATION_REQUESTED or in a terminal state, the method simply returns the job.

Parameters

job_id (str) - ID string of the materialization job.

Returns

MaterializationJob: JobData object for the cancelled job.
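A usage sketch (not part of the original reference), assuming fv is the BatchFeatureView retrieved in the earlier workspace example and that the job id was obtained previously, e.g. from trigger_materialization_job or list_materialization_jobs:

# The job id below is a placeholder; use one returned by trigger_materialization_job(...)
# or found via list_materialization_jobs().
job_id = "<your-materialization-job-id>"
cancelled_job = fv.cancel_materialization_job(job_id)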
delete_keys(...)​
Deletes any materialized data that matches the specified join keys from the FeatureView.
This method kicks off a job to delete the data in the offline and online stores. If a FeatureView has multiple entities, the full set of join keys must be specified. Only Delta is supported as the offline store (offline_store=DeltaConfig()).
A maximum of 500,000 keys can be deleted per request.

Parameters

keys (Union[pyspark_dataframe.DataFrame, pandas.DataFrame]) - The DataFrame of keys to be deleted. Must conform to the FeatureView join keys.
online (bool) - Whether or not to delete from the online store. Default: true
offline (bool) - Whether or not to delete from the offline store. Default: true

Returns

List[str]: List of job ids for jobs created for entity deletion.
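A usage sketch (not part of the original reference), assuming fv uses a Delta offline store and has a single user_id join key; the key values are placeholders:

import pandas as pd

# Keys must match the Feature View's join key columns; "user_id" is an assumption here.
keys_to_delete = pd.DataFrame({"user_id": ["user_1", "user_2"]})

# Kicks off deletion jobs against both the online and offline stores.
job_ids = fv.delete_keys(keys_to_delete, online=True, offline=True)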
get_feature_columns(...)​
Retrieves the list of feature columns produced by this FeatureView.

Returns

List[str]: The features produced by this FeatureView.
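For example, assuming fv as in the earlier sketches:

# Prints the feature column names produced by this Feature View,
# e.g. ["user_is_adult"] for the example at the top of this page.
print(fv.get_feature_columns())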