batch_feature_view
Summary
Declare a Batch Feature View.
Parameters
name (Optional[str]) - Unique, human-friendly name that identifies the FeatureView. Defaults to the function name. Default: None
description (Optional[str]) - A human-readable description. Default: None
owner (Optional[str]) - Typically the name or email of the Feature View's primary maintainer. Default: None
tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). Default: None
prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False, either in the same tecton apply or in a separate one. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: False
mode (str) - Either the compute mode for the Transformation function or else pipeline mode.
sources (Sequence[Union[framework_data_source.BatchSource, filtered_source.FilteredSource]]) - The Data Source inputs to the Feature View.
entities (Sequence[framework_entity.Entity]) - The entities this Feature View is associated with.
aggregation_interval (Optional[datetime.timedelta]) - How frequently the feature values are updated (for example, timedelta(hours=1) for "1h" or timedelta(hours=6) for "6h"). Only valid when using aggregations. Default: None
aggregations (Optional[Sequence[configs.Aggregation]]) - A list of Aggregate Feature objects. Default: None
aggregation_secondary_key (Optional[str]) - Configures secondary key aggregates using the set column. Only valid when using aggregations. Default: None
online (bool) - Whether the feature view should be materialized to the online feature store. Default: False
offline (bool) - Whether the feature view should be materialized to the offline feature store. Default: False
ttl (Optional[datetime.timedelta]) - The TTL (or "look back window") for features defined by this feature view. This parameter determines how long features will live in the online store and how far to "look back" relative to a training example's timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs. Default: None
feature_start_time (Optional[datetime.datetime]) - The time from which materialization for this feature view should start. Required if offline=True. Default: None
lifetime_start_time (Optional[datetime.datetime]) - The start time for the data included in a lifetime aggregate. Required if using lifetime windows. Default: None
manual_trigger_backfill_end_time (Optional[datetime.datetime]) - If set, Tecton will schedule backfill materialization jobs for this feature view up to this time. Materialization jobs after this point must be triggered manually. This parameter is only valid if batch_trigger is BatchTriggerType.MANUAL. Default: None
batch_trigger (BatchTriggerType) - BatchTriggerType.SCHEDULED (default) or BatchTriggerType.MANUAL. Default: BatchTriggerType.SCHEDULED
batch_schedule (Optional[datetime.timedelta]) - The interval at which batch materialization should be scheduled. Default: None
online_serving_index (Optional[Sequence[str]]) - (Advanced) Defines the set of join keys that will be indexed and queryable during online serving. Default: None
batch_compute (Optional[configs.ComputeConfigTypes]) - Configuration for the batch materialization cluster. Default: None
offline_store (Optional[Union[configs.OfflineStoreConfig, configs.ParquetConfig, configs.DeltaConfig]]) - Configuration for how data is written to the offline feature store. Default: None
online_store (Optional[configs.OnlineStoreTypes]) - Configuration for how data is written to the online feature store. Default: None
monitor_freshness (bool) - If True, enables monitoring when feature data is materialized to the online feature store. Default: False
data_quality_enabled (Optional[bool]) - If False, disables data quality metric computation and the data quality dashboard. Default: None
skip_default_expectations (Optional[bool]) - If True, skips validating default expectations on the feature data. Default: None
expected_feature_freshness (Optional[datetime.timedelta]) - Threshold used to determine if recently materialized feature data is stale. Data is stale if now - most_recent_feature_value_timestamp > expected_feature_freshness. For feature views using Tecton aggregations, data is stale if now - round_up_to_aggregation_interval(most_recent_feature_value_timestamp) > expected_feature_freshness, where round_up_to_aggregation_interval() rounds up the feature timestamp to the end of the aggregation_interval. Value must be at least 2 times aggregation_interval. If not specified, a value determined by the Tecton backend is used. Default: None
alert_email (Optional[str]) - Email address that alerts for this FeatureView will be sent to. Default: None
timestamp_field (Optional[str]) - The column name that refers to the timestamp for records that are produced by the feature view. This parameter is optional if exactly one column is a Timestamp type. This parameter is required if using Tecton on Snowflake without Snowpark. Default: None
max_backfill_interval (Optional[datetime.timedelta]) - (Advanced) The time interval for which each backfill job will run to materialize feature data. This affects the number of backfill jobs that will run, which is (<feature registration time> - feature_start_time) / max_backfill_interval. Configuring the max_backfill_interval parameter appropriately will help to optimize large backfill jobs. If this parameter is not specified, then 10 backfill jobs will run (the default). Default: None
max_batch_aggregation_interval (Optional[datetime.timedelta]) - Deprecated. Use max_backfill_interval instead, which has the exact same usage. Default: None
incremental_backfills (bool) - If set to True, the feature view will be backfilled one interval at a time as if it had been updated "incrementally" since its feature_start_time. For example, if batch_schedule is 1 day and feature_start_time is 1 year prior to the current time, then the backfill will run 365 separate backfill queries to fill the historical feature data. Default: False
schema (Optional[List[types.Field]]) - [Deprecated] The output schema of the Feature View transformation. If provided and run_transformation_validation=True, then Tecton will validate that the Feature View matches the expected schema. Default: None
run_transformation_validation (Optional[bool]) - If True, Tecton will execute the Feature View transformations during tecton plan/apply validation. If False, then Tecton will not execute the transformations during validation and schema must be set. Skipping query validation can be useful to speed up tecton plan/apply or for Feature Views that have issues with Tecton's validation (e.g. some pip dependencies). Default is True for Spark and Snowflake Feature Views and False for Python and Pandas Feature Views. Default: None
options (Optional[Dict[str, str]]) - Additional options to configure the Feature View. Used for advanced use cases and beta features. Default: None
tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by your job cluster. Default: None
cache_config (Optional[configs.CacheConfig]) - Cache config for the Feature View. Including this option enables the feature server to use the cache when retrieving features for this feature view. It is only respected if the feature service containing this feature view has enable_online_caching set to True. Default: None
batch_compaction_enabled (Optional[bool]) - Deprecated. Use compaction_enabled instead, which has the exact same usage. Default: None
compaction_enabled (Optional[bool]) - (Private preview) If True, Tecton will run a compaction job after each batch materialization job to write to the online store. This requires the use of Dynamo and uses the ImportTable API. Because each batch job overwrites the online store, a larger compute cluster may be required. Default: None
environment (Optional[str]) - The custom environment in which materialization jobs will be run. Defaults to None, which means jobs will execute in the default Tecton environment. Default: None
features (Optional[Union[Sequence[feature.Aggregate], Sequence[Union[feature.Attribute, feature.Embedding, feature.Inference]]]]) - A list of Attribute, Aggregate, and Embedding feature values managed by this Feature View. Default: None
context_parameter_name (Optional[str]) - Name of the function parameter into which Tecton injects the MaterializationContext object. Default: None
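
The job-count arithmetic described for max_backfill_interval and incremental_backfills can be sketched in plain Python. This is an illustration only, not Tecton's scheduler: the helper name count_backfill_jobs is hypothetical, and rounding a partial interval up to a whole job is an assumption that the parameter descriptions do not state.

```python
from datetime import datetime, timedelta


def count_backfill_jobs(registration_time, feature_start_time, interval):
    """Hypothetical helper: (registration_time - feature_start_time) / interval.

    Assumption: a trailing partial interval still needs its own job, so the
    quotient is rounded up.
    """
    span = registration_time - feature_start_time
    if span <= timedelta(0):
        return 0
    # Ceiling division: timedelta // timedelta yields an int.
    return -(-span // interval)


registered = datetime(2021, 10, 10)
start = datetime(2020, 10, 10)  # 365 days before registration

# max_backfill_interval of 30 days
print(count_backfill_jobs(registered, start, timedelta(days=30)))  # 13

# incremental_backfills=True with a batch_schedule of 1 day
print(count_backfill_jobs(registered, start, timedelta(days=1)))  # 365
```

The second call reproduces the 365 separate backfill queries mentioned for incremental_backfills with a one-day batch_schedule and a one-year history.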
Returns
An object of type BatchFeatureView
Examples
Example 1

    from datetime import datetime
    from datetime import timedelta

    from fraud.entities import user
    from fraud.data_sources.credit_scores_batch import credit_scores_batch
    from tecton import batch_feature_view, FilteredSource
    from tecton.types import Int64, String, Timestamp, Field


    @batch_feature_view(
        sources=[FilteredSource(credit_scores_batch)],
        entities=[user],
        mode="spark_sql",
        online=True,
        offline=True,
        feature_start_time=datetime(2020, 10, 10),
        batch_schedule=timedelta(days=1),
        ttl=timedelta(days=60),
        schema=[
            Field("USER_ID", String),
            Field("TIMESTAMP", Timestamp),
            Field("LAST_TRANSACTION_AMOUNT", Int64),
            Field("LAST_TRANSACTION_CATEGORY", String),
        ],
        description="Features about the user's most recent transaction in the past 60 days. Updated daily.",
        tecton_materialization_runtime="0.8.0",
    )
    def user_last_transaction_features(credit_scores_batch):
        # In spark_sql mode the function returns a SQL string; the source is
        # interpolated by name.
        return f"""
            SELECT
                USER_ID,
                TIMESTAMP,
                AMOUNT as LAST_TRANSACTION_AMOUNT,
                CATEGORY as LAST_TRANSACTION_CATEGORY
            FROM
                {credit_scores_batch}
            """
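
To make the ttl=timedelta(days=60) setting in Example 1 concrete, the following plain-Python sketch shows one way to read the "look back window" when building a training row. It is not Tecton's implementation. The helper latest_value_within_ttl is hypothetical, and the boundary handling (window open at the start, closed at the event time) is an assumption.

```python
from datetime import datetime, timedelta


def latest_value_within_ttl(rows, event_time, ttl):
    """Hypothetical helper: newest feature value inside the look-back window.

    rows is a list of (timestamp, value) pairs. Assumption: a row is eligible
    when event_time - ttl < timestamp <= event_time.
    """
    window_start = event_time - ttl
    eligible = [(ts, value) for ts, value in rows if window_start < ts <= event_time]
    if not eligible:
        return None  # every known value is older than the TTL
    return max(eligible, key=lambda row: row[0])[1]


rows = [
    (datetime(2020, 10, 15), 100),
    (datetime(2020, 12, 1), 250),
]
ttl = timedelta(days=60)

print(latest_value_within_ttl(rows, datetime(2020, 12, 20), ttl))  # 250
print(latest_value_within_ttl(rows, datetime(2021, 3, 1), ttl))    # None
```

A shorter ttl shrinks the window, so fewer historical rows have to be scanned per training example and fewer stay in the online store.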
Example 2
Example BatchFeatureView declaration using aggregates:

    from datetime import datetime
    from datetime import timedelta

    from fraud.entities import user
    from fraud.data_sources.credit_scores_batch import credit_scores_batch
    from tecton import batch_feature_view, Aggregation, FilteredSource, TimeWindow
    from tecton.types import Int64, String, Timestamp, Field


    @batch_feature_view(
        sources=[FilteredSource(credit_scores_batch)],
        entities=[user],
        mode="spark_sql",
        online=True,
        offline=True,
        feature_start_time=datetime(2020, 10, 10),
        aggregations=[
            Aggregation(column="amount", function="mean", time_window=TimeWindow(window_size=timedelta(days=1))),
            Aggregation(column="amount", function="mean", time_window=TimeWindow(window_size=timedelta(days=30))),
        ],
        aggregation_interval=timedelta(days=1),
        schema=[
            Field("USER_ID", String),
            Field("AMOUNT", Int64),
            Field("TIMESTAMP", Timestamp),
        ],
        description="Transaction amount statistics and total over a series of time windows, updated daily.",
        tecton_materialization_runtime="0.8.0",
    )
    def user_recent_transaction_aggregate_features(credit_scores_batch):
        # Return the raw columns; Tecton applies the configured aggregations.
        return f"""
            SELECT
                USER_ID,
                AMOUNT,
                TIMESTAMP
            FROM
                {credit_scores_batch}
            """
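
Example 2 sets aggregation_interval, which changes how staleness is judged under expected_feature_freshness. The sketch below restates the two staleness rules from the parameter description in plain Python. It is an illustration, not Tecton's monitoring code: is_stale is a hypothetical name, and aligning intervals to the Unix epoch and leaving a timestamp that already sits on a boundary unchanged are both assumptions.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: intervals are aligned to the Unix epoch


def round_up_to_aggregation_interval(ts, aggregation_interval):
    """Round ts up to the end of the aggregation interval that contains it."""
    remainder = (ts - EPOCH) % aggregation_interval
    if remainder == timedelta(0):
        return ts  # assumption: a timestamp on a boundary is left unchanged
    return ts + (aggregation_interval - remainder)


def is_stale(now, most_recent_ts, expected_feature_freshness, aggregation_interval=None):
    """Hypothetical helper applying the staleness rule from the parameter text."""
    if aggregation_interval is not None:
        if expected_feature_freshness < 2 * aggregation_interval:
            raise ValueError("expected_feature_freshness must be at least 2x aggregation_interval")
        most_recent_ts = round_up_to_aggregation_interval(most_recent_ts, aggregation_interval)
    return now - most_recent_ts > expected_feature_freshness


last_value = datetime(2024, 1, 1, 15, 0)
now = datetime(2024, 1, 3, 20, 0)
freshness = timedelta(days=2)

# Without aggregations: 2 days 5 hours have elapsed, so the data is stale.
print(is_stale(now, last_value, freshness))  # True

# With a 1-day aggregation_interval the timestamp rounds up to 2024-01-02 00:00,
# so only 1 day 20 hours have elapsed.
print(is_stale(now, last_value, freshness, timedelta(days=1)))  # False
```

Rounding up gives aggregated views credit for the full interval that the latest value belongs to, which is why the same data can be stale under the first rule and fresh under the second.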