tecton.batch_feature_view

tecton.batch_feature_view(*, mode, inputs, entities, ttl, batch_schedule, backfill_config, online=False, offline=False, feature_start_time=None, description=None, owner=None, family=None, tags=None, timestamp_key=None, offline_config=ParquetConfig(), online_config=None, monitoring=None, name_override=None, batch_cluster_config=None, max_batch_aggregation_interval=None, online_serving_index=None)

Declare a batch feature view

Parameters
  • mode (str) – Whether the annotated function is a pipeline function (“pipeline” mode) or a transformation function (“spark_sql” or “pyspark” mode). For the non-pipeline mode, an inferred transformation will also be registered.

  • inputs (Dict[str, Input]) – The inputs passed into the pipeline.

  • entities (List[Union[Entity, OverriddenEntity]]) – The entities this feature view is associated with.

  • ttl (str) – The TTL (or “look back window”) for features defined by this feature view. This parameter determines how long features will live in the online store and how far to “look back” relative to a training example’s timestamp when generating offline training sets. Shorter TTLs improve performance and reduce costs.

  • batch_schedule (str) – The interval at which batch materialization should be scheduled.

  • backfill_config (BackfillConfig) – Backfill configuration for the feature view.

  • online (Optional[bool]) – Whether the feature view should be materialized to the online feature store. (Default: False)

  • offline (Optional[bool]) – Whether the feature view should be materialized to the offline feature store. (Default: False)

  • feature_start_time (Union[DateTime, datetime, None]) – When materialization for this feature view should start. (Required if offline=True)

  • description (Optional[str]) – Human readable description.

  • owner (Optional[str]) – Owner name (typically the email of the primary maintainer).

  • family (Optional[str]) – Family of this Feature View, used to group Tecton Objects.

  • tags (Optional[Dict[str, str]]) – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata).

  • timestamp_key (Optional[str]) – The column name that refers to the timestamp for records produced by the feature view. (Default: inferred when a single column is of Timestamp type.)

  • offline_config (Union[ParquetConfig, DeltaConfig, None]) – Configuration for how data is written to the offline feature store. See the configuration sketch after this parameter list.

  • online_config (Union[DynamoConfig, RedisConfig, None]) – Configuration for how data is written to the online feature store.

  • monitoring (Optional[MonitoringConfig]) – Monitoring configuration for the feature view.

  • name_override (Optional[str]) – Unique, human-friendly name override that identifies the FeatureView.

  • batch_cluster_config (Union[ExistingClusterConfig, DatabricksClusterConfig, EMRClusterConfig, None]) – Batch materialization cluster configuration.

  • max_batch_aggregation_interval (Optional[str]) – (Advanced) Makes the batch job scheduler group jobs together for efficiency.

  • online_serving_index (Optional[List[str]]) – (Advanced) Defines the set of join keys that will be indexed and queryable during online serving.
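
A minimal sketch of constructing the optional store, monitoring, and cluster configuration objects accepted by the parameters above. The keyword arguments shown (monitor_freshness, alert_email, instance_type, number_of_workers) are assumptions and may differ across Tecton SDK versions.

from tecton import DeltaConfig, DynamoConfig, MonitoringConfig, DatabricksClusterConfig

# Write offline feature data in Delta format instead of the default Parquet.
offline_config = DeltaConfig()

# Serve online features from DynamoDB.
online_config = DynamoConfig()

# Alert when materialized features go stale (keyword arguments assumed).
monitoring = MonitoringConfig(monitor_freshness=True,
                              alert_email='ops@example.com')

# Run batch materialization jobs on a dedicated cluster (keyword arguments assumed).
batch_cluster_config = DatabricksClusterConfig(instance_type='m5.xlarge',
                                               number_of_workers=2)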

Returns

An object of type tecton.feature_views.MaterializedFeatureView.
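
The returned object can be referenced by other objects in your Tecton repo. For example, a sketch of serving it through a FeatureService, assuming the feature view is declared as user_credit_features (as in the example below):

from tecton import FeatureService

# Serve the batch feature view online and offline through a feature service.
fraud_detection_service = FeatureService(name='fraud_detection_service',
                                         features=[user_credit_features])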

Example BatchFeatureView declaration:

from datetime import datetime

from tecton import batch_feature_view, BatchDataSource, HiveDSConfig
from tecton import BackfillConfig, Input
from tecton import WINDOW_UNBOUNDED_PRECEDING

# Declare your Entity instance here or import it if defined elsewhere in
# your Tecton repo.
user_credit_entity = ...

# Declare a BatchDataSource. Below it is wrapped in an Input and passed to
# the batch feature view.
batch_ds = BatchDataSource(name='credit_scores_batch',
                           batch_ds_config=HiveDSConfig(database='demo_fraud',
                                                        table='credit_scores',
                                                        timestamp_column_name='timestamp'),
                           family='fraud_detection')

# Wrap batch_ds as an input to the batch feature view. This is the common
# way to pass data sources into feature views.
@batch_feature_view(mode='spark_sql',
                    inputs={"data": Input(source=batch_ds,
                                          window=WINDOW_UNBOUNDED_PRECEDING,
                                          schedule_offset='1hr')
                            },
                    entities=[user_credit_entity],
                    ttl='1d',
                    batch_schedule='1d',
                    # The backfill mode string below is an assumption; see BackfillConfig.
                    backfill_config=BackfillConfig('multiple_batch_schedule_intervals_per_job'),
                    online=True,
                    offline=True,
                    feature_start_time=datetime(2020, 5, 1),
                    family='fraud',
                    owner='derek@tecton.ai',
                    tags={'release': 'staging'})
def user_credit_features(data):
    return f'''
        SELECT
            user_id,
            credit_score,
            timestamp
        FROM {data}
        '''
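
For comparison, a minimal sketch of the same declaration in 'pyspark' mode, where the decorated function receives a Spark DataFrame rather than being rendered into SQL (the BackfillConfig mode string is assumed, as above; column names are carried over from the example):

@batch_feature_view(mode='pyspark',
                    inputs={"data": Input(source=batch_ds,
                                          window=WINDOW_UNBOUNDED_PRECEDING,
                                          schedule_offset='1hr')},
                    entities=[user_credit_entity],
                    ttl='1d',
                    batch_schedule='1d',
                    backfill_config=BackfillConfig('multiple_batch_schedule_intervals_per_job'),
                    online=True,
                    offline=True,
                    feature_start_time=datetime(2020, 5, 1))
def user_credit_features_pyspark(data):
    # 'data' is a pyspark.sql.DataFrame produced from the wrapped BatchDataSource.
    return data.select('user_id', 'credit_score', 'timestamp')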