tecton.batch_feature_view

tecton.batch_feature_view(mode, inputs, entities, online=False, offline=False, ttl=None, feature_start_time=None, batch_schedule=None, max_batch_aggregation_interval=None, online_serving_index=None, batch_cluster_config=None, offline_config=ParquetConfig(), online_config=None, monitoring=None, backfill_config=None, description='', owner='', family='', tags=None, timestamp_key=None, name_override=None)

Declare a batch feature view.

Parameters
  • mode (str) – Whether the annotated function is a pipeline function (PIPELINE_MODE) or a transformation function (SPARK_SQL_MODE, PYSPARK_MODE or PANDAS_MODE). If it’s a transformation mode, we infer the pipeline function.

  • inputs (Dict[str, Input]) – The inputs passed into the pipeline.

  • entities (List[Union[Entity, OverriddenEntity]]) – The entities this feature view is associated with.

  • online (Optional[bool]) – Whether the feature view should be materialized to the online feature store.

  • offline (Optional[bool]) – Whether the feature view should be materialized to the offline feature store.

  • ttl (Optional[str]) – The TTL for features defined by this feature view.

  • feature_start_time (Union[DateTime, datetime, None]) – The time from which materialization for this feature view should start.

  • batch_schedule (Optional[str]) – The interval at which batch materialization should be scheduled.

  • max_batch_aggregation_interval (Optional[str]) – (Advanced) Makes the batch job scheduler group jobs together for efficiency.

  • online_serving_index (Optional[List[str]]) – (Optional, advanced) Defines the set of join keys that will be indexed and queryable during online serving.

  • batch_cluster_config (Union[ExistingClusterConfig, DatabricksClusterConfig, EMRClusterConfig, None]) – Batch materialization cluster configuration. Must be one of EMRClusterConfig, DatabricksClusterConfig, or ExistingClusterConfig.

  • offline_config (Union[ParquetConfig, DeltaConfig, None]) – Configuration for how data is written to the offline feature store.

  • online_config (Union[DynamoConfig, RedisConfig, None]) – Configuration for how data is written to the online feature store.

  • monitoring (Optional[MonitoringConfig]) – Monitoring configuration for the feature view.

  • backfill_config (Optional[BackfillConfig]) – Backfill configuration for the feature view.

  • description (str) – (Optional) description.

  • owner (str) – Owner name (typically the email of the primary maintainer).

  • family (str) – (Optional) Family of this Feature View, used to group Tecton Objects.

  • tags (Optional[Dict[str, str]]) – (Optional) Tags associated with this Tecton Object (key-value pairs of arbitrary metadata).

  • timestamp_key (Optional[str]) – The name of the column that holds the timestamp for records produced by the feature view.

  • name_override (Optional[str]) – Unique, human friendly name override that identifies the FeatureView.

Returns

A Batch Feature View
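
The way a parameterized decorator such as this one turns a function into a feature view object can be sketched in plain Python, without Tecton installed. This is a toy stand-in and not Tecton's implementation: ToyFeatureView and toy_batch_feature_view are hypothetical names, the "spark_sql" mode string and the string-valued inputs and entities are placeholders, and the fallback to the function name when name_override is not given is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyFeatureView:
    """Hypothetical stand-in for the object the decorator returns."""
    name: str
    mode: str
    inputs: Dict[str, object]
    entities: List[object]
    online: bool
    offline: bool
    ttl: Optional[str]
    batch_schedule: Optional[str]
    transformation: Callable[..., object]


def toy_batch_feature_view(mode, inputs, entities, online=False, offline=False,
                           ttl=None, batch_schedule=None, name_override=None):
    """Toy decorator factory mirroring a subset of the documented parameters."""
    def decorator(fn):
        # Assumption: the view is named after the function unless overridden.
        return ToyFeatureView(
            name=name_override or fn.__name__,
            mode=mode,
            inputs=inputs,
            entities=entities,
            online=online,
            offline=offline,
            ttl=ttl,
            batch_schedule=batch_schedule,
            transformation=fn,
        )
    return decorator


# Placeholder values: plain strings stand in for Input and Entity objects.
@toy_batch_feature_view(mode="spark_sql",
                        inputs={"data": "credit_scores_batch"},
                        entities=["user"],
                        online=True,
                        ttl="1d",
                        batch_schedule="1d")
def user_credit_score(data):
    # In this toy, the transformation just builds a SQL string.
    return f"SELECT user_id, credit_score, timestamp FROM {data}"


print(user_credit_score.name)
print(user_credit_score.transformation("credit_scores"))
```

After decoration, the name user_credit_score no longer refers to a plain function but to the object the decorator returned, which is why the declaration must be followed by a function definition.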

Example BatchFeatureView declaration:

from datetime import datetime

from tecton import batch_feature_view, BatchDataSource, HiveDSConfig
from tecton import Input
from tecton import WINDOW_UNBOUNDED_PRECEDING

# Declare your Entity instance here or import it if defined elsewhere in
# your Tecton repo.
user_credit_entity = ...

# Declare the BatchDataSource that the feature view reads from. It is wrapped
# in an Input instance when passed to the feature view.
batch_ds = BatchDataSource(name='credit_scores_batch',
                           batch_ds_config=HiveDSConfig(database='demo_fraud',
                                                        table='credit_scores',
                                                        timestamp_column_name='timestamp'),
                           family='fraud_detection')

# Wrap the batch_ds as an input to the batch feature view. This is a common
# way to wrap data sources as Input data to feature views.
@batch_feature_view(inputs={"data": Input(source=batch_ds,
                                          window=WINDOW_UNBOUNDED_PRECEDING,
                                          schedule_offset='1hr')
                            },
                    entities=[user_credit_entity],
                    ttl='1d',
                    batch_schedule='1d',
                    online=True,
                    offline=True,
                    feature_start_time=datetime(2020, 5, 1),
                    family='fraud',
                    owner='derek@tecton.ai',
                    tags={'release': 'staging'}
)