tecton.batch_window_aggregate_feature_view

tecton.batch_window_aggregate_feature_view(mode, inputs, entities, aggregation_slide_period, aggregations, online=False, offline=False, feature_start_time=None, batch_schedule=None, max_batch_aggregation_interval=None, online_serving_index=None, batch_cluster_config=None, offline_config=ParquetConfig(), monitoring=None, description='', owner='', family='', tags=None, timestamp_key=None, name_override=None)

Declare a batch window aggregate feature view.

Parameters
  • mode (str) – Whether the annotated function is a pipeline function (“pipeline”) or a transformation function (“spark_sql”, “pyspark”, or “pandas”). If a transformation mode is specified, the pipeline function is inferred from the decorated transformation.

  • inputs (Dict[str, Input]) – The inputs passed into the pipeline.

  • entities (List[Union[Entity, OverriddenEntity]]) – The entities this feature view is associated with.

  • aggregation_slide_period (str) – How frequently the feature value is updated (for example, “1h” or “6h”).

  • aggregations (List[FeatureAggregation]) – A list of FeatureAggregation structs.

  • online (Optional[bool]) – Whether the feature view should be materialized to the online feature store.

  • offline (Optional[bool]) – Whether the feature view should be materialized to the offline feature store.

  • feature_start_time (Union[DateTime, datetime, None]) – The time from which materialization for this feature view should start.

  • batch_schedule (Optional[str]) – The interval at which batch materialization should be scheduled.

  • max_batch_aggregation_interval (Optional[str]) – (Advanced) Makes the batch job scheduler group jobs together for efficiency.

  • online_serving_index (Optional[List[str]]) – (Optional, advanced) Defines the set of join keys that will be indexed and queryable during online serving.

  • batch_cluster_config (Union[ExistingClusterConfig, DatabricksClusterConfig, EMRClusterConfig, None]) – Batch materialization cluster configuration. Should be one of EMRClusterConfig, DatabricksClusterConfig, or ExistingClusterConfig (see the configuration sketch after this list).

  • offline_config (Union[ParquetConfig, DeltaConfig, None]) – Configuration for how data is written to the offline feature store.

  • monitoring (Optional[MonitoringConfig]) – Monitoring configuration for the feature view.

  • description (str) – (Optional) Description of this Feature View.

  • owner (str) – Owner name (typically the email of the primary maintainer).

  • family (str) – (Optional) Family of this Feature View, used to group Tecton Objects.

  • tags (Optional[Dict[str, str]]) – (Optional) Tags associated with this Tecton Object (key-value pairs of arbitrary metadata).

  • timestamp_key (Optional[str]) – The column name that refers to the timestamp for records produced by the feature view.

  • name_override (Optional[str]) – Unique, human-friendly name override that identifies the FeatureView.
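
Several of the parameters above accept configuration objects rather than plain values. The sketch below shows one plausible way to construct the batch_cluster_config, offline_config, and monitoring arguments. Treat it as an illustration under assumptions: the exact field names (instance_type, number_of_workers, monitor_freshness, expected_feature_freshness, alert_email) can vary across SDK versions, so verify them against the class references before use.

from tecton import DatabricksClusterConfig, DeltaConfig, MonitoringConfig

# Hypothetical cluster settings for batch materialization jobs; the field
# names are assumptions, so verify them for your SDK version.
materialization_cluster = DatabricksClusterConfig(
    instance_type='m5.xlarge',
    number_of_workers=2,
)

# Write offline materialized data as Delta instead of the default Parquet.
offline_store = DeltaConfig()

# Alert when materialized features go stale (field names are assumptions).
freshness_monitoring = MonitoringConfig(
    monitor_freshness=True,
    expected_feature_freshness='2d',
    alert_email='matt@tecton.ai',
)

These objects would then be passed to the decorator as batch_cluster_config=materialization_cluster, offline_config=offline_store, and monitoring=freshness_monitoring.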

Returns

A Batch Window Aggregate Feature View.

An example declaration of a batch window aggregate feature view:

from tecton.feature_views import batch_window_aggregate_feature_view
from tecton.feature_views.feature_view import Input
from tecton import FeatureAggregation
from datetime import datetime

# Declare your Entity and BatchDataSource instances here, or import them if they are
# defined elsewhere in your Tecton repo. See the API reference documentation for how to
# declare Entity and BatchDataSource instances.

transactions_batch = ...
user = ...

@batch_window_aggregate_feature_view(
    inputs={'transactions': Input(transactions_batch)},
    entities=[user],
    mode='spark_sql',
    aggregation_slide_period='1d',
    aggregations=[FeatureAggregation(column='transaction', function='count',
                                     time_windows=['24h', '72h', '168h', '960h'])],
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 10, 10),
    family='fraud',
    tags={'release': 'production'},
    owner='matt@tecton.ai',
    description='User transaction counts over a series of time windows, updated daily.'
)
def user_transaction_counts(transactions):
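    # Emit a literal 1 per transaction row so the 'count' aggregation
    # above yields per-window transaction counts.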
    return f'''
        SELECT
            nameorig AS user_id,
            1 AS transaction,
            timestamp
        FROM
            {transactions}
        '''
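
Once this declaration has been applied to a workspace with the tecton CLI, the feature view can be read back to generate training data. A minimal sketch, assuming your SDK version exposes the standard workspace APIs (tecton.get_workspace, get_feature_view, get_historical_features); the workspace name 'prod' and the spine DataFrame are hypothetical placeholders:

import tecton

# Hypothetical workspace name; use the workspace this repo was applied to.
ws = tecton.get_workspace('prod')
fv = ws.get_feature_view('user_transaction_counts')

# 'spine' is a placeholder DataFrame of user_id and timestamp rows taken
# from your training events; features are joined onto these rows.
training_data = fv.get_historical_features(spine).to_pandas()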