tecton.feature_views.MaterializedFeatureView

class tecton.feature_views.MaterializedFeatureView(name, pipeline_function, inputs, entities, online, offline, offline_config, online_config, aggregation_slide_period, aggregations, ttl, feature_start_time, batch_schedule, max_batch_aggregation_interval, online_serving_index, batch_cluster_config, stream_cluster_config, monitoring, backfill_config, description, owner, family, tags, inferred_transform, feature_view_type, timestamp_key, data_source_type, user_function, framework_version, is_custom=False, output_stream=None)

Materialized FeatureView internal declaration and testing class.

Do not instantiate this class directly. Use a decorator-based constructor such as @batch_feature_view instead (see the example below).

Methods

__init__

Do not directly use this constructor. Internal constructor for materialized FeatureViews.

run

Run the FeatureView using mock inputs.

__init__(name, pipeline_function, inputs, entities, online, offline, offline_config, online_config, aggregation_slide_period, aggregations, ttl, feature_start_time, batch_schedule, max_batch_aggregation_interval, online_serving_index, batch_cluster_config, stream_cluster_config, monitoring, backfill_config, description, owner, family, tags, inferred_transform, feature_view_type, timestamp_key, data_source_type, user_function, framework_version, is_custom=False, output_stream=None)

Do not directly use this constructor. Internal constructor for materialized FeatureViews.

run(spark, materialization_context=None, aggregation_level=None, **mock_inputs)

Run the FeatureView using mock inputs. This requires a local spark session.
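run requires a live Spark session. A minimal sketch for creating a local session with PySpark, assuming pyspark is installed in the test environment (the application name is arbitrary):

from pyspark.sql import SparkSession

# Local Spark session suitable for testing FeatureView pipelines.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("feature-view-tests")
    .getOrCreate()
)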

Parameters
  • spark (SparkSession) – Required. Spark session object.

  • materialization_context (Optional[BoundMaterializationContext]) – Optional. MaterializationContext used to set feature start and end times.

  • aggregation_level (Optional[str]) – Only applicable to window aggregate FeatureViews. Selects the level of aggregation applied to the output result dataframe (a usage sketch follows the example below). Allowed values:

    • "full" - Fully aggregate the features. Output rows are aggregated over each of the time_windows specified in the FeatureAggregation(s) of the FeatureView config.

    • "partial" - Aggregate output rows within each fixed-size sliding aggregation window that falls in the provided data's time range. The aggregation window size is specified by aggregation_slide_period in the FeatureView config.

    • "disabled" - No aggregation is performed.

    If unspecified, the highest level of aggregation available for the FeatureView type is used.

  • mock_inputs (DataFrame) – Dictionary with the same keys as the FeatureView's inputs parameter. Each input name maps to a Spark DataFrame that is evaluated for that node in the pipeline.

Example

# Imports used by this example (import paths assume the Tecton SDK version
# these docs describe).
from datetime import datetime

import pandas as pd
from tecton import BatchDataSource, HiveDSConfig, Input, batch_feature_view

# Declare a BatchDataSource. It is wrapped inside an Input class instance
# below and passed to the feature view as input data.
batch_ds = BatchDataSource(name='credit_scores_batch',
                           batch_ds_config=HiveDSConfig(database='demo_fraud',
                                                        table='credit_scores',
                                                        timestamp_column_name='tstamp'))

# Wrap batch_ds as an Input to the batch_feature_view. This is a common
# way to wrap data sources as Input data to feature views. The pipeline
# function's parameter name ("data") must match the key in `inputs`.
@batch_feature_view(inputs={"data": Input(source=batch_ds)},
                    entities=[user_credit_entity],
                    ttl='1d',
                    batch_schedule='1d',
                    online=False,
                    offline=False,
                    feature_start_time=datetime(2020, 5, 1))
def credit_feature_view(data):
    ...

# Testing using the `run` API. `spark` is a local SparkSession (see the
# sketch under the run() description above); mock inputs are passed as
# Spark DataFrames keyed by the FeatureView's input names.
input_df = spark.createDataFrame(pd.DataFrame({
    'tstamp': [pd.Timestamp("2021-01-18 12:00:06")],
    'amount': [1234.56],
    'user_account_id': ['abc123']
}))
output_df = credit_feature_view.run(spark, data=input_df)
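
For window aggregate FeatureViews, aggregation_level controls how much of the rollup run performs. The sketch below shows the two aggregated modes, assuming a hypothetical window aggregate FeatureView named user_transaction_aggregates declared elsewhere with the same "data" input (the name is illustrative and not defined in this example):

# "partial": one output row per aggregation_slide_period window within the
# input data's time range.
partial_df = user_transaction_aggregates.run(spark, aggregation_level="partial", data=input_df)

# "full": features fully aggregated over the time_windows declared in the
# FeatureView's FeatureAggregation(s); the default for window aggregate
# FeatureViews.
full_df = user_transaction_aggregates.run(spark, aggregation_level="full", data=input_df)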
Returns

A tecton.DataFrame object.
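
For inspection in tests, the returned tecton.DataFrame can be converted to a native DataFrame. A short sketch, assuming the to_pandas() and to_spark() helpers exposed by tecton.DataFrame in this SDK version:

# Convert the Tecton DataFrame wrapper to native DataFrames for assertions.
pandas_result = output_df.to_pandas()
spark_result = output_df.to_spark()
assert 'user_account_id' in pandas_result.columns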

Attributes

name

Name of this Tecton Object.