Feature Views

In Tecton, features are defined as a view on registered Data Sources or other Feature Views. Feature Views are the core abstraction that enables:

  • Using one feature definition for both training and serving.
  • Reusing features across models.
  • Managing feature lineage and versioning.
  • Orchestrating compute and storage of features.

A Feature View contains all information required to manage one or more related features, including:

  • Pipeline: A transformation pipeline that takes in one or more sources and runs transformations to compute features. Sources can be Tecton Data Sources or in some cases, other Feature Views.
  • Entities: The common objects that the features are attributes of such as Customer or Product. The entities dictate the join keys for the Feature View.
  • Configuration: Materialization configuration for defining the orchestration and serving of features, as well as monitoring configuration.
  • Metadata: Optional metadata about the features used for organization and discovery. This can include things like descriptions, families, and tags.

The pipeline and entities for a Feature View define the semantics of what the feature values truly represent. Changes to a Feature View's pipelines or entities are therefore considered destructive and will result in the rematerialization of feature values.

Concept: Feature Views in a Feature Store

[Figure: Framework Concept Diagram]

This diagram illustrates how Tecton Objects interact with each other.

Types of Feature Views

There are 3 types of Feature Views:

  1. Batch Feature Views run transformations on one or more Batch Sources and can materialize feature data to the Online and/or Offline Feature Store on a schedule.
  2. Stream Feature Views transform features in near-real-time against a Stream Source and can materialize data to the Online and Offline Feature Store.
  3. On-Demand Feature Views run transformations at request time based on data from a Request Source, Batch Feature View, or Stream Feature View.

| Name | Source types | Transformation types |
|---|---|---|
| Batch Feature View | Batch Data Source | pyspark, snowflake_sql, snowpark, spark_sql |
| Stream Feature View | Stream Data Source | pyspark, snowflake_sql, spark_sql |
| On-Demand Feature View | Request Data Source, Batch Feature View, Stream Feature View | python, pandas |

Defining a Feature View

A Feature View is defined using a decorator over a function that represents a pipeline of Transformations.

Below, we'll describe the high-level components of defining a Feature View. See the individual Feature View type sections for more details and examples.

# Feature View type
@batch_feature_view(
    # Pipeline attributes
    sources=...,
    mode=...,

    # Entities
    entities=...,

    # Materialization and serving configuration
    online=...,
    offline=...,
    batch_schedule=...,
    feature_start_time=...,
    ttl=...,

    # Metadata
    owner=...,
    description=...,
    tags=...,
)
# Feature View name
def my_feature_view(input_data):
    intermediate_data = my_transformation(input_data)
    output_data = my_transformation_two(intermediate_data)
    return output_data

See the API reference for the specific parameters available for each type of Feature View.

Function Definition

Feature Views are registered by adding a decorator (e.g. @batch_feature_view) to a Python function. The decorator supports several parameters to configure the Feature View.

The default name of the Feature View registered with Tecton will be the name of the function. If needed, the name can be explicitly set using the name decorator parameter.

The function inputs are retrieved from the specified sources in corresponding order. Tecton will use the function pipeline definition to construct, register, and execute the specified graph of transformations.
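The naming and argument-ordering rules above can be illustrated with a toy decorator. This is a minimal sketch in plain Python, not Tecton's implementation: `toy_feature_view`, `REGISTRY`, and `run` are hypothetical names invented for this example, and the "sources" are just dictionary keys.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory registry; Tecton's real registration is handled by its SDK.
REGISTRY: Dict[str, dict] = {}


def toy_feature_view(sources: List[str], name: Optional[str] = None) -> Callable:
    """Register the decorated function under `name`, defaulting to the function name."""
    def decorator(fn: Callable) -> Callable:
        registered_name = name or fn.__name__
        REGISTRY[registered_name] = {"sources": list(sources), "fn": fn}
        return fn
    return decorator


def run(registered_name: str, source_data: Dict[str, list]):
    """Call the registered function with one argument per source, in declaration order."""
    entry = REGISTRY[registered_name]
    args = [source_data[s] for s in entry["sources"]]
    return entry["fn"](*args)


@toy_feature_view(sources=["clicks", "purchases"])
def click_to_purchase_counts(clicks, purchases):
    # The first parameter receives "clicks", the second receives "purchases".
    return {"n_clicks": len(clicks), "n_purchases": len(purchases)}


@toy_feature_view(sources=["clicks"], name="renamed_view")
def some_function(clicks):
    # Registered as "renamed_view" because `name` was set explicitly.
    return {"n_clicks": len(clicks)}


data = {"clicks": [1, 2, 3], "purchases": [10]}
result = run("click_to_purchase_counts", data)
```

The point of the sketch is only the mapping: the order of the `sources` list determines which function parameter receives which input, and the function name is used unless `name` overrides it.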

Pipeline Definition

The body of a Feature View function calls Transformations that define features. The data sources configured by the decorator parameters will be made available as inputs to the function.

The output columns of our Feature View DataFrame must include:

  1. The join keys of all entities included in the entities list
  2. A timestamp column. If there is more than one timestamp column, a timestamp_key parameter must be set to specify which column is the correct timestamp of the feature values.
  3. Feature value columns. All columns other than the join keys and timestamp will be considered features in a Feature View.
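The three column requirements above can be sketched as a small check over column names. This is an illustrative helper only: `classify_output_columns` is a hypothetical function written for this example, and it assumes the caller already knows which columns are timestamp-typed.

```python
def classify_output_columns(columns, join_keys, timestamp_columns, timestamp_key=None):
    """Split output columns into join keys, the feature timestamp, and features.

    `timestamp_columns` is the set of timestamp-typed columns (an assumption of
    this sketch; a real pipeline would read this from the DataFrame schema).
    """
    missing = [k for k in join_keys if k not in columns]
    if missing:
        raise ValueError(f"missing join keys: {missing}")

    candidates = [c for c in columns if c in timestamp_columns]
    if timestamp_key is not None:
        if timestamp_key not in candidates:
            raise ValueError(f"{timestamp_key!r} is not a timestamp column")
        timestamp = timestamp_key
    elif len(candidates) == 1:
        timestamp = candidates[0]
    else:
        raise ValueError("expected exactly one timestamp column or an explicit timestamp_key")

    # Everything that is neither a join key nor the chosen timestamp is a feature.
    features = [c for c in columns if c not in join_keys and c != timestamp]
    return {"join_keys": list(join_keys), "timestamp": timestamp, "features": features}


layout = classify_output_columns(
    columns=["entity_id", "timestamp", "feature_one", "feature_two"],
    join_keys=["entity_id"],
    timestamp_columns={"timestamp"},
)
```

With two timestamp-typed columns and no `timestamp_key`, the helper raises an error, which mirrors the rule that the timestamp must be named explicitly when it is ambiguous.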

Important

There are two Feature View pipeline modes: inline and pipeline. This is configured using the mode parameter.

| Mode | Supported mode values | Description |
|---|---|---|
| Inline | pandas, pyspark, python, snowflake_sql, snowpark, spark_sql | Single transformation pipeline declared inline with a Feature View definition. |
| Pipeline | pipeline | 1 or more @transformation functions defined separately from a Feature View. |

Inline

Feature Views that only use one Transformation can define the Transformation within the body of the Feature View function. For example, this code snippet is a Feature View with a single Transformation in spark_sql mode that simply renames columns from the data source to feature_one and feature_two.

@batch_feature_view(
    mode="spark_sql",
    ...
)
def my_feature_view(input_data):
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """

Pipeline

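Feature Views that use pipeline mode call one or more Transformations that are defined separately with the @transformation decorator. For example, the code snippet above can be rewritten as two functions: the Transformation uses spark_sql mode and the Feature View uses pipeline mode.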

@transformation(mode="spark_sql")
def my_transformation(input_data):
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """

@batch_feature_view(
    mode="pipeline",
    ...
)
def my_feature_view(input_data):
    return my_transformation(input_data)

Multi-Transformation Pipelines

More complicated feature logic can be factored into multiple transformations for readability and reusability. All data operations must be inside a transformation: the Feature View function itself cannot contain arbitrary Python code and should only compose transformations.

In this example, we implement a generic str_split transformation on a specified column, followed by another transformation to calculate some summary statistics for the feature.

Note that passing constants to a transformation requires using const, which can be imported from tecton.

from tecton import transformation, batch_feature_view, const, FilteredSource
from entities import auction
from data_sources.ad_impressions import ad_impressions_batch
from datetime import datetime

@transformation(mode="spark_sql")
def str_split(input_data, column_to_split, new_column_name, delimiter):
    return f"""
        SELECT
            *,
            split({column_to_split}, {delimiter}) AS {new_column_name}
        FROM {input_data}
        """

@transformation(mode="spark_sql")
def keyword_stats(input_data, keyword_column):
    return f"""
        SELECT
            auction_id,
            timestamp,
            {keyword_column} AS keyword_list,
            size({keyword_column}) AS num_keywords,
            array_contains({keyword_column}, "bitcoin") AS keyword_contains_bitcoin
        FROM {input_data}
        """


@batch_feature_view(
    mode='pipeline',
    sources=[FilteredSource(ad_impressions_batch)],
    entities=[auction],
    batch_schedule='1d',
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 5, 1),
    ttl='365d',
)
def auction_keywords(ad_impressions):
    split_keywords = str_split(ad_impressions, const("content_keyword"), const("keywords"), const("\' \'"))
    return keyword_stats(split_keywords, const("keywords"))
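To see why the delimiter constant in the example carries its own single quotes, it can help to render the SQL template by hand. The sketch below is plain Python and does not use Tecton. It assumes that constant arguments are substituted into the template as their raw string values and that the input is substituted as a table or view name; `render_str_split` is a hypothetical stand-in for the str_split transformation body.

```python
def render_str_split(input_data, column_to_split, new_column_name, delimiter):
    # Same template as the str_split transformation, rendered as a plain string.
    return f"""
        SELECT
            *,
            split({column_to_split}, {delimiter}) AS {new_column_name}
        FROM {input_data}
        """


# Delimiter passed with its own quotes, so it becomes a SQL string literal.
sql = render_str_split("ad_impressions", "content_keyword", "keywords", "' '")

# Delimiter passed as a bare space: the rendered call has an empty second argument.
unquoted = render_str_split("ad_impressions", "content_keyword", "keywords", " ")

print(sql)
```

Under these assumptions the column names are interpolated as identifiers, while the delimiter must arrive already quoted to be valid SQL.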

Interacting with Feature Views

Once you have applied your Feature View to the Feature Store, the Tecton SDK provides a set of methods that allow you to access a feature in your Notebook. Here are a few examples of common actions.

Retrieving a Feature View Object

First, you'll need to get the feature view with the registered name.

import tecton

ws = tecton.get_workspace("prod")
feature_view = ws.get_feature_view("user_ad_impression_counts")

Running Feature View Transformation Pipeline

You can dry-run the transformation pipeline from a notebook for any type of Feature View.

result_dataframe = feature_view.run()
display(result_dataframe.to_pandas())

See the API reference for the specific parameters available for each type of Feature View.

For a Stream Feature View, you can also run the streaming job. This writes to a temporary table, which can then be queried:

feature_view.run_stream(output_temp_table="temp_table")  # start the streaming job
display(spark.sql("SELECT * FROM temp_table LIMIT 5"))  # query the output table

Reading Feature View Data

Reading a sample of feature values can help you validate that a feature is implemented correctly, or understand the data structure of a feature you're unfamiliar with.

For Batch and Stream features you can use the FeatureView.get_historical_features() method to view some output from your new feature. To help your query run faster, you can use the start_time and end_time parameters to select a subset of dates, or pass an entities DataFrame of keys to view results for just those entity keys.

By default, get_historical_features retrieves data from the Offline Feature Store. To bypass the offline store and run transformations on the fly, pass from_source=True.

from datetime import datetime, timedelta
start_time = datetime.today() - timedelta(days=2)
results = feature_view.get_historical_features(start_time=start_time)
display(results)
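The effect of narrowing a query by time range and entity keys can be sketched without Tecton. The example below is a plain-Python illustration, not the SDK's implementation: `select_rows` and the sample rows are hypothetical, and treating the range as inclusive of start_time and exclusive of end_time is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime

# Hypothetical feature rows keyed by user_id.
rows = [
    {"user_id": "u1", "timestamp": datetime(2022, 5, 1), "clicks": 3},
    {"user_id": "u2", "timestamp": datetime(2022, 5, 2), "clicks": 7},
    {"user_id": "u1", "timestamp": datetime(2022, 5, 3), "clicks": 4},
]


def select_rows(rows, start_time=None, end_time=None, entity_keys=None):
    """Keep rows in [start_time, end_time) whose key is in entity_keys (if given)."""
    selected = []
    for row in rows:
        if start_time is not None and row["timestamp"] < start_time:
            continue
        if end_time is not None and row["timestamp"] >= end_time:
            continue
        if entity_keys is not None and row["user_id"] not in entity_keys:
            continue
        selected.append(row)
    return selected


# Only a lower bound, as in the snippet above.
recent = select_rows(rows, start_time=datetime(2022, 5, 2))

# Both bounds plus a set of entity keys.
only_u1 = select_rows(
    rows,
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 3),
    entity_keys={"u1"},
)
```

Each filter only ever removes rows, which is why combining a tight time range with a small set of entity keys tends to make exploratory queries cheaper.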

Because On-Demand Feature Views depend on request data and/or other Batch and Stream Feature Views, they cannot simply be looked up from the Feature Store using get_historical_features() without the proper input data. Refer to On-Demand Feature Views for more details on how to test and preview these Feature Views.