Feature View Overview

In Tecton, we define features as views on registered Data Sources or other Feature Views. Feature Views are the core abstraction that enables:

  • Using one feature definition for both training and serving.
  • Reusing features across models.
  • Managing feature lineage and versioning.
  • Orchestrating compute and storage of features.

A Feature View class contains all information required to manage one or more related features, including:

  • Pipeline: A transformation pipeline that takes in one or more inputs and runs transformations to compute features. Inputs can be Tecton Data Sources or, in some cases, other Feature Views.
  • Entities: The common objects that the features are attributes of, such as Customer or Product. The entities dictate the join keys for the Feature View.
  • Configuration: Materialization configuration for defining the orchestration and serving of features, as well as monitoring configuration.
  • Metadata: Optional metadata about the features used for organization and discovery. This can include things like descriptions, families, and tags.

The pipeline and entities for a Feature View define the semantics of what the feature values truly represent. Changes to a Feature View's pipelines or entities are therefore considered destructive and will result in the rematerialization of feature values.

How Feature Views fit into your Feature Store

[Figure: Framework Concept Diagram, illustrating how Tecton Objects interact with each other.]

There are different types of Feature Views that can be used to connect to different inputs and run different types of feature data pipelines.

  • Batch and Stream Feature Views take Batch and Stream Data Sources as inputs, pass those inputs through a pipeline of transformations, and optionally materialize the data to the Offline and Online Feature Stores for use in training and serving.
  • On-Demand Feature Views can take Request Data Sources as well as Batch and Stream Feature Views as inputs, and run a pipeline of transformations at request time to compute new feature values.
  • Batch Window Aggregate and Stream Window Aggregate Feature Views are specialized pipelines that are optimized across pre-computation and on-demand computation to create highly efficient and fresh sliding window aggregation features.

Feature Views are grouped together in a Feature Service to define the total set of features for training and serving a model.

Typically a single Feature View will contain a set of related individual features that can easily be expressed within a single query.
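
For example, a Feature Service serving the features from two Feature Views might look like the following minimal sketch (the import paths here are illustrative; auction_keywords and user_ad_impression_counts are the Feature Views used elsewhere on this page):

from tecton import FeatureService
from ads.features.auction_keywords import auction_keywords
from ads.features.user_ad_impression_counts import user_ad_impression_counts

# A Feature Service groups the Feature Views used to train and serve one model.
ctr_prediction_service = FeatureService(
    name="ctr_prediction_service",
    features=[auction_keywords, user_ad_impression_counts],
)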

Types of Feature Views

  • Batch Feature View
    When to use: Defining row-level transformations or custom aggregations for batch data sources.
    Input types: Batch Data Source
    Transformation types: pyspark, spark_sql

  • Stream Feature View
    When to use: Defining row-level transformations or custom aggregations for stream data sources.
    Input types: Stream Data Source
    Transformation types: pyspark, spark_sql

  • Batch Window Aggregate Feature View
    When to use: Efficiently creating multiple time-windowed aggregate features from batch data sources.
    Input types: Batch Data Source
    Transformation types: pyspark, spark_sql, built-in time-windowed aggregations

  • Stream Window Aggregate Feature View
    When to use: Efficiently creating multiple time-windowed aggregate features from stream data sources.
    Input types: Stream Data Source
    Transformation types: pyspark, spark_sql, built-in time-windowed aggregations

  • On-Demand Feature View
    When to use: Defining features based on request-time data, or combining multiple existing features.
    Input types: Request Data Source, Batch Feature View, Stream Feature View
    Transformation types: pandas

Defining a Feature View

A Feature View is defined using an annotation over a function that represents a pipeline of Transformations.

Below, we'll describe the high-level components of defining a Feature View. Please see the different Feature View type sections for more details and examples.

# Feature View type
@batch_feature_view(
    # Pipeline attributes
    inputs=...,
    mode=...,
    # Entities
    entities=...,
    # Materialization and serving configuration
    online=...,
    offline=...,
    batch_schedule=...,
    feature_start_time=...,
    ttl=...,
    # Metadata
    owner=...,
    description=...,
    tags=...,
)
# Feature View name
def my_feature_view(input_data):
    intermediate_data = my_transformation(input_data)
    output_data = my_transformation_two(intermediate_data)
    return output_data

Function Definition

To register a Python function as a Feature View, you'll add an annotation with the type of feature view you're creating, and parameters for the relevant attributes. For example, to create a Batch Feature View, you'll use an @batch_feature_view annotation.

By default, the name of the function will be the name of the Feature View in Tecton. If needed, you can explicitly set the name with the name_override annotation attribute.
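
For example (a minimal sketch; the override name is illustrative):

@batch_feature_view(
    name_override="custom_feature_view_name",
    ...
)
def my_feature_view(input_data):
    ...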

The function inputs will be DataFrames retrieved from the specified inputs. Tecton will use the function pipeline definition to construct, register, and execute the specified graph of transformations.

Ultimately, the transformation pipeline output must produce a DataFrame.
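
For example, in pyspark mode the inputs arrive as DataFrames and the return value must also be a DataFrame (a minimal sketch; the column names are illustrative):

from tecton import transformation

@transformation(mode="pyspark")
def my_pyspark_transformation(input_data):
    # Imports used by the transformation live inside the function body.
    from pyspark.sql import functions as F

    # input_data is a DataFrame; a DataFrame is returned.
    return input_data.select(
        "entity_id",
        "timestamp",
        F.col("column_a").alias("feature_one"),
        F.col("column_b").alias("feature_two"),
    )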

Defining Transformation Pipelines

The body of your Feature View function will call Transformations to define the data for your features. The data sources configured in the annotation parameters will be made available as inputs to the function.

The output columns of our Feature View DataFrame must include:

  1. The join keys of all entities included in the entities list.
  2. A timestamp column. If there is more than one timestamp column, a timestamp_key parameter must be set to specify which column is the correct timestamp of the feature values.
  3. Feature value columns. All columns other than the join keys and timestamp will be considered features in a Feature View.

This code snippet illustrates a feature view with a single transformation that simply renames columns from the data source to create feature_one and feature_two.

@transformation(mode="spark_sql")
def my_transformation(input_data):
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """

@batch_feature_view(
    mode="pipeline", # Specifies that the function below is a transformation pipeline
    ...
)
def my_feature_view(input_data):
    return my_transformation(input_data)

In-line Transformation Pipelines

For your convenience, Feature Views that only use one Transformation can define that Transformation in the body of the Feature View function by setting the mode parameter to a transformation type. For example, we could rewrite the code snippet from above without having to create two functions.

@batch_feature_view(
    mode="spark_sql", # Specifies that the function below is an inline Spark SQL transformatin
    ...
)
def my_feature_view(input_data):
    # Define Spark SQL in-line
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """

Pipelines with Multiple Transformations

You can make complicated feature logic more legible and reusable by passing your data through multiple transformations. Python gives you the flexibility to compose the transformations however you'd like. Remember that all data operations, such as PySpark methods, must live inside a Transformation; the pipeline function for a Feature View cannot contain arbitrary Python code.

In this example, we implement a generic str_split transformation that operates on the specified column, then use another transformation to calculate some summary statistics for the feature.

Note that passing constants to a transformation requires using const, which can be imported from tecton.

from tecton import transformation, Input, batch_feature_view, const
from ads.entities import auction
from ads.data_sources.ad_impressions_batch import ad_impressions_batch
from datetime import datetime

# Create new column by splitting the string in an existing column.
@transformation(mode="spark_sql")
def str_split(input_data, column_to_split, new_column_name, delimiter):
    return f"""
    SELECT
        *,
        split({column_to_split}, {delimiter}) AS {new_column_name}
    FROM {input_data}
    """

# Create features based on the keyword array
@transformation(mode="spark_sql")
def keyword_stats(input_data, keyword_column):
    return f"""
    SELECT
        auction_id,
        timestamp,
        size({keyword_column}) AS num_keywords,
        concat_ws(",", {keyword_column}) AS keyword_list,
        array_contains({keyword_column}, "bitcoin") AS keyword_contains_bitcoin
    FROM {input_data}
    """

# This feature view runs in pipeline mode to turn the keyword string into an
# array of words, then create metrics based on that array.
@batch_feature_view(
    mode='pipeline',
    inputs={
        'ad_impressions': Input(ad_impressions_batch)
    },
    entities=[auction],
    ttl='1d',
    batch_schedule='1d',
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 5, 1),
    family='ads',
    owner='derek@tecton.ai',
    tags={'release': 'production'},
)
def auction_keywords(ad_impressions):
    split_keywords = str_split(ad_impressions, const("content_keyword"), const("keywords"), const("' '"))
    return keyword_stats(split_keywords, const("keywords"))
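
Because str_split takes its column names as parameters, the same Transformation can be reused by other Feature Views. A hypothetical sketch, assuming a user entity and a user_profiles_batch data source with an interests column exist:

@batch_feature_view(
    mode='pipeline',
    inputs={'user_profiles': Input(user_profiles_batch)},
    entities=[user],
    batch_schedule='1d',
)
def user_interest_keywords(user_profiles):
    # Reuse the generic str_split transformation on a different column.
    return str_split(user_profiles, const("interests"), const("interest_list"), const("','"))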

Annotation Parameters

The attributes configured in the annotation will tell Tecton how to:

  • Execute your pipeline
  • Materialize your data
  • Organize your features

See the API reference for the specific parameters available for each type of Feature View.

Interacting with a Feature View

Once you have applied your Feature View to the Feature Store, the Tecton SDK provides a set of methods for accessing a feature in your notebook. Here are a few examples of common actions.

Retrieving a Feature View object

First, you'll need to fetch the Feature View by its registered name.

feature_view = tecton.get_feature_view("user_ad_impression_counts")

Viewing Feature View Data

Viewing a sample of feature values can help you validate that a feature is implemented correctly, or understand the data structure when exploring a feature you're unfamiliar with.

For Batch and Stream features you can use the FeatureView.get_features() method to view some output from your new feature. To help your query run faster, you can use the start_time and end_time parameters to select a subset of dates, or pass an entities DataFrame of keys to view results for just those entity keys.

from datetime import datetime, timedelta
start_time = datetime.today() - timedelta(days=2)
results = feature_view.get_features(start_time=start_time)
display(results)
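
To narrow the results to specific keys, you can pass an entities DataFrame (a minimal sketch; the user_id join key and its values are illustrative):

import pandas as pd

# A DataFrame whose column matches the Feature View's join key.
entities = pd.DataFrame({"user_id": ["user_1", "user_2"]})
results = feature_view.get_features(entities=entities, start_time=start_time)
display(results)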

Because On-Demand Feature Views depend on request data and/or other Batch and Stream Feature Views, they cannot simply be looked up from the Feature Store using get_features() without the proper input data. Refer to On-Demand Feature Views for more details on how to test and preview these Feature Views.