Feature Packages

In Tecton, the abstraction for creating and managing features is the Feature Package, implemented as the FeaturePackage class. A Feature Package contains all information required to manage one or more related features, including:

  • Metadata about the features, which Tecton uses for organization. Examples are name, Entity, feature owner, and tags.
  • References to Transformations, which describe the logic used to generate feature values from raw data
  • Materialization settings, which describe how and when Tecton should compute feature values

Feature Packages provide a number of benefits:

  • Coordinated computation of feature values for both training and serving
  • A standard interface for accessing different Feature Packages; your features are treated as code
  • Safe reuse and sharing of your features across projects and models
  • Feature lineage and versioning, which enable safe changes to Feature Packages in production

Defining a Feature Package

Define a Feature Package by declaring its properties within a FeaturePackage class.

Types of Feature Packages

Tecton provides four FeaturePackage classes, depending on the nature of the features:

  • Temporal Feature Packages

    The standard feature in Tecton is implemented using the TemporalFeaturePackage class. Temporal Feature Packages can include either row-level or aggregate features. Temporal Feature Packages can be materialized (precomputed by Tecton and stored for low-latency serving). See Materialization.

  • Time-Windowed Aggregation Feature Packages

    Features that include time-windowed aggregations (for example min, max, count, or sum over different time periods) are defined using the TemporalAggregateFeaturePackage class. Tecton implements these operations efficiently, which yields performance improvements over a TemporalFeaturePackage for these types of transformations. These Feature Packages can be materialized as well.

  • Online Feature Packages

    Features that involve transformations of the request payload at the time of the prediction are defined using the class OnlineFeaturePackage.

  • Push Feature Packages

    Features that are calculated outside of Tecton and ingested (pushed) into Tecton by the user are defined using the PushFeaturePackage class. The definition requires an explicit schema and designated join keys. The non-join-key columns are the features; these are ingested and served with no additional transformations.

Attributes of a Feature Package

Each FeaturePackage object contains a number of user-defined attributes. Below is the list of mandatory attributes passed to the constructor as arguments.

  • name: A unique identifier for the feature. When the feature is registered with Tecton, this string can be used to access the feature's values and properties
  • entities: The properties around which the feature is generated ("user" and "product," for example). Tecton's Entity concept is embodied in the Entity class, described here
  • transformation: A Tecton Transformation function containing the transformation logic for the feature. The concept of a Tecton Transformation is described here
  • materialization: A MaterializationConfig object. The Materialization Configuration attributes specify parameters around persisting data, such as whether to store data for online serving and the starting point for backfilling feature data. Tecton Materializations are described in Materialization Configuration, below.
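
Putting these together, a minimal Feature Package that supplies only the mandatory arguments might look like the sketch below. The transformation is a placeholder for a Tecton Transformation defined elsewhere in the feature repository; a complete, concrete example appears later on this page.

from tecton import TemporalFeaturePackage, MaterializationConfig
from feature_repo.shared import entities
from datetime import datetime

# Minimal sketch: only the mandatory arguments are shown.
# my_transformer is a placeholder for a Transformation defined elsewhere.
my_feature = TemporalFeaturePackage(
    name="my_feature",
    entities=[entities.user_entity],
    transformation=my_transformer,
    materialization=MaterializationConfig(
        offline_enabled=True,
        online_enabled=True,
        feature_start_time=datetime(2020, 6, 20),
    ),
)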

In addition to the attributes listed above, a TemporalAggregateFeaturePackage includes two attributes for windowing functions:

  • aggregation_slide_period: How frequently the feature value is updated (for example, "1h" or "6h")
  • aggregations: Contains a FeatureAggregation object that specifies the column to aggregate, the operation to perform, and the time windows over which to perform the operation
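
As an illustration, a windowed count feature built with these attributes might look like the following sketch. The FeatureAggregation argument names shown here (column, function, time_windows) are assumptions for illustration; consult the TemporalAggregateFeaturePackage reference for the exact signature. The transformation is a placeholder defined elsewhere in the feature repository.

from tecton import TemporalAggregateFeaturePackage, FeatureAggregation, MaterializationConfig
from feature_repo.shared import entities
from datetime import datetime

# Hypothetical sketch: per-user impression counts over several windows.
# user_impression_events_transformer is a placeholder Transformation.
user_impression_counts = TemporalAggregateFeaturePackage(
    name="user_impression_counts",
    entities=[entities.user_entity],
    transformation=user_impression_events_transformer,
    aggregation_slide_period="1h",  # update the feature values hourly
    aggregations=[FeatureAggregation(column="impression",
                                     function="count",
                                     time_windows=["1h", "12h", "24h"])],
    materialization=MaterializationConfig(
        offline_enabled=True,
        online_enabled=True,
        feature_start_time=datetime(2020, 6, 20),
    ),
)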

Since request-time transformations occur at prediction time and are not pre-computed, the OnlineFeaturePackage does not contain a materialization attribute.

The PushFeaturePackage class contains an additional attribute, schema, that defines the schema of the ingested data. Tecton makes no assumptions about the data source for a PushFeaturePackage, so it derives the feature names and types from the schema. See the PushFeaturePackage reference.
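
For illustration, a Push Feature Package definition might look like the sketch below. The use of a Spark StructType for the schema and the designation of the join key through the entity are assumptions here; the exact schema format is documented in the PushFeaturePackage reference, and other arguments (such as materialization settings) are omitted.

from tecton import PushFeaturePackage
from feature_repo.shared import entities
from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType

# Hypothetical sketch: externally computed user features pushed into Tecton.
# user_uuid is the join key (from the user entity); the remaining non-key,
# non-timestamp column is served as a feature without further transformation.
user_external_features = PushFeaturePackage(
    name="user_external_features",
    entities=[entities.user_entity],
    schema=StructType([
        StructField("user_uuid", StringType()),
        StructField("timestamp", TimestampType()),
        StructField("user_lifetime_purchases", LongType()),
    ]),
)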

Materialization Configuration

The MaterializationConfig class defines how Tecton pre-computes and stores feature values. Materialization makes features available for both online prediction and quick offline retrieval. Each MaterializationConfig object contains a number of attributes:

  • offline_enabled: Defines whether to store historical data for training
  • online_enabled: Defines whether to store the latest data for serving
  • feature_start_time: Specifies the date to which historical data is backfilled

The following attributes are only needed in a TemporalFeaturePackage:

  • serving_ttl: Defines a time to live beyond which feature values become ineligible for serving in production. Also specifies the maximum look-back window when fetching historical feature values
  • schedule_interval: For batch features, defines how frequently feature values are recomputed. For streaming features, defines how frequently offline materialization is performed (online materialization is performed continuously using streams).
  • data_lookback_period: Defines the range of data that is passed from the Virtual Data Source to the Feature Package's transformation. Defaults to the schedule_interval

Example: Ad Serving and Partner Websites Feature

The feature below is used in an ad serving model. It represents "The number of ads a user has been shown on a given partner site over the past 7 days." It is defined as follows:

  • The Transformation is defined in user_partner_impression_count_7_days_transformer
  • The feature is built around the user and partner Entities (user_entity and partner_entity)
  • Feature values are stored for training and serving (offline_enabled and online_enabled both true)
  • The stored training data begins on June 20th, 2020 (feature_start_time)
  • The offline processing job is run daily (schedule_interval)
  • The feature values are served for 24 hours (serving_ttl)

from tecton import sql_transformation, TemporalFeaturePackage, MaterializationConfig
from feature_repo.shared import entities, data_sources
from datetime import datetime

@sql_transformation(inputs=data_sources.ad_impressions_batch, has_context=True)
def user_partner_impression_count_7_days_transformer(context, ad_impressions_batch):
    return f"""
    SELECT
        user_uuid,
        partner_id,
        count(*) as user_partner_impressions_7_days,
        to_timestamp('{context.feature_data_end_time}') as timestamp
    FROM
        {ad_impressions_batch}
    GROUP BY
        user_uuid, partner_id
    """

user_partner_impression_count_7_days = TemporalFeaturePackage(
    name="user_partner_impression_count_7_days",
    description="[SQL Feature] The number of ads a user has been shown on a given partner site over the past 7 days",
    transformation=user_partner_impression_count_7_days_transformer,
    entities=[entities.user_entity, entities.partner_entity],
    materialization=MaterializationConfig(
        offline_enabled=True,
        online_enabled=True,
        feature_start_time=datetime(year=2020, month=6, day=20),
        serving_ttl="1d",
        schedule_interval="1d",
        data_lookback_period="7d"
    ),
    family='ad_serving',
    tags={'release': 'production'},
    owner="jay@tecton.ai"
)

Interacting with Feature Packages

Once you have defined a Feature Package, register the FeaturePackage object with the Feature Store. Tecton then uses the FeaturePackage to manage the transformation, storage, and serving of the feature.

Tecton also provides a set of methods that enable you to access the features in a notebook or REPL for development. Following are examples of some common actions.

Loading a Feature Package

fp = tecton.get_feature_package("user_partner_impression_count_7_days")

Getting a Summary of the Feature Package Metadata

fp.summary()

Ingesting a Dataframe in your Notebook

The .ingest() method works only with a PushFeaturePackage object.

# df is a Pandas or Spark dataframe
fp.ingest(df)

Previewing Feature Data from a Feature Package

The .preview() method retrieves a sample of data. Use it to gain an understanding of the data structure and whether the information might be useful in a new model.

fp.preview()

Getting Feature Data for Specific Events

The .get_feature_dataframe() method retrieves specific data from a Feature Package (for example, for specific keys and timestamps). Construct a table of the entity keys and timestamps for the requested events, then pass that table to the Feature Package.

# Dataframe that includes the entities and timestamp for the request
# For this FP, the entities are user_uuid and partner_id
events = spark.read.parquet("dbfs:/event_data.pq")

fp.get_feature_dataframe(events)
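
For quick experimentation in a notebook, the events table can also be built inline. The sketch below assumes an active Spark session named spark and uses the entity keys (user_uuid, partner_id) and a timestamp column matching the example feature above; the key values are made up for illustration.

from datetime import datetime

# Hypothetical sketch: one row per (entity keys, timestamp) event to look up
events = spark.createDataFrame(
    [("u_123", "partner_a", datetime(2020, 6, 27))],
    ["user_uuid", "partner_id", "timestamp"],
)

fp.get_feature_dataframe(events)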