Creating a SQL or PySpark Feature


In this example, we create a simple feature using a TemporalFeaturePackage. This example does not contain code examples for other Feature Package types, such as TemporalAggregateFeaturePackages or OnlineFeaturePackages, but the workflow for creating these features is very similar.

The process of creating a Feature Package comprises these steps:

  1. Defining a Transformation
  2. Defining a Feature Package
  3. Applying the Feature Package
  4. Viewing and previewing the Feature Package

The Feature Package in the code below is based on a single data source and one SQL transformation.

Defining a Transformation

Create a new file in your feature repository and add the following code:

from datetime import datetime

from tecton import sql_transformation, TemporalFeaturePackage, MaterializationConfig

# Assumes data_sources is the module in your feature repository that
# declares the ad_impressions_batch data source.
@sql_transformation(inputs=data_sources.ad_impressions_batch, has_context=True)
def partner_ctr_performance_transformer(context, ad_impressions_batch):
    return f"""
        SELECT
            partner_id,
            sum(clicked) / count(*) as partner_total_ctr,
            to_timestamp('{context.feature_data_end_time}') as timestamp
        FROM {ad_impressions_batch}
        GROUP BY partner_id
        """

A Transformation is a primitive in Tecton that builds features from raw data. In this example, we are defining a SQL Transformation, which runs a SQL SELECT statement on incoming data. To learn more about how transformations are used, see the Transformations overview or reference documentation.
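To make the aggregation concrete, here is a small pure-Python sketch (not Tecton code) of what sum(clicked) / count(*) computes per partner. The sample rows are invented for illustration:

```python
from collections import defaultdict

# Hypothetical sample of impression events: (partner_id, clicked)
rows = [("a", 1), ("a", 0), ("b", 1), ("b", 1)]

clicks, impressions = defaultdict(int), defaultdict(int)
for partner, clicked in rows:
    clicks[partner] += clicked   # sum(clicked)
    impressions[partner] += 1    # count(*)

# Click-through rate per partner, mirroring the SQL aggregation above
ctr = {p: clicks[p] / impressions[p] for p in impressions}
print(ctr)  # {'a': 0.5, 'b': 1.0}
```

In the real feature, Spark performs this aggregation over the rows of the ad_impressions_batch source instead of an in-memory list.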

Defining a Feature Package

Once the Transformation has been defined for a feature, the next step is to use it in a Feature Package. Feature Packages take a number of parameters which serve to manage the feature, including:

  • Metadata about the feature(s), which Tecton uses for organization
  • References to Transformations (as defined above) and Entities, which describe the logic used to generate feature values from raw data
  • Materialization settings that describe how and when Tecton should compute feature values (if at all)

For this example feature, the Materialization settings are set to run as follows:

  • Feature values are stored for training (offline_enabled=True) but not for serving (online_enabled=False)
  • The stored training data begins on June 20th, 2020 (feature_start_time)
  • The processing job runs daily (schedule_interval); feature values are also served for 24 hours (serving_ttl)
  • The data_lookback_period is set to "7d". This parameter works with the context value in the Transformation: it sets context.feature_data_start_time to 7 days before context.feature_data_end_time, so each run computes feature values over a one-week window.
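The relationship between data_lookback_period and the context window can be sketched in plain Python (the dates here are illustrative, not values produced by the Tecton runtime):

```python
from datetime import datetime, timedelta

# With data_lookback_period="7d", the start of the window is 7 days
# before the end time that the scheduler passes into the context.
feature_data_end_time = datetime(2020, 6, 27)
feature_data_start_time = feature_data_end_time - timedelta(days=7)

print(feature_data_start_time)  # 2020-06-20 00:00:00
```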

Add the following code to your file:

partner_ctr_performance_7d = TemporalFeaturePackage(
    name="partner_ctr_performance:7d",
    description="[SQL Feature] The aggregate CTR of a partner website (clicks / total impressions) over the past 7 days",
    transformation=partner_ctr_performance_transformer,
    # Assumes entities is the module in your repository that declares
    # the partner Entity this feature is keyed on.
    entities=[entities.partner],
    materialization=MaterializationConfig(
        offline_enabled=True, online_enabled=False,
        feature_start_time=datetime(year=2020, month=6, day=20),
        schedule_interval='1d', serving_ttl='1d', data_lookback_period='7d'),
    tags={'release': 'development'},
)

Applying the Feature Package

So far you have only written a feature definition in your local repository. To use it in Tecton, you must register it using the Tecton CLI.

To register the feature, run the Tecton CLI command tecton apply:

$ tecton apply
Using workspace "prod"
✅ Imported 15 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Transformation
        name: partner_ctr_performance_transformer

  + Create FeaturePackage
        name: partner_ctr_performance:7d
        transformation: partner_ctr_performance_transformer

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]>

Enter y to apply the FeaturePackage to Tecton.

Once you apply the new Feature Package, Tecton will begin orchestrating and managing the feature within the Feature Store. The feature will be available for access in a Spark notebook for experimentation and training, as well as in production for serving.

Viewing and previewing a Feature

Once a feature has been registered with Tecton, it can be viewed within the Tecton UI. This can be used to view the transformation logic, feature lineage, and the health of the processing jobs for generating feature data.

Viewing Features in the Web UI

Once a feature has been registered, it can also be loaded into a Spark notebook. This can be used to preview data, perform additional exploratory analysis, and build training sets.

Features in a Notebook