Creating a SQL or PySpark Feature
Overview
In this example, we create a simple feature using a TemporalFeaturePackage. This example does not contain code examples for other Feature Package types, such as TemporalAggregateFeaturePackages or OnlineFeaturePackages, but the workflow for creating these features is very similar.
The process of creating a Feature Package comprises these steps:
- Defining a Transformation
- Defining a Feature Package
- Applying the Feature Package
- Viewing and previewing the Feature Package
The Feature Package in the code below is based on a single data source and one SQL transformation.
Defining a Transformation
Create a new file in your feature repository and add the following code:
from datetime import datetime

from tecton import sql_transformation, TemporalFeaturePackage, MaterializationConfig

@sql_transformation(inputs=data_sources.ad_impressions_batch, has_context=True)
def partner_ctr_performance_transformer(context, ad_impressions_batch):
    return f"""
    SELECT
        partner_id,
        sum(clicked) / count(*) as partner_total_ctr,
        to_timestamp('{context.feature_data_end_time}') as timestamp
    FROM
        {ad_impressions_batch}
    GROUP BY
        partner_id
    """
A Transformation is a primitive in Tecton that builds features from raw data. In this example, we are defining a SQL Transformation, which runs a SQL SELECT statement on incoming data. To learn more about how transformations are used, see the Transformations overview or reference documentation.
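Conceptually, the transformer above is a templated string: before running the query, Tecton substitutes the input table reference and the context's feature_data_end_time into the f-string. The sketch below illustrates only that substitution step in plain Python; FakeContext and the table name are stand-ins for illustration, not part of the Tecton API.

```python
from dataclasses import dataclass

@dataclass
class FakeContext:
    # Stand-in for Tecton's materialization context (illustrative only).
    feature_data_end_time: str

def render_sql(context, ad_impressions_batch):
    # Mirrors the f-string returned by partner_ctr_performance_transformer.
    return f"""
    SELECT
        partner_id,
        sum(clicked) / count(*) as partner_total_ctr,
        to_timestamp('{context.feature_data_end_time}') as timestamp
    FROM
        {ad_impressions_batch}
    GROUP BY
        partner_id
    """

sql = render_sql(FakeContext("2020-06-27 00:00:00"), "ad_impressions_batch_view")
```

The rendered string is an ordinary SQL SELECT with the end-of-window timestamp and table name filled in, which is what gets executed against the incoming data.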
Defining a Feature Package
Once the Transformation has been defined for a feature, the next step is to use it in a Feature Package. Feature Packages take a number of parameters which serve to manage the feature, including:
- Metadata about the feature(s), which Tecton uses for organization
- References to Transformations (as defined above) and Entities, which describe the logic used to generate feature values from raw data
- Materialization settings that describe how and when Tecton should compute feature values (if at all)
For this example feature, the Materialization settings are set to run as follows:
- Feature values are being stored for training (offline_enabled), but not serving (online_enabled)
- The stored training data begins on June 20th, 2020 (feature_start_time)
- The processing job is run daily (schedule_interval); feature values are served for 24h as well (serving_ttl)
- The data_lookback_period is set to "7d". This parameter works with the context value in the Transformation. It sets the context.feature_data_start_time to be 7 days earlier than the end time. As a result, the feature values are calculated for a span of one week.
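The relationship between data_lookback_period and the window boundaries can be sketched with plain datetime arithmetic. This is a simplified model of the window Tecton derives for each run; the helper function is illustrative, not a Tecton API.

```python
from datetime import datetime, timedelta

def lookback_window(feature_data_end_time: datetime, lookback_days: int = 7):
    # With data_lookback_period="7d", feature_data_start_time is set
    # 7 days before feature_data_end_time, so each scheduled run
    # aggregates one week of raw ad-impression data.
    start = feature_data_end_time - timedelta(days=lookback_days)
    return start, feature_data_end_time

start, end = lookback_window(datetime(2020, 6, 27))
# A run whose window ends June 27th reads data back to June 20th.
```

Because schedule_interval is "1d", consecutive runs produce overlapping 7-day windows that each slide forward by one day.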
Add the following code to your file:
partner_ctr_performance_7d = TemporalFeaturePackage(
    name="partner_ctr_performance:7d",
    description="[SQL Feature] The aggregate CTR of a partner website (clicks / total impressions) over the past 7 days",
    transformation=partner_ctr_performance_transformer,
    entities=[e.partner_entity],
    materialization=MaterializationConfig(
        offline_enabled=True,
        online_enabled=False,
        feature_start_time=datetime(year=2020, month=6, day=20),
        serving_ttl="1d",
        schedule_interval="1d",
        data_lookback_period="7d",
    ),
    family='ad_serving',
    tags={'release': 'development'},
    owner="ravi@tecton.ai",
)
Applying the Feature Package
Up until this point, you have written a feature definition in your local repository. In order to use it in Tecton, you must register it using the Tecton CLI.
To register the feature, run the Tecton CLI command tecton apply:
$ tecton apply
Using workspace "prod"
✅ Imported 15 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create Transformation
name: partner_ctr_performance_transformer
+ Create FeaturePackage
name: partner_ctr_performance:7d
transformation: partner_ctr_performance_transformer
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]>
Enter y to apply the FeaturePackage to Tecton.
Once you apply the new Feature Package, Tecton will begin orchestrating and managing the feature within the Feature Store. The feature will be available for access in a Spark notebook for experimentation and training, as well as in production for serving.
Viewing and previewing a Feature
Once a feature has been registered with Tecton, it can be viewed in the Tecton UI, where you can inspect the transformation logic, the feature's lineage, and the health of the processing jobs that generate feature data.
Once a feature has been registered, it can also be loaded into a Spark notebook to preview feature data, perform additional exploratory analysis, and build training sets.