Feature Development Workflow
In Tecton, features are developed and tested in a notebook and then productionized as code in a Tecton feature repository, optionally through a GitOps workflow that integrates with CI/CD.
This combines fast iteration in a notebook with DevOps best practices for productionization, such as version control, code reviews, and CI/CD.
A typical development workflow for building a feature and testing it in a training data set looks like this:
1. Create and validate a new feature definition in a notebook
2. Run the feature pipeline interactively to ensure correct feature data
3. Fetch a set of registered features from a workspace and create a new feature set
4. Generate training data to test the new feature in a model
5. Copy the new feature definition into your feature repo
6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
If you do not need to test the feature in a model, then you would skip steps 3 and 4 above.
This page will walk through these steps in detail.
If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.
1. Create and validate a local feature definition in a notebook
Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects".
Simply write the definition in a notebook cell and call .validate() on the object. Tecton will ensure the definition is correct and run automatic schema validations on feature pipelines.
The same definition is shown first for Spark and then for Snowflake.
# Spark
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Entity keyed by user_id
user = Entity(name="user", join_keys=["user_id"])

# Batch data source backed by a Parquet file
user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


# Feature view that derives the card issuer from the first digit of cc_num
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
# Snowflake
from tecton import Entity, BatchSource, SnowflakeConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Entity keyed by user_id
user = Entity(name="user", join_keys=["user_id"])

# Batch data source backed by a Snowflake table
user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=SnowflakeConfig(
        database="demo", schema="users", table="user_sign_ups", timestamp_field="signup_timestamp"
    ),
)


# Feature view that derives the card issuer from the first digit of cc_num
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
Tecton objects must be validated before they can be queried interactively. You can either validate objects explicitly with .validate() or call tecton.set_validation_mode('auto') once in your notebook for automatic lazy validation at the time of usage.
user_credit_card_issuer.validate() # or call tecton.set_validation_mode('auto')
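To make the transformation easier to reason about before running it in a pipeline, the CASE expression in the feature view above can be mirrored in plain Python. This is only an illustrative sketch: the function name and the sample card numbers are hypothetical, and it assumes that SUBSTRING(..., 0, 1) returns the first character of the card number, as the example output on this page suggests.

```python
def credit_card_issuer(cc_num: int) -> str:
    """Classify a card by its first digit, mirroring the SQL CASE expression."""
    issuers = {"3": "AmEx", "4": "Visa", "5": "MasterCard", "6": "Discover"}
    first_digit = str(cc_num)[:1]
    # Any other leading digit falls through to 'other', like the ELSE branch.
    return issuers.get(first_digit, "other")


# Hypothetical card numbers, used only to exercise each branch.
for cc_num in (4111111111111111, 371449635398431, 6011000990139424, 2221000000000009):
    print(cc_num, credit_card_issuer(cc_num))
```

A helper like this can be handy for spot-checking a few rows of the feature output by hand.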
Depending on registered objects
Your team's workspace(s) may include registered (a.k.a. applied) data sources, entities, or other objects that you want to build off of. During development, local objects (in your notebook) can depend on registered objects fetched from your workspace.
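For example, the following definitions (Spark first, then Snowflake) depend on an entity and a data source fetched from a workspace: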
# Spark
import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")


# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
# Snowflake
import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")


# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
2. Test objects interactively
Interactive methods can be called on objects to test their output or get additional information. Refer to the SDK Reference for available methods on Tecton objects.
start = datetime(2017, 1, 1)
end = datetime(2022, 1, 1)
# Get a range of historical feature data
df = user_credit_card_issuer.get_historical_features(start_time=start, end_time=end)
display(df.to_pandas())
|   | user_id | signup_timestamp | credit_card_issuer | _effective_timestamp |
|---|---|---|---|---|
| 0 | user_709462196403 | 2017-04-06 00:50:31 | Visa | 2017-04-07 00:00:00 |
| 1 | user_687958452057 | 2017-05-08 16:07:51 | Discover | 2017-05-09 00:00:00 |
| 2 | user_884240387242 | 2017-06-15 19:33:18 | other | 2017-06-16 00:00:00 |
| 3 | user_205125746682 | 2017-09-03 03:42:14 | AmEx | 2017-09-04 00:00:00 |
| 4 | user_950482239421 | 2017-09-08 19:26:25 | Visa | 2017-09-09 00:00:00 |
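In the output above, each _effective_timestamp falls on the midnight following the signup_timestamp. One way to read this, offered here as an assumption rather than a statement of how Tecton computes the column, is that a value becomes effective at the end of the daily batch window that contains it (batch_schedule is one day in this example). The sketch below reproduces the rows under that assumption; the function name and the epoch alignment are illustrative choices.

```python
from datetime import datetime, timedelta

# Assumption: batch windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def effective_timestamp(feature_time: datetime, batch_schedule: timedelta) -> datetime:
    """Return the end of the batch window that contains feature_time."""
    completed_windows = (feature_time - EPOCH) // batch_schedule
    return EPOCH + (completed_windows + 1) * batch_schedule


signup = datetime(2017, 4, 6, 0, 50, 31)
print(effective_timestamp(signup, timedelta(days=1)))  # 2017-04-07 00:00:00
```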
3. Fetch a set of registered features from a workspace and create a new feature set
After creating a new feature you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object. As needed, additional features can be fetched from a workspace and added to the new Feature Service.
Commonly, you may want to fetch a feature set from an existing Feature Service and add your new feature to it. You can get the list of features in a Feature Service by calling .features on it and then include that list in a new local Feature Service.
import tecton
from tecton import FeatureService
ws = tecton.get_workspace("prod")
# Reuse the features of the registered Feature Service and add the new one
features_list = ws.get_feature_service("fraud_detection").features
fraud_detection_v2 = FeatureService(
    name="fraud_detection_v2", features=features_list + [user_credit_card_issuer]
)
fraud_detection_v2.validate()
Tecton objects are immutable and therefore a new local Feature Service is created.
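The pattern of building a new object instead of modifying an existing one can be illustrated without the Tecton SDK. The sketch below uses a frozen dataclass as a hypothetical stand-in (FeatureSetSketch is not a Tecton class) to show why extending a feature list produces a second object and leaves the original untouched.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FeatureSetSketch:
    """Hypothetical stand-in for an immutable feature set; not a Tecton class."""
    name: str
    features: tuple


existing = FeatureSetSketch(
    name="fraud_detection",
    features=("user_transaction_amount_averages", "transaction_amount_is_higher_than_average"),
)

# Build a new object from the existing feature list plus the new feature.
extended = FeatureSetSketch(
    name="fraud_detection_v2",
    features=existing.features + ("user_credit_card_issuer",),
)

try:
    existing.name = "renamed"  # in-place modification is rejected
except FrozenInstanceError:
    print("existing object cannot be modified in place")
```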
4. Generate training data to test the new feature in a model
Training data can be generated for a list of training events by calling get_historical_features(spine=training_events) on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided spine.
Feature values will be fetched from the Offline Store if they have been materialized offline and computed on the fly if not.
training_events = spark.read.parquet("s3://tecton.ai.public/tutorials/fraud_demo/transactions/")
training_data = fraud_detection_v2.get_historical_features(spine=training_events)
display(training_data.to_pandas())
|   | user_id | timestamp | merchant | amt | is_fraud | user_transaction_amount_averages__amt_mean_1d_1d | user_transaction_amount_averages__amt_mean_3d_1d | user_transaction_amount_averages__amt_mean_7d_1d | user_credit_card_issuer__credit_card_issuer | transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_131340471060 | 2021-01-01 10:44:12 | Spencer-Runolfsson | 332.61 | 0 | nan | nan | nan | Visa | True |
| 1 | user_131340471060 | 2021-01-04 22:48:21 | Schroeder, Hauck and Treutel | 105.33 | 1 | nan | 332.61 | 332.61 | Visa | False |
| 2 | user_131340471060 | 2021-01-05 15:14:06 | O'Reilly, Mohr and Purdy | 15.39 | 0 | 105.33 | 105.33 | 218.97 | Visa | False |
| 3 | user_131340471060 | 2021-01-06 02:51:49 | Donnelly PLC | 66.07 | 0 | 15.39 | 60.36 | 151.11 | Visa | False |
| 4 | user_131340471060 | 2021-01-07 00:59:43 | Huel Ltd | 113.63 | 0 | 66.07 | 62.2633 | 129.85 | Visa | False |
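The idea of joining the historically accurate feature value onto each spine event can be sketched in plain Python as an "as of" lookup: for every event, take the most recent feature value whose timestamp is not later than the event. This is a simplified illustration, not Tecton's implementation; the feature history, values, and function name are hypothetical, and details such as ttl and multiple join keys are ignored.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one user: (effective time, value), sorted by time.
feature_history = [
    (datetime(2021, 1, 2), 10.0),
    (datetime(2021, 1, 5), 20.0),
    (datetime(2021, 1, 6), 30.0),
]


def value_as_of(history, event_time):
    """Return the latest value whose timestamp is <= event_time, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)
    return history[idx - 1][1] if idx else None


# Hypothetical spine of training event timestamps for the same user.
spine = [
    datetime(2021, 1, 1, 10, 44),
    datetime(2021, 1, 5, 15, 14),
    datetime(2021, 1, 7, 0, 59),
]
training_rows = [(event_time, value_as_of(feature_history, event_time)) for event_time in spine]
for row in training_rows:
    print(row)
```

The first event predates any feature value, so it gets no value, which is comparable to the nan entries in the first rows of the table above.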
5. Copy definitions into your team's Feature Repository
Objects and helper functions can be copied directly into a Feature Repository for productionization. References to remote workspace objects should be changed to local definitions in the repo.
You do not need to call .validate() in a Feature Repo. Validation will be run on all objects during tecton apply.
features/user_credit_card_issuer.py
from tecton import batch_feature_view
from datetime import datetime, timedelta

# Change to local object references
from entities import user
from data_sources import user_sign_ups


# Set materialization parameters to materialize this feature online and offline
@batch_feature_view(
    sources=[user_sign_ups],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
    online=True,
    offline=True,
    feature_start_time=datetime(2017, 1, 1),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
feature_services/fraud_detection.py
from tecton import FeatureService
from features.user_transaction_amount_averages import user_transaction_amount_averages
from features.transaction_amount_is_higher_than_average import transaction_amount_is_higher_than_average
from features.user_credit_card_issuer import user_credit_card_issuer

fraud_detection = FeatureService(
    name="fraud_detection",
    features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average],
)

# Add the new Feature Service and change the feature list to local feature references
fraud_detection_v2 = FeatureService(
    name="fraud_detection:v2",
    features=[transaction_amount_is_higher_than_average, user_transaction_amount_averages, user_credit_card_issuer],
)
6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
Feature repositories get "applied" to workspaces using the Tecton CLI in order to register definitions and productionize feature pipelines.
To apply changes, follow these steps in your terminal:
- Log into your organization's Tecton account using tecton login my-org.tecton.ai
- Select the workspace you want to apply your changes to using tecton workspace select [workspace-name]
- Run tecton plan to check the changes that would be applied to the workspace.
- Run tecton apply to apply your changes.
Many organizations integrate the tecton apply step into their CI/CD pipelines. This means that rather than running tecton apply directly in the CLI, you may simply create a git PR for your changes.
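As a rough illustration of how the CLI steps above could be scripted, the sketch below assembles the same four commands and runs them in order, stopping at the first failure. The function names are hypothetical, the workspace name is a placeholder, and a real CI job would likely need non-interactive authentication and confirmation handling, which this sketch does not cover.

```python
import subprocess


def tecton_apply_commands(cluster: str, workspace: str) -> list:
    """Return the CLI steps from this page as argument lists, in order."""
    return [
        ["tecton", "login", cluster],
        ["tecton", "workspace", "select", workspace],
        ["tecton", "plan"],
        ["tecton", "apply"],
    ]


def run_all(commands) -> None:
    """Run each command; check=True aborts on the first non-zero exit code."""
    for argv in commands:
        subprocess.run(argv, check=True)


# "prod" is a placeholder workspace name.
commands = tecton_apply_commands("my-org.tecton.ai", "prod")
for argv in commands:
    print(" ".join(argv))
# Call run_all(commands) only in an environment where the Tecton CLI is installed.
```

Keeping the command list separate from its execution makes it possible to review or log the planned steps before anything touches a workspace.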
Development Workspaces
If you want to save your changes in Tecton without spinning up production services, you can apply your repo to a "development" workspace.
Development workspaces incur no costs, since they do not materialize any data and do not consume compute or serving resources. They can primarily be used to visualize feature pipelines and share your work.
To create a development workspace, run tecton workspace create [my-workspace] in your CLI.