Version: Beta 🚧

Feature Development Workflow

In Tecton, features are developed and tested in a notebook and then productionized as code in a Tecton feature repository, optionally combined with a GitOps workflow for integrated CI/CD.

This combines fast iteration in a notebook with DevOps best practices for productionization, such as version control, code reviews, and CI/CD.

A typical development workflow for building a feature and testing it in a training data set looks like this:

  1. Create a new feature definition in a notebook
  2. Run the feature pipeline interactively to ensure correct feature data
  3. Fetch a set of registered features from a workspace and create a new feature set
  4. Generate training data to test the new feature in a model
  5. Copy the new feature definition into your feature repo
  6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
note

If you do not need to test the feature in a model, you can skip steps 3 and 4.

This page will walk through these steps in detail.

If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.

1. Create a local feature definition in a notebook

Any Tecton object can be defined in a notebook. We call these definitions "local objects". Simply write the definition in a notebook cell.

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Attribute
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

user = Entity(name="user", join_keys=[Field("user_id", String)])

user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


@batch_feature_view(
    sources=[user_sign_ups],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    timestamp_field="signup_timestamp",
    features=[
        Attribute("credit_card_issuer", String),
    ],
)
def user_credit_card_issuer(user_sign_ups):
    # Map the first digit of the card number to its issuer
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]
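Because the transformation runs in pandas mode, its logic can also be checked on a small, hand-made DataFrame before involving Tecton at all. The sketch below restates the issuer mapping above in plain pandas, using a lookup table instead of the chained conditional. The names `ISSUER_BY_FIRST_DIGIT`, `issuer_from_cc_num`, and `add_credit_card_issuer`, and the sample card numbers, are hypothetical and not part of the Tecton SDK.

```python
import pandas as pd

# Hypothetical lookup table: first digit of the card number -> issuer,
# mirroring the chained conditional in user_credit_card_issuer
ISSUER_BY_FIRST_DIGIT = {"3": "AmEx", "4": "Visa", "5": "MasterCard", "6": "Discover"}


def issuer_from_cc_num(cc_num) -> str:
    """Return the issuer for a card number, or "other" if the first digit is unmapped."""
    return ISSUER_BY_FIRST_DIGIT.get(str(cc_num)[0], "other")


def add_credit_card_issuer(df: pd.DataFrame) -> pd.DataFrame:
    """Produce the same output columns as the feature view body, without mutating the input."""
    out = df.copy()
    out["credit_card_issuer"] = out["cc_num"].apply(issuer_from_cc_num)
    return out[["user_id", "signup_timestamp", "credit_card_issuer"]]


# Tiny hand-made input with one card number per branch (illustrative values only)
sample = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3", "u4", "u5"],
        "signup_timestamp": pd.to_datetime(["2021-01-01"] * 5),
        "cc_num": [371449635398431, 4111111111111111, 5500000000000004, 6011000000000004, 1234],
    }
)
result = add_credit_card_issuer(sample)
print(result["credit_card_issuer"].tolist())
```

Copying the input keeps the caller's DataFrame unchanged; the feature view above assigns in place instead, and either style produces the same output columns.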

Depending on registered objects

Your team's workspace(s) may include registered (a.k.a. applied) data sources, entities, or other objects that you want to build off of. During development, local objects (in your notebook) can depend on registered objects fetched from your workspace.

For example:

import tecton
from tecton import Attribute, batch_feature_view
from tecton.types import String
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")


# Use those objects as dependencies
@batch_feature_view(
    sources=[user_sign_ups],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    timestamp_field="signup_timestamp",
    features=[Attribute("credit_card_issuer", String)],
)
def user_credit_card_issuer(user_sign_ups):
    # Map the first digit of the card number to its issuer
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]

2. Test objects interactively

Interactive methods can be called on objects to test their output or get additional information. Refer to the SDK Reference for available methods on Tecton objects.

start = datetime(2017, 1, 1)
end = datetime(2022, 1, 1)

# Get a range of historical feature data
df = user_credit_card_issuer.get_features_in_range(start_time=start, end_time=end)

display(df.to_pandas())
   user_id            credit_card_issuer  _valid_from          _valid_to
0  user_91355675520   Visa                2017-01-01 00:00:00  2018-04-07 00:00:00
1  user_26990816968   Discover            2017-01-01 00:00:00  2021-05-09 00:00:00
2  user_950482239421  other               2017-01-01 00:00:00  2021-02-12 00:00:00
3  user_600003278485  AmEx                2017-01-01 00:00:00  2021-08-04 00:00:00
4  user_200441593087  Visa                2017-01-01 00:00:00  2021-04-09 00:00:00
5  user_699955105085  Visa                2017-01-01 00:00:00  2021-09-09 00:00:00
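The returned frame pairs each feature value with the interval over which it was valid, so a quick pandas sanity check can catch obvious problems such as empty or inverted intervals or unexpected category values. The sketch below rebuilds the six rows displayed above by hand (in a notebook you would use `df.to_pandas()` instead). The helper `issuer_as_of` is hypothetical, and it assumes each interval includes `_valid_from` and excludes `_valid_to`; confirm the exact interval semantics in the SDK Reference.

```python
from datetime import datetime

import pandas as pd

# The six rows displayed above, rebuilt by hand; in a notebook use df.to_pandas()
features = pd.DataFrame(
    {
        "user_id": [
            "user_91355675520",
            "user_26990816968",
            "user_950482239421",
            "user_600003278485",
            "user_200441593087",
            "user_699955105085",
        ],
        "credit_card_issuer": ["Visa", "Discover", "other", "AmEx", "Visa", "Visa"],
        "_valid_from": pd.to_datetime(["2017-01-01"] * 6),
        "_valid_to": pd.to_datetime(
            ["2018-04-07", "2021-05-09", "2021-02-12", "2021-08-04", "2021-04-09", "2021-09-09"]
        ),
    }
)

# Sanity checks: every interval is non-empty and every issuer is an expected label
assert (features["_valid_from"] < features["_valid_to"]).all()
assert set(features["credit_card_issuer"]) <= {"AmEx", "Visa", "MasterCard", "Discover", "other"}


def issuer_as_of(frame: pd.DataFrame, user_id: str, ts: datetime):
    """Return the issuer valid for user_id at ts, assuming [_valid_from, _valid_to) intervals."""
    mask = (
        (frame["user_id"] == user_id)
        & (frame["_valid_from"] <= ts)
        & (frame["_valid_to"] > ts)
    )
    matches = frame.loc[mask, "credit_card_issuer"]
    return matches.iloc[0] if not matches.empty else None


print(issuer_as_of(features, "user_91355675520", datetime(2018, 1, 1)))  # Visa
print(issuer_as_of(features, "user_91355675520", datetime(2019, 1, 1)))  # None
```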

3. Fetch a set of registered features from a workspace and create a new feature set

After creating a new feature, you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object. As needed, additional features can be fetched from a workspace and added to the new Feature Service.

A common pattern is to fetch the feature set from an existing Feature Service and add your new feature to it. You can get the list of features in a Feature Service by calling .features on it and then include that list in a new local Feature Service.

import tecton
from tecton import FeatureService

ws = tecton.get_workspace("prod")
features_list = ws.get_feature_service("fraud_detection").features

# Combine the existing feature list with the new local feature view
fraud_detection_v2 = FeatureService(
    name="fraud_detection_v2",
    features=features_list + [user_credit_card_issuer],
)
note

Because Tecton objects are immutable, a new local Feature Service is created rather than modifying the fetched one.
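The same "derive, don't mutate" pattern can be illustrated in plain Python. The sketch below uses a hypothetical frozen dataclass, `FeatureSetSpec`, as a stand-in for an immutable object; it is not Tecton's `FeatureService` class. Modifying the fetched object fails, so the extended feature set is built as a new object and the original stays unchanged.

```python
import dataclasses
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FeatureSetSpec:
    """Hypothetical immutable stand-in for a registered feature set."""

    name: str
    features: Tuple[str, ...]


# Stand-in for the object fetched from the "prod" workspace
fraud_detection = FeatureSetSpec(
    name="fraud_detection",
    features=("user_transaction_amount_averages", "transaction_amount_is_higher_than_average"),
)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError
try:
    fraud_detection.name = "fraud_detection_v2"
except dataclasses.FrozenInstanceError:
    print("cannot modify the fetched object")

# Instead, derive a new object that adds the new feature
fraud_detection_v2 = dataclasses.replace(
    fraud_detection,
    name="fraud_detection_v2",
    features=fraud_detection.features + ("user_credit_card_issuer",),
)
print(fraud_detection_v2.features)
```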

4. Generate training data to test the new feature in a model

Training data can be generated for a list of training events by calling get_features_for_events(events=training_events) on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided events dataframe.

Feature values will be fetched from the Offline Store if they have been materialized offline and computed on the fly if not.
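Conceptually, joining in the historically accurate value is a point-in-time ("as of") join: for each event, the latest feature value whose timestamp is at or before the event timestamp is used, so values from the future do not leak into training data. The sketch below illustrates that idea with `pandas.merge_asof` on hand-made data. The event timestamps come from the example output in this section, while `feature_history` and its `feature_value` column are hypothetical; this is a simplified model for intuition, not Tecton's implementation.

```python
import pandas as pd

# Training events (timestamps taken from the example output in this section)
events = pd.DataFrame(
    {
        "user_id": ["user_131340471060"] * 3,
        "timestamp": pd.to_datetime(
            ["2021-01-01 10:44:12", "2021-01-04 22:48:21", "2021-01-05 15:14:06"]
        ),
    }
)

# Hypothetical feature history: each row is a value and the time it became effective
feature_history = pd.DataFrame(
    {
        "user_id": ["user_131340471060"] * 3,
        "timestamp": pd.to_datetime(["2021-01-02", "2021-01-05", "2021-01-06"]),
        "feature_value": [1.0, 2.0, 3.0],
    }
)

# For each event, take the most recent feature row at or before the event time.
# Both frames must be sorted on the "on" key for merge_asof.
training = pd.merge_asof(
    events.sort_values("timestamp"),
    feature_history.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
    direction="backward",
)
print(training)
```

The first event precedes every feature row, so it gets NaN, and the 2021-01-06 value is never used because it lies after all three events.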

note

This section depends on the following packages for reading data from S3 into a pandas DataFrame:

pip install s3fs fsspec

import pandas as pd

training_events = pd.read_parquet(
"s3://tecton.ai.public/tutorials/fraud_demo/transactions/data.pq", storage_options={"anon": True}
)
training_data = fraud_detection_v2.get_features_for_events(events=training_events)

display(training_data.to_pandas())
   user_id            timestamp            merchant                      amt     is_fraud  user_transaction_amount_averages__amt_mean_1d_1d  user_transaction_amount_averages__amt_mean_3d_1d  user_transaction_amount_averages__amt_mean_7d_1d  user_credit_card_issuer__credit_card_issuer  transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average
0  user_131340471060  2021-01-01 10:44:12  Spencer-Runolfsson            332.61  0         nan                                               nan                                               nan                                               Visa                                         True
1  user_131340471060  2021-01-04 22:48:21  Schroeder, Hauck and Treutel  105.33  1         nan                                               332.61                                            332.61                                            Visa                                         False
2  user_131340471060  2021-01-05 15:14:06  O'Reilly, Mohr and Purdy      15.39   0         105.33                                            105.33                                            218.97                                            Visa                                         False
3  user_131340471060  2021-01-06 02:51:49  Donnelly PLC                  66.07   0         15.39                                             60.36                                             151.11                                            Visa                                         False
4  user_131340471060  2021-01-07 00:59:43  Huel Ltd                      113.63  0         66.07                                             62.2633                                           129.85                                            Visa                                         False

note

If you encounter the error Binder Error: Referenced column "__index_level_0__" not found in FROM clause! when using a DataFrame loaded with pandas.read_parquet, it may be caused by an unexpected index column. To resolve this, reset the index and drop the old index column before passing the DataFrame to Tecton. For example:

import pandas as pd

# Load the parquet file
df = pd.read_parquet("example.parquet")

# Reset the index and drop the old index column so no "__index_level_0__" column is carried along
df = df[["user_id", "event", "timestamp"]].reset_index(drop=True)
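The index issue described in the note typically appears after filtering or sorting, which leaves a DataFrame with a non-default index. The sketch below uses made-up data to show this in pandas alone: a filtered frame no longer has a default `RangeIndex`, and `reset_index(drop=True)` restores one without adding the old labels as a column. Converting a frame with a non-default index to Arrow is what can introduce a `__index_level_0__` column; that step is not shown here.

```python
import pandas as pd

# Made-up events; boolean filtering keeps the original row labels (0 and 2 here)
events = pd.DataFrame(
    {
        "user_id": ["user_a", "user_b", "user_a"],
        "event": ["click", "view", "purchase"],
        "timestamp": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    }
)
filtered = events[events["user_id"] == "user_a"]
print(filtered.index.tolist())  # [0, 2]: no longer a default RangeIndex

# drop=True discards the old labels instead of inserting them as an "index" column
cleaned = filtered.reset_index(drop=True)
print(cleaned.index.tolist())  # [0, 1]
print(list(cleaned.columns))  # ['user_id', 'event', 'timestamp']
```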

5. Copy definitions into your team's Feature Repository

Objects and helper functions can be copied directly into a Feature Repository for productionization. References to remote workspace objects should be changed to local definitions in the repo.

features/user_credit_card_issuer.py

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Attribute
from tecton.types import String
from datetime import datetime, timedelta

# Change to local object references
from entities import user
from data_sources import user_sign_ups


# Set materialization parameters to materialize this feature online and offline
@batch_feature_view(
    sources=[user_sign_ups],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    online=True,
    offline=True,
    feature_start_time=datetime(2017, 1, 1),
    timestamp_field="signup_timestamp",
    features=[
        Attribute("credit_card_issuer", String),
    ],
)
def user_credit_card_issuer(user_sign_ups):
    # Map the first digit of the card number to its issuer
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]

feature_services/fraud_detection.py

from tecton import FeatureService
from features.user_transaction_amount_averages import user_transaction_amount_averages
from features.transaction_amount_is_higher_than_average import transaction_amount_is_higher_than_average
from features.user_credit_card_issuer import user_credit_card_issuer

fraud_detection = FeatureService(
    name="fraud_detection",
    features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average],
)

# Add the new Feature Service and change the feature list to local feature references
fraud_detection_v2 = FeatureService(
    name="fraud_detection_v2",
    features=[transaction_amount_is_higher_than_average, user_transaction_amount_averages, user_credit_card_issuer],
)

6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline

Feature repositories get "applied" to workspaces using the Tecton CLI in order to register definitions and productionize feature pipelines.

To apply changes, follow these steps in your terminal:

  1. Log in to your organization's Tecton account using tecton login my-org.tecton.ai
  2. Select the workspace you want to apply your changes to using tecton workspace select [workspace-name]
  3. Run tecton plan to check the changes that would be applied to the workspace.
  4. Run tecton apply to apply your changes.
tip

Many organizations integrate the tecton apply step into their CI/CD pipelines. In that case, rather than running tecton apply directly from the CLI, you may simply open a git PR with your changes.
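In a CI/CD job, the CLI steps above are typically scripted rather than typed by hand. The sketch below wraps `tecton workspace select`, `tecton plan`, and `tecton apply` with Python's `subprocess`; the function `apply_repo` and its `runner` parameter are hypothetical. It assumes the CLI is already logged in (step 1), and because `tecton apply` may prompt for confirmation, check the Tecton CLI reference for the non-interactive options your version supports before using anything like this in a pipeline.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes one command; swapping it out allows dry runs and tests
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def apply_repo(workspace: str, apply: bool = False, runner: Runner = run_checked) -> List[List[str]]:
    """Select a workspace, run `tecton plan`, and optionally `tecton apply`.

    Returns the commands that were issued so callers can inspect them.
    """
    commands = [
        ["tecton", "workspace", "select", workspace],
        ["tecton", "plan"],
    ]
    if apply:
        commands.append(["tecton", "apply"])
    for cmd in commands:
        # run_checked raises on failure, which stops the sequence at the first failing step
        runner(cmd)
    return commands


# Dry run: record the commands instead of executing them
recorded: List[Sequence[str]] = []
plan_only = apply_repo("prod", runner=recorded.append)
print(plan_only)
```

Injecting the runner keeps the command sequence testable without the CLI installed; in a real job you would call `apply_repo("prod", apply=True)` with the default runner.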

Development Workspaces

If you want to save your changes in Tecton without spinning up production services, you can apply your repo to a "development" workspace.

Development workspaces incur no costs because they do not materialize any data and do not consume compute or serving resources. They are primarily used to visualize feature pipelines and share your work.

To create a development workspace, run tecton workspace create [my-workspace] in your CLI.
