
Feature Development Workflow

In Tecton, features are developed and tested in a notebook and then productionized as code within a Tecton feature repository (optionally using a GitOps workflow for integrated CI/CD).

This gives you fast iteration in a notebook while preserving DevOps best practices for productionization, such as version control, code reviews, and CI/CD.

A typical development workflow for building a feature and testing it in a training data set looks like this:

  1. Create and validate a new feature definition in a notebook
  2. Run the feature pipeline interactively to ensure correct feature data
  3. Fetch a set of registered features from a workspace and create a new feature set
  4. Generate training data to test the new feature in a model
  5. Copy the new feature definition into your feature repo
  6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
note

If you do not need to test the feature in a model, then you would skip steps 3 and 4 above.

This page will walk through these steps in detail.

If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.

1. Create and validate a local feature definition in a notebook

Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects".

Simply write the definition in a notebook cell and call .validate() on the object. Tecton will ensure the definition is correct and run automatic schema validations on feature pipelines.

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

user = Entity(name="user", join_keys=["user_id"])

user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    schema=[Field("user_id", String), Field("signup_timestamp", Timestamp), Field("credit_card_issuer", String)],
)
def user_credit_card_issuer(user_sign_ups):
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]

Tecton objects must first be validated before they can be queried interactively. You can either explicitly validate objects with .validate() or call tecton.set_validation_mode('auto') once in your notebook for automatic lazy validations at the time of usage.

user_credit_card_issuer.validate()  # or call tecton.set_validation_mode('auto')
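
For example, a minimal pattern for the automatic mode (run it once before querying any local objects):

import tecton

# Enable automatic lazy validation once per notebook session.
# Local objects such as user_credit_card_issuer are then validated
# implicitly the first time they are queried, so no explicit
# .validate() calls are needed.
tecton.set_validation_mode("auto")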

Depending on registered objects

Your team's workspace(s) may include registered (a.k.a. applied) data sources, entities, or other objects that you want to build off of. During development, local objects (in your notebook) can depend on registered objects fetched from your workspace.

For example:

import tecton
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    schema=[Field("user_id", String), Field("signup_timestamp", Timestamp), Field("credit_card_issuer", String)],
)
def user_credit_card_issuer(user_sign_ups):
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]

2. Test objects interactively

Interactive methods can be called on objects to test their output or get additional information. Refer to the SDK Reference for available methods on Tecton objects.

start = datetime(2017, 1, 1)
end = datetime(2022, 1, 1)

# Get a range of historical feature data
df = user_credit_card_issuer.get_features_in_range(start_time=start, end_time=end)

display(df.to_pandas())
|   | user_id            | credit_card_issuer | _valid_from         | _valid_to           |
|---|--------------------|--------------------|---------------------|---------------------|
| 0 | user_91355675520   | Visa               | 2017-01-01 00:00:00 | 2018-04-07 00:00:00 |
| 1 | user_26990816968   | Discover           | 2017-01-01 00:00:00 | 2021-05-09 00:00:00 |
| 2 | user_950482239421  | other              | 2017-01-01 00:00:00 | 2021-02-12 00:00:00 |
| 3 | user_600003278485  | AmEx               | 2017-01-01 00:00:00 | 2021-08-04 00:00:00 |
| 4 | user_200441593087  | Visa               | 2017-01-01 00:00:00 | 2021-04-09 00:00:00 |
| 5 | user_699955105085  | Visa               | 2017-01-01 00:00:00 | 2021-09-09 00:00:00 |
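
It can also help to sanity-check the output with plain pandas before wiring the feature into a model. A minimal sketch, reusing the df returned above:

feature_df = df.to_pandas()

# Check the distribution of the new feature and that the join key is always populated
print(feature_df["credit_card_issuer"].value_counts())
assert feature_df["user_id"].notnull().all()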

3. Fetch a set of registered features from a workspace and create a new feature set

After creating a new feature you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object. As needed, additional features can be fetched from a workspace and added to the new Feature Service.

A common pattern is to fetch the feature set from an existing Feature Service and add your new feature to it. You can get the list of features in a Feature Service by calling .features on it and then include that list in a new local Feature Service.

import tecton
from tecton import FeatureService

ws = tecton.get_workspace("prod")
features_list = ws.get_feature_service("fraud_detection").features

fraud_detection_v2 = FeatureService(name="fraud_detection_v2", features=features_list + [user_credit_card_issuer])
fraud_detection_v2.validate()
note

Tecton objects are immutable and therefore a new local Feature Service is created.

4. Generate training data to test the new feature in a model

Training data can be generated for a list of training events by calling get_features_for_events(events=training_events) on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided events dataframe.

Feature values will be fetched from the Offline Store if they have been materialized offline; otherwise they will be computed on the fly.

note

This section depends on the following packages for reading data from S3 into a pandas DataFrame.

pip install s3fs fsspec

import pandas as pd

training_events = pd.read_parquet(
    "s3://tecton.ai.public/tutorials/fraud_demo/transactions/data.pq", storage_options={"anon": True}
)
training_data = fraud_detection_v2.get_features_for_events(events=training_events)

display(training_data.to_pandas())
note

If you encounter the error Binder Error: Referenced column "__index_level_0__" not found in FROM clause! when using an events DataFrame read with pandas.read_parquet, it may be due to an unexpected index column in the DataFrame. To resolve this, drop the index before passing the DataFrame to Tecton. Here's an example:

# Load the parquet file, keep only the needed columns, and drop the default index
df = pd.read_parquet("example.parquet")
df = df[["user_id", "event", "timestamp"]].reset_index(drop=True)

|   | user_id           | timestamp           | merchant                     | amt    | is_fraud | user_transaction_amount_averages__amt_mean_1d_1d | user_transaction_amount_averages__amt_mean_3d_1d | user_transaction_amount_averages__amt_mean_7d_1d | user_credit_card_issuer__credit_card_issuer | transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average |
|---|-------------------|---------------------|------------------------------|--------|----------|--------|---------|--------|------|-------|
| 0 | user_131340471060 | 2021-01-01 10:44:12 | Spencer-Runolfsson           | 332.61 | 0        | nan    | nan     | nan    | Visa | True  |
| 1 | user_131340471060 | 2021-01-04 22:48:21 | Schroeder, Hauck and Treutel | 105.33 | 1        | nan    | 332.61  | 332.61 | Visa | False |
| 2 | user_131340471060 | 2021-01-05 15:14:06 | O'Reilly, Mohr and Purdy     | 15.39  | 0        | 105.33 | 105.33  | 218.97 | Visa | False |
| 3 | user_131340471060 | 2021-01-06 02:51:49 | Donnelly PLC                 | 66.07  | 0        | 15.39  | 60.36   | 151.11 | Visa | False |
| 4 | user_131340471060 | 2021-01-07 00:59:43 | Huel Ltd                     | 113.63 | 0        | 66.07  | 62.2633 | 129.85 | Visa | False |
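
With training data in hand, you can fit a model to evaluate whether the new feature helps. The following is a minimal sketch, not part of Tecton itself; it assumes scikit-learn is installed and uses the column names from the sample output above:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = training_data.to_pandas()

# Separate the label from the features; drop identifiers and raw event fields
y = df["is_fraud"]
X = df.drop(columns=["user_id", "timestamp", "merchant", "is_fraud"])

# One-hot encode the categorical issuer feature and fill missing aggregates
X = pd.get_dummies(X).fillna(0).astype(float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

Comparing this model against one trained on the original fraud_detection feature set tells you whether the new feature is worth productionizing.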

5. Copy definitions into your team's Feature Repository

Objects and helper functions can be copied directly into a Feature Repository for productionization. References to remote workspace objects should be changed to local definitions in the repo.

note

You do not need to call .validate() in a Feature Repo. Validation is run on all objects during tecton apply.

features/user_credit_card_issuer.py

from tecton import Entity, BatchSource, FileConfig, batch_feature_view
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

# Change to local object references
from entities import user
from data_sources import user_sign_ups

# Set materialization parameters to materialize this feature online and offline
@batch_feature_view(
    sources=[user_sign_ups],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    schema=[Field("user_id", String), Field("signup_timestamp", Timestamp), Field("credit_card_issuer", String)],
    online=True,
    offline=True,
    feature_start_time=datetime(2017, 1, 1),
)
def user_credit_card_issuer(user_sign_ups):
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]

feature_services/fraud_detection.py

from tecton import FeatureService
from features.user_transaction_amount_averages import user_transaction_amount_averages
from features.transaction_amount_is_higher_than_average import transaction_amount_is_higher_than_average
from features.user_credit_card_issuer import user_credit_card_issuer

fraud_detection = FeatureService(
    name="fraud_detection", features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average]
)

# Add the new Feature Service and change the feature list to local feature references
fraud_detection_v2 = FeatureService(
    name="fraud_detection:v2",
    features=[transaction_amount_is_higher_than_average, user_transaction_amount_averages, user_credit_card_issuer],
)

6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline

Feature repositories get "applied" to workspaces using the Tecton CLI in order to register definitions and productionize feature pipelines.

To apply changes, follow these steps in your terminal:

  1. Log into your organization's Tecton account using tecton login my-org.tecton.ai
  2. Select the workspace you want to apply your changes to using tecton workspace select [workspace-name]
  3. Run tecton plan to check the changes that would be applied to the workspace.
  4. Run tecton apply to apply your changes.
tip

Many organizations integrate the tecton apply step into their CI/CD pipelines. This means that rather than running tecton apply directly in the CLI, you may simply create a git PR for your changes.

Development Workspaces

If you want to save your changes in Tecton without spinning up production services, you can apply your repo to a "development" workspace.

Development workspaces incur no costs, since they do not materialize any data and do not consume compute or serving resources. They are primarily used to visualize feature pipelines and share your work.

To create a development workspace, run tecton workspace create [my-workspace] in your CLI.
