Version: 0.7

Feature Development Workflow

In Tecton, features are developed and tested in a notebook and then productionized as code within a Tecton feature repository (optionally combined with a GitOps workflow for integrated CI/CD).

This gives the benefit of fast iteration speed in a notebook, while preserving DevOps best practices for productionization like version control, code reviews, and CI/CD.

A typical development workflow for building a feature and testing it in a training data set looks like this:

1. Create and validate a new feature definition in a notebook
2. Run the feature pipeline interactively to ensure correct feature data
3. Fetch a set of registered features from a workspace and create a new feature set
4. Generate training data to test the new feature in a model
5. Copy the new feature definition into your feature repo
6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline

note

If you do not need to test the feature in a model, you can skip steps 3 and 4 above.

This page walks through each of these steps in detail.

If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.

1. Create and validate a local feature definition in a notebook

Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects".

Write the definition in a notebook cell and call .validate() on the object. Tecton checks that the definition is correct and runs automatic schema validation on the feature pipeline.

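The same definition is shown below first for Spark (reading a Parquet file on S3) and then for Snowflake (reading a Snowflake table).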

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

user = Entity(name="user", join_keys=["user_id"])

user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

# Snowflake version: reads from a Snowflake table instead of a file on S3
from tecton import Entity, BatchSource, SnowflakeConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

user = Entity(name="user", join_keys=["user_id"])

user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=SnowflakeConfig(
        database="demo", schema="users", table="user_sign_ups", timestamp_field="signup_timestamp"
    ),
)


@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """
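When spot-checking the output of either version, it can help to have the same mapping in plain Python. The sketch below is a hypothetical helper (credit_card_issuer and ISSUER_BY_FIRST_DIGIT are not part of Tecton) that mirrors the CASE expression: it looks at the first character of the card number and falls back to 'other'.

```python
from typing import Union

# Hypothetical helper, not part of Tecton: mirrors the CASE expression in
# user_credit_card_issuer so individual rows can be checked by hand.
ISSUER_BY_FIRST_DIGIT = {
    "3": "AmEx",
    "4": "Visa",
    "5": "MasterCard",
    "6": "Discover",
}


def credit_card_issuer(cc_num: Union[int, str]) -> str:
    """Return the issuer implied by the first digit of a card number."""
    # Mirrors SUBSTRING(CAST(cc_num AS STRING), 0, 1): take the first character.
    first_digit = str(cc_num)[:1]
    # Mirrors the ELSE 'other' branch for any unmapped digit.
    return ISSUER_BY_FIRST_DIGIT.get(first_digit, "other")


print(credit_card_issuer(4111111111111111))  # Visa
print(credit_card_issuer(9999))              # other
```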

Tecton objects must be validated before they can be queried interactively. You can either validate objects explicitly with .validate() or call tecton.set_validation_mode('auto') once in your notebook so that objects are validated lazily when they are first used.

user_credit_card_issuer.validate()  # or call tecton.set_validation_mode('auto')
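To give a rough idea of what a schema validation catches, here is a simplified, hypothetical check. check_feature_view_schema is not a Tecton API, and schemas are modeled as plain dicts mapping column names to type names. It assumes, as the examples above suggest, that a batch Feature View's output must include the entity join keys and a timestamp column.

```python
from typing import Dict, List

# Hypothetical, simplified illustration of a schema check; not Tecton's code.
# A schema is modeled as {column_name: type_name}.


def check_feature_view_schema(
    output_schema: Dict[str, str], join_keys: List[str], timestamp_field: str
) -> List[str]:
    """Return human-readable problems with a Feature View output schema (empty list if none)."""
    problems = []
    # Every entity join key must appear in the output so features can be joined.
    for key in join_keys:
        if key not in output_schema:
            problems.append(f"missing join key column '{key}'")
    # The timestamp column must exist and have a timestamp type.
    if timestamp_field not in output_schema:
        problems.append(f"missing timestamp column '{timestamp_field}'")
    elif output_schema[timestamp_field] != "timestamp":
        problems.append(
            f"column '{timestamp_field}' has type '{output_schema[timestamp_field]}', expected 'timestamp'"
        )
    return problems


schema = {"user_id": "string", "signup_timestamp": "timestamp", "credit_card_issuer": "string"}
print(check_feature_view_schema(schema, ["user_id"], "signup_timestamp"))  # []
```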

Depending on registered objects

Your team's workspace(s) may include registered (a.k.a. applied) data sources, entities, or other objects that you want to build on. During development, local objects in your notebook can depend on registered objects fetched from your workspace.

For example:

The Spark version is shown first, followed by the Snowflake version.

import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

# Snowflake version
import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

2. Test objects interactively

Interactive methods can be called on objects to test their output or get additional information. Refer to the SDK Reference for available methods on Tecton objects.

start = datetime(2017, 1, 1)
end = datetime(2022, 1, 1)

# Get a range of historical feature data
df = user_credit_card_issuer.get_historical_features(start_time=start, end_time=end)

display(df.to_pandas())

	user_id	signup_timestamp	credit_card_issuer	_effective_timestamp
0	user_709462196403	2017-04-06 00:50:31	Visa	2017-04-07 00:00:00
1	user_687958452057	2017-05-08 16:07:51	Discover	2017-05-09 00:00:00
2	user_884240387242	2017-06-15 19:33:18	other	2017-06-16 00:00:00
3	user_205125746682	2017-09-03 03:42:14	AmEx	2017-09-04 00:00:00
4	user_950482239421	2017-09-08 19:26:25	Visa	2017-09-09 00:00:00
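In the output above, _effective_timestamp is always the midnight following signup_timestamp, which is consistent with a value becoming available only after the daily batch_schedule run covering it. The sketch below reproduces that relationship for the sample rows. It assumes batch intervals are aligned to midnight (counted from the Unix epoch); effective_timestamp is a hypothetical helper, and this illustrates the sample output rather than Tecton's implementation.

```python
from datetime import datetime, timedelta

# Hypothetical helper reproducing the sample output; assumes batch intervals
# start at the Unix epoch (i.e. daily intervals begin at midnight).
EPOCH = datetime(1970, 1, 1)


def effective_timestamp(event_time: datetime, batch_schedule: timedelta) -> datetime:
    """Return the end of the batch_schedule interval that contains event_time."""
    # Number of whole intervals between the epoch and the event.
    intervals_elapsed = (event_time - EPOCH) // batch_schedule
    # The value becomes effective when its interval closes.
    return EPOCH + (intervals_elapsed + 1) * batch_schedule


print(effective_timestamp(datetime(2017, 4, 6, 0, 50, 31), timedelta(days=1)))
# 2017-04-07 00:00:00
```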

3. Fetch a set of registered features from a workspace and create a new feature set

After creating a new feature, you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object and, as needed, adding features fetched from a workspace to it.

A common pattern is to fetch the feature set of an existing Feature Service and add your new feature to it. Calling .features on a Feature Service returns its list of features, which you can then include in a new local Feature Service.

import tecton
from tecton import FeatureService

ws = tecton.get_workspace("prod")
features_list = ws.get_feature_service("fraud_detection").features

fraud_detection_v2 = FeatureService(name="fraud_detection_v2", features=features_list + [user_credit_card_issuer])
fraud_detection_v2.validate()

note

Tecton objects are immutable, so instead of modifying the existing Feature Service, you create a new local one.
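The same pattern can be illustrated without Tecton using a frozen dataclass. LocalFeatureService and with_extra_features are hypothetical names used only for this sketch; the point is that the original service is left untouched and a new object is built from its feature list plus the new feature. Skipping features that are already present is an extra assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-in for an immutable Feature Service; not Tecton's class.
@dataclass(frozen=True)
class LocalFeatureService:
    name: str
    features: Tuple[str, ...]


def with_extra_features(
    base: LocalFeatureService, name: str, extra: Tuple[str, ...]
) -> LocalFeatureService:
    """Build a new service from base plus extra features; base is never modified."""
    # Keep the original order and skip features already present (sketch assumption).
    combined = base.features + tuple(f for f in extra if f not in base.features)
    return LocalFeatureService(name=name, features=combined)


fraud_detection = LocalFeatureService(
    "fraud_detection",
    ("user_transaction_amount_averages", "transaction_amount_is_higher_than_average"),
)
fraud_detection_v2 = with_extra_features(fraud_detection, "fraud_detection_v2", ("user_credit_card_issuer",))
print(len(fraud_detection.features), len(fraud_detection_v2.features))  # 2 3
```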

4. Generate training data to test the new feature in a model

Training data can be generated for a list of training events by calling get_historical_features(spine=training_events) on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided spine.

Feature values are read from the Offline Store if they have been materialized offline, and computed on the fly otherwise.

training_events = spark.read.parquet("s3://tecton.ai.public/tutorials/fraud_demo/transactions/")
training_data = fraud_detection_v2.get_historical_features(spine=training_events)

display(training_data.to_pandas())

	user_id	timestamp	merchant	amt	is_fraud	user_transaction_amount_averages__amt_mean_1d_1d	user_transaction_amount_averages__amt_mean_3d_1d	user_transaction_amount_averages__amt_mean_7d_1d	user_credit_card_issuer__credit_card_issuer	transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average
0	user_131340471060	2021-01-01 10:44:12	Spencer-Runolfsson	332.61	0	nan	nan	nan	Visa	True
1	user_131340471060	2021-01-04 22:48:21	Schroeder, Hauck and Treutel	105.33	1	nan	332.61	332.61	Visa	False
2	user_131340471060	2021-01-05 15:14:06	O'Reilly, Mohr and Purdy	15.39	0	105.33	105.33	218.97	Visa	False
3	user_131340471060	2021-01-06 02:51:49	Donnelly PLC	66.07	0	15.39	60.36	151.11	Visa	False
4	user_131340471060	2021-01-07 00:59:43	Huel Ltd	113.63	0	66.07	62.2633	129.85	Visa	False
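The point-in-time join described above can be approximated in plain Python. For each spine event, the sketch takes the latest feature value whose effective timestamp is at or before the event time, and drops it if it is older than the Feature View's ttl. The output column follows the feature_view__feature naming visible in the table above. This is a simplified model under stated assumptions (a single join key named user_id, a single feature, an inclusive boundary at the event time), not how Tecton computes training data; point_in_time_lookup and join_spine are hypothetical.

```python
import bisect
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of a point-in-time join; not Tecton's implementation.
# Feature history per user_id: (effective_timestamp, value), sorted by time.
FeatureHistory = Dict[str, List[Tuple[datetime, str]]]


def point_in_time_lookup(
    history: FeatureHistory, user_id: str, event_time: datetime, ttl: timedelta
) -> Optional[str]:
    """Latest value effective at or before event_time, or None if missing or older than ttl."""
    rows = history.get(user_id, [])
    times = [effective_time for effective_time, _ in rows]
    # Index of the last row whose effective time is <= event_time.
    i = bisect.bisect_right(times, event_time) - 1
    if i < 0:
        return None  # no value existed yet at event time
    effective_time, value = rows[i]
    if event_time - effective_time > ttl:
        return None  # the value had expired by event time
    return value


def join_spine(
    spine: List[dict], history: FeatureHistory, feature_view: str, feature: str, ttl: timedelta
) -> List[dict]:
    """Return a copy of each spine event with the feature value joined in."""
    column = f"{feature_view}__{feature}"
    return [
        {**event, column: point_in_time_lookup(history, event["user_id"], event["timestamp"], ttl)}
        for event in spine
    ]


# Toy data for illustration only.
history = {"user_a": [(datetime(2021, 1, 1), "Visa")]}
spine = [{"user_id": "user_a", "timestamp": datetime(2021, 1, 5)}]
print(join_spine(spine, history, "user_credit_card_issuer", "credit_card_issuer", timedelta(days=3650)))
```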

5. Copy definitions into your team's Feature Repository

Objects and helper functions can be copied directly into a Feature Repository for productionization. Replace references to remote workspace objects (such as those fetched with ws.get_entity() or ws.get_data_source()) with references to their local definitions in the repo.

note

You do not need to call .validate() in a Feature Repo. Validation runs on all objects during tecton apply.

features/user_credit_card_issuer.py

from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Change to local object references
from entities import user
from data_sources import user_sign_ups

# Set materialization parameters to materialize this feature online and offline
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
    online=True,
    offline=True,
    feature_start_time=datetime(2017, 1, 1),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

feature_services/fraud_detection.py

from tecton import FeatureService
from features.user_transaction_amount_averages import user_transaction_amount_averages
from features.transaction_amount_is_higher_than_average import transaction_amount_is_higher_than_average
from features.user_credit_card_issuer import user_credit_card_issuer

fraud_detection = FeatureService(
    name="fraud_detection", features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average]
)

# Add the new Feature Service, referencing the locally imported Feature Views
fraud_detection_v2 = FeatureService(
    name="fraud_detection_v2",
    features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average, user_credit_card_issuer],
)

6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline

Feature repositories get "applied" to workspaces using the Tecton CLI in order to register definitions and productionize feature pipelines.

To apply changes, follow these steps in your terminal:

1. Log into your organization's Tecton account using tecton login my-org.tecton.ai
2. Select the workspace you want to apply your changes to using tecton workspace select [workspace-name]
3. Run tecton plan to check the changes that would be applied to the workspace.
4. Run tecton apply to apply your changes.

tip

Many organizations integrate the "tecton apply" step into their CI/CD pipelines. In that case, rather than running tecton apply directly from the CLI, you may simply open a git PR with your changes.

Development Workspaces

If you want to save your changes in Tecton without spinning up production services, you can apply your repo to a "development" workspace.

Development workspaces incur no costs because they do not materialize any data and do not consume compute or serving resources. They are primarily used to visualize feature pipelines and share your work.

To create a development workspace, run tecton workspace create [my-workspace] in your CLI.
