Version: 1.1

Building a Production AI Application with Tecton

Need to get a real-time model up and running fast? In this tutorial, we'll build a fraud detection system step-by-step: from data ingestion and feature engineering, all the way to serving predictions in real time.

We'll do it all in Python without needing to assemble or maintain heavy infrastructure. By the end, you’ll have a working production-style pipeline that you can adapt to your own use cases.

What You'll Build

Data Connection: We'll connect to data on S3 to pull historical transaction events.
Feature Development: We'll define and test batch features for fraud detection, right inside a notebook.
Training Data: We'll generate training datasets, ensuring they're point-in-time correct, and train a simple logistic regression model.
Production-Ready Features: We'll productionize our features by materializing them to Tecton's online store.
Real-Time Inference: We'll retrieve features at low latency and run fraud predictions on new transactions.

What You'll Learn

How to build batch features in Tecton
How to test and iterate on feature logic
How to generate training data without data leakage
How to deploy features to Tecton’s online and offline stores
How to serve your model in real-time using Tecton’s low-latency API

Expected time to complete: ~30 minutes.

Part 0: Getting Started

1. Install Requirements

You'll need Python >= 3.8. Install Tecton's SDK and a few other packages:

!pip install 'tecton[rift]==1.1.0' gcsfs s3fs scikit-learn -q

2. Log in to Tecton

If you’re on Tecton’s free Explore tier, you can leave the URL as explore.tecton.ai. Otherwise, replace it with your organization's Tecton URL.

import tecton

tecton.login("explore.tecton.ai")  # replace if needed

When prompted:

Open the link in your browser.
Copy your authentication token back into your notebook.
Press Enter.

3. Set Up Basic Imports

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta

tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

That’s it! Your environment is ready. Let's explore our data next.

Part 1: Exploring Data

We have a historical transactions dataset stored in an S3 bucket. Let's take a quick peek.

import pandas as pd

transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})

display(transactions_df.tail(10))

You should see rows that include:

	timestamp	user_id	transaction_id	merchant	merch_lat	merch_long	amount
118280	2025-12-31 18:35:35.552987	user_2417164600	7fc4ead916af497387724f04f03a240a	Summit Auto	89.024620	33.026282	85.45
118281	2025-12-31 19:15:30.052654	user_2898680572	8e76114f89a54b70aee5202d1b7f078e	Denny's	-34.317633	-20.490684	342.00
118282	2025-12-31 19:24:50.740935	user_4133774204	48e6177cd8034b2f9db5d899784708eb	Piazza Auto	-86.847150	-143.865275	814.90
118283	2025-12-31 19:30:19.764557	user_6971829885	c222ae37ac694c3ea9e1901ae95d7d20	Floor & Decor	-24.253155	104.160573	72.82
118284	2025-12-31 20:00:05.888725	user_6348117987	f4121a75237442f6a093559432c54d8a	MattressFirm	-3.704788	-151.185462	1.68
118285	2025-12-31 20:52:49.646145	user_7921570811	95f566f2dcb54e54b5ea51d06f3b0f4e	Rite Aid	48.028960	172.359464	79.79
118286	2025-12-31 21:01:16.770868	user_1939957235	d1277a82bcca490f9169697daa639a6b	Trader Joe's	74.087849	46.947425	70.51
118287	2025-12-31 21:25:14.221429	user_3338884986	f2ae481eda3a47118f73d1217665fe6f	Priority Auto	21.295012	78.033348	6.45
118288	2025-12-31 22:03:06.505606	user_2210887384	7580b1931b42411bb92cd42208af86e0	Wall to Wall	28.269364	-168.851930	11.98
118289	2025-12-31 23:09:25.786744	user_1997016327	730d4779334f43d0bba602472239993f	Food Giant	78.179653	-51.714236	92.29

Part 2: Defining and Testing Features

We'll create features that measure a user's recent transaction behavior:

A user's average transaction amount over the past 1, 3, and 7 days
A user's total transaction count over the past 1, 3, and 7 days
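
To make these definitions concrete, here is a minimal plain-Python sketch of a trailing-window mean and count. It only illustrates the idea and is not how Tecton computes aggregates: the `trailing_stats` helper, the sample events, and the half-open window boundary are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from math import nan


def trailing_stats(events, as_of, window):
    """Return (mean, count) of amounts with as_of - window <= timestamp < as_of."""
    amounts = [amount for ts, amount in events if as_of - window <= ts < as_of]
    count = len(amounts)
    # An empty window has no mean, so report NaN alongside a count of 0
    mean = sum(amounts) / count if count else nan
    return mean, count


# Hypothetical transactions for a single user: (timestamp, amount)
events = [
    (datetime(2022, 1, 1, 9), 10.0),
    (datetime(2022, 1, 3, 12), 20.0),
    (datetime(2022, 1, 6, 18), 60.0),
]

as_of = datetime(2022, 1, 8)
for days in (1, 3, 7):
    mean, count = trailing_stats(events, as_of, timedelta(days=days))
    print(f"{days}d window: mean={mean}, count={count}")
```

In this sketch the 1-day window is empty, so its count is 0 and its mean is NaN, while the 7-day window covers all three transactions.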

2.1: Create a Batch Source and Feature View

In Tecton, you declare feature logic through objects like BatchSource and BatchFeatureView. Let's define them in our notebook:

transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Our entity captures the concept of "user"
user = Entity(name="user", join_keys=[Field("user_id", String)])


@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    features=[
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
    ],
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

2.2: Test Features with Historical Data

Use get_features_in_range to compute these features over a historical time range and inspect the results:

start = datetime(2022, 1, 1)
end = datetime(2022, 2, 1)

df = user_transaction_metrics.get_features_in_range(start_time=start, end_time=end).to_pandas()
display(df.tail(10))

You’ll see metrics like:

index	user_id	amount_mean_1d_1d	amount_mean_3d_1d	amount_mean_7d_1d	amount_count_3d_1d	amount_count_7d_1d	_valid_from	_valid_to
1519	user_7994770107	NaN	27.895000	190.393333	2	6	2022-01-07 00:00:00+00:00	2022-01-08 00:00:00+00:00
1520	user_8041734544	NaN	843.430000	216.532000	1	5	2022-01-06 00:00:00+00:00	2022-01-07 00:00:00+00:00
1521	user_8096819426	NaN	38.345000	147.498333	2	6	2022-01-02 00:00:00+00:00	2022-01-03 00:00:00+00:00
1522	user_8096819426	NaN	27.130000	138.197143	3	7	2022-01-29 00:00:00+00:00	2022-01-30 00:00:00+00:00
1523	user_8175816267	NaN	313.575000	224.093333	2	3	2022-01-26 00:00:00+00:00	2022-01-27 00:00:00+00:00
1524	user_8468871048	NaN	6.125000	113.736667	2	9	2022-01-07 00:00:00+00:00	2022-01-08 00:00:00+00:00
1525	user_9102789217	NaN	43.673333	38.336000	3	5	2022-01-21 00:00:00+00:00	2022-01-22 00:00:00+00:00
1526	user_9417852028	NaN	1.955000	77.846667	2	6	2022-01-24 00:00:00+00:00	2022-01-25 00:00:00+00:00
1527	user_9704575201	NaN	33.330000	75.414286	3	7	2022-01-01 00:00:00+00:00	2022-01-02 00:00:00+00:00
1528	user_9619731767	NaN	NaN	273.812000	0	5	2022-01-15 00:00:00+00:00	2022-01-16 00:00:00+00:00

For more information about the output schema, see Offline Retrieval Methods and Feature Naming.

Everything looks good! Now let's build a training set.

Part 3: Generating Training Data

We'll predict fraud using a label dataset that marks which transactions turned out to be fraudulent. Let’s load those labels:

training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})

display(training_labels.tail(10))

index	transaction_id	is_fraud
99990	12a48ececaf9fdb7e5cd61dedbb73d1b	0
99991	060ced776ce3efdc30e1517a48e0671d	0
99992	d545c3245bca873d0e3dcba9e1fc722e	0
99993	f57261485341e0e2688eb2e6593dfc5e	0
99994	bdf818c462bd35e90f2598761ca3eccd	0
99995	3728a1ebb7110541e6e3ab39704fda9a	0
99996	2b1bb22bb5ac768cdd1aa29139265de0	1
99997	0b56bb9091539d0938668a893428664a	1
99998	7d46f87ced58994dc58dc5b19641fc46	1
99999	afcd8c782b2d6b0b6c15c74bff122c5f	1

This dataset maps transaction_id to is_fraud (0 or 1). Let's join it to our transactions_df so that each row has user and label info.

training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]

display(training_events.tail(10))

index	user_id	timestamp	amount	is_fraud
99990	user_5476622522	2024-12-31 18:11:10.528279	98.92	0
99991	user_3202479350	2024-12-31 18:14:30.978084	1.84	0
99992	user_9315055943	2024-12-31 18:22:25.127352	39.41	0
99993	user_2210887384	2024-12-31 19:14:17.889205	52.09	0
99994	user_7921570811	2024-12-31 20:48:18.848095	11.86	0
99995	user_3338884986	2024-12-31 21:49:56.180387	699.06	0
99996	user_8816492034	2024-12-31 22:37:55.129696	2.18	1
99997	user_8816492034	2024-12-31 23:30:23.640727	65.88	1
99998	user_8816492034	2024-12-31 23:34:05.640727	0.95	1
99999	user_8816492034	2024-12-31 23:34:43.640727	2.22	1
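
A left join keeps every label even when no matching transaction exists, which would leave user_id and timestamp empty for that row. The sketch below uses small made-up DataFrames (not the tutorial data) to show one way to check for that with the `indicator` and `validate` options of pandas `merge`.

```python
import pandas as pd

# Made-up stand-ins for training_labels and transactions_df
labels = pd.DataFrame({"transaction_id": ["a", "b", "c"], "is_fraud": [0, 1, 0]})
txns = pd.DataFrame(
    {
        "transaction_id": ["a", "b"],
        "user_id": ["user_1", "user_2"],
        "amount": [10.0, 99.5],
    }
)

merged = labels.merge(
    txns,
    on="transaction_id",
    how="left",
    validate="one_to_one",  # raises if transaction_id is duplicated on either side
    indicator=True,  # adds a _merge column: "both" or "left_only"
)

# Labels that found no matching transaction
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} label(s) without a matching transaction")
```

If a check like this reports unmatched labels on real data, it is usually worth dropping or investigating those rows before generating training data.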

3.1: Building a Feature Service

To combine these features for training, we bundle them into a FeatureService:

from tecton import FeatureService

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)

3.2: Generate a Point-in-Time Correct Training Set

Tecton will automatically "time travel" to fetch the feature values valid at each event's timestamp. This ensures no leakage from the future.

training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas().fillna(0)

display(training_data.sample(5))

index	user_id	timestamp	amount
0	user_1028747636	2021-01-03 08:42:43.668406	77.09
1	user_1155940157	2021-01-21 03:27:42.566411	43.01
2	user_1567708646	2021-01-20 13:57:14.832615	536.1
3	user_1567708646	2021-01-21 18:13:41.535067	72.16
4	user_1755385063	2021-01-05 04:19:08.782106	96.84
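
The idea behind a point-in-time lookup can be sketched in plain Python: for each event, take the most recent feature value that was already available at the event's timestamp. This is an illustration under simplifying assumptions, not Tecton's implementation; the `snapshots` data and the `value_as_of` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature snapshots for one user, sorted by time: (effective_time, value)
snapshots = [
    (datetime(2022, 1, 1), 10.0),
    (datetime(2022, 1, 2), 25.0),
    (datetime(2022, 1, 3), 40.0),
]


def value_as_of(snapshots, event_time):
    """Return the latest value effective at or before event_time, or None."""
    times = [t for t, _ in snapshots]
    idx = bisect_right(times, event_time)  # number of snapshots at or before event_time
    return snapshots[idx - 1][1] if idx else None


print(value_as_of(snapshots, datetime(2022, 1, 2, 15)))  # 25.0
print(value_as_of(snapshots, datetime(2021, 12, 31)))  # None
```

The event on January 2 at 15:00 never sees the January 3 value, which is the property that prevents leakage from the future.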

Now we're ready to train a model!

Part 4: Training a Model

We'll use scikit-learn's LogisticRegression for simplicity. You can replace this with any ML library.

from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = training_data.drop(["user_id", "timestamp", "amount"], axis=1)
X = df.drop("is_fraud", axis=1)
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

num_cols = X_train.select_dtypes(exclude=["object"]).columns.tolist()
cat_cols = X_train.select_dtypes(include=["object"]).columns.tolist()

num_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="N/A"), OneHotEncoder(handle_unknown="ignore", sparse_output=False)
)

full_pipe = ColumnTransformer([("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)])

model = make_pipeline(full_pipe, LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)

y_predict = model.predict(X_test)
print(metrics.classification_report(y_test, y_predict, zero_division=0))

	precision	recall	f1-score	support
0	0.93	0.99	0.96	27076
1	0.82	0.30	0.44	2924
accuracy			0.93	30000
macro avg	0.87	0.65	0.70	30000
weighted avg	0.92	0.93	0.91	30000
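
As a quick worked example, the fraud-class row (class 1) of the report can be unpacked with a little arithmetic. The inputs below are the rounded values from the report, so the results are approximate.

```python
# Rounded values for class 1 (fraud) taken from the report above
precision, recall, support = 0.82, 0.30, 2924

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Recall is the share of actual fraud cases the model flagged
caught = recall * support
missed = support - caught

print(f"f1 ~ {f1:.2f}")
print(f"fraud caught ~ {round(caught)}, missed ~ {round(missed)}")
```

In other words, the model flags roughly 877 of the 2,924 fraudulent test transactions and misses about 2,047, even though overall accuracy is 0.93.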

The report lists precision, recall, and F1 for each class. From here, you could iterate on features or hyperparameters to improve the results, particularly recall on the fraud class. Once you're satisfied with the model, it's time to deploy.

Part 5: Productionizing Your Tecton Application

In Tecton, production workflows revolve around a Feature Repository and Workspaces:

Feature Repo: Code that defines your features, data sources, and feature services.
Workspace: A project environment for your team or org. Applying your code to a Live Workspace automatically materializes data into Tecton's online and offline stores.

Using Explore (Free Tier)?

If you're on explore.tecton.ai, skip ahead to Part 6; we've already set up a "prod" workspace for you. Otherwise, follow the steps below to create your own Tecton Workspace.

5.1: Create a Tecton Feature Repository

Open a terminal (not inside the notebook) and run:

mkdir tecton-feature-repo
cd tecton-feature-repo
touch features.py
tecton init

5.2: Enable Materialization in `features.py`

Copy your feature definitions into features.py, adding a few extra parameters to tell Tecton how to backfill and keep data fresh:

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate, FeatureService
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta


transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=[Field("user_id", String)])


@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
    ],
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 1, 1),
    batch_schedule=timedelta(days=1),
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)

5.3: Create and Apply to a Workspace

Back in your terminal:

tecton login [your-org-name].tecton.ai
tecton workspace create [your-name]-quickstart --live
tecton apply

Using workspace "[your-name]-quickstart" on cluster https://explore.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Imported 1 Python module from the feature repository
⚠️  Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Initializing.
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Batch Data Source
    name:           transactions

  + Create Entity
    name:           user

  + Create Transformation
    name:           user_transaction_metrics
    description:    User transaction metrics over 1, 3 and 7 days

  + Create Batch Feature View
    name:           user_transaction_metrics
    description:    User transaction metrics over 1, 3 and 7 days
    materialization: 11 backfills, 1 recurring batch job
    > backfill:     10 Backfill jobs 2020-01-01 00:00:00 UTC to 2023-08-16 00:00:00 UTC writing to the Offline Store
                    1 Backfill job 2023-08-16 00:00:00 UTC to 2023-08-23 00:00:00 UTC writing to both the Online and Offline Store
    > incremental:  1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store

  + Create Feature Service
    name:           fraud_detection_feature_service

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
 Generated plan ID is 8d01ad78e3194a5dbd3f934f04d71564
 View your plan in the Web UI: https://explore.tecton.ai/app/[your-name]-quickstart/plan-summary/8d01ad78e3194a5dbd3f934f04d71564
 ⚠️  Objects in plan contain warnings.

Note: Updates to Feature Services may take up to 60 seconds to be propagated to the real-time feature-serving endpoint.
Note: This workspace ([your-name]-quickstart) is a "Live" workspace. Applying this plan may result in new materialization jobs which will incur costs. Carefully examine the plan output before applying changes.
Are you sure you want to apply this plan to: "[your-name]-quickstart"? [y/N]> y
🎉 all done!

Tecton will:

Register your data source, entity, and feature view.
Kick off backfill jobs to populate your historical data from 2020 onward.
Schedule future jobs to keep your feature data fresh.

Part 6: Serving Features in Real-Time

Once your backfill completes, you can fetch features with millisecond latency through Tecton's HTTP API or SDK. Let’s demo using the HTTP API.

6.1: Create a Service Account

Go to Settings > Service Accounts in the Tecton UI.
Create a new service account and save its API key.
Grant it “Consumer” access to your workspace.

6.2: Write a Helper Function to Fetch Features

Replace your-api-key with your service account key, and if needed, adjust the workspace name (WORKSPACE_NAME) and account URL (ACCOUNT_URL).

import json
import requests


def get_online_feature_data(user_id):
    TECTON_API_KEY = "your-api-key"  # replace with your API key
    WORKSPACE_NAME = "prod"  # replace if needed
    ACCOUNT_URL = "explore.tecton.ai"  # replace if needed

    headers = {"Authorization": "Tecton-key " + TECTON_API_KEY}

    # Build the request as a dict and let json.dumps handle quoting and escaping
    request_data = {
        "params": {
            "feature_service_name": "fraud_detection_feature_service",
            "join_key_map": {"user_id": user_id},
            "metadata_options": {"include_names": True},
            "workspace_name": WORKSPACE_NAME,
        }
    }

    online_feature_data = requests.post(
        url=f"https://{ACCOUNT_URL}/api/v1/feature-service/get-features",
        headers=headers,
        data=json.dumps(request_data),
    )

    return online_feature_data.json()

6.3: Fetch Features and Run Inference

Fetch the feature values for a specific user:

user_id = "user_1990251765"
feature_data = get_online_feature_data(user_id)

if "result" not in feature_data:
    print("Error: Check your API key or feature materialization status.")
else:
    print(feature_data["result"])

You should see something like:

{
    'features': [None, 14.64, 12.296666666666667, None, '2', '3']
}
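
The values come back as a positional list, so it can help to pair them with the feature names returned in the metadata. The sketch below works on a hand-written response shaped like the output above; the feature names and the `to_named_features` helper are assumptions for illustration, not confirmed API output.

```python
# Hypothetical response: the values match the sample output above, and the
# names are assumed to follow the <feature_view>.<feature> pattern
feature_data = {
    "result": {"features": [None, 14.64, 12.296666666666667, None, "2", "3"]},
    "metadata": {
        "features": [
            {"name": "user_transaction_metrics.amount_mean_1d_1d"},
            {"name": "user_transaction_metrics.amount_mean_3d_1d"},
            {"name": "user_transaction_metrics.amount_mean_7d_1d"},
            {"name": "user_transaction_metrics.amount_count_1d_1d"},
            {"name": "user_transaction_metrics.amount_count_3d_1d"},
            {"name": "user_transaction_metrics.amount_count_7d_1d"},
        ]
    },
}


def to_named_features(feature_data):
    """Pair each feature name with its value, casting numerics to float."""
    names = [f["name"] for f in feature_data["metadata"]["features"]]
    values = feature_data["result"]["features"]
    # Counts appear as strings in the sample output, so cast anything non-null
    return {n: (None if v is None else float(v)) for n, v in zip(names, values)}


for name, value in to_named_features(feature_data).items():
    print(name, value)
```

A None value here means no data was found in that window for this user, so downstream code needs a rule for missing values.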

Run a Prediction

We’ll reuse the trained logistic regression model from earlier. For simplicity, we’ll just run inference in this notebook.

import pandas as pd


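def get_prediction_from_model(feature_data):
    columns = [f["name"].replace(".", "__") for f in feature_data["metadata"]["features"]]
    data = [feature_data["result"]["features"]]
    features = pd.DataFrame(data, columns=columns)[X.columns]
    # Counts arrive as strings and missing values as None; cast to numbers and
    # fill with 0 to mirror the fillna(0) applied to the training data
    features = features.apply(pd.to_numeric, errors="coerce").fillna(0)
    return model.predict(features)[0]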


prediction = get_prediction_from_model(feature_data)
print(prediction)  # 0 = not fraud, 1 = fraud

6.4: Simple Decision Logic

A real fraud detection system might call out to a rules engine or queue an alert for manual review. Here's a tiny function to decide pass/fail:

def evaluate_transaction(user_id):
    online_feature_data = get_online_feature_data(user_id)
    is_predicted_fraud = get_prediction_from_model(online_feature_data)

    if is_predicted_fraud == 0:
        return "Transaction accepted."
    else:
        return "Transaction denied."


evaluate_transaction("user_1990251765")

Transaction accepted.
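
If you want a manual-review path rather than a hard pass/fail, one option is to act on a fraud probability instead of a 0/1 label. The sketch below is hypothetical: the `decide` helper and both thresholds are made up for illustration, and the probability is assumed to come from something like the model's predict_proba output.

```python
def decide(fraud_probability, review_at=0.3, deny_at=0.8):
    """Map a fraud probability to accept / review / deny (thresholds are assumptions)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= deny_at:
        return "Transaction denied."
    if fraud_probability >= review_at:
        return "Transaction queued for manual review."
    return "Transaction accepted."


for p in (0.05, 0.45, 0.92):
    print(p, decide(p))
```

Where to set the thresholds depends on the relative cost of missed fraud and blocked legitimate transactions, so they should be tuned on held-out data rather than copied from this example.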

Wrap-up

Congratulations on building an end-to-end real-time AI application with Tecton! Let's recap:

What You Built

Feature Engineering – Batch features that track user spending patterns
Training Data – A consistent, point-in-time correct dataset
Model Training – A logistic regression model for fraud detection
Productionization – Materializing features to Tecton's online store
Low-Latency Serving – Quick predictions via Tecton's HTTP API

Key Concepts

Batch Feature Views: Transform raw data into features for training and serving
Time Travel Joins: Automatically fetch historically correct feature values
Workspaces & Materialization: Productionize features with minimal overhead
Online Feature Retrieval: Millisecond-latency lookups via Tecton’s REST API

Next Steps

Want to go further? Check out:

Building Streaming Features to learn how to process events as they happen.
Real-time Python transformations, advanced feature validation, or unit tests in your Tecton pipeline.
Monitoring best practices to keep tabs on data drift, feature freshness, and quality.

Tecton can handle much more: streaming data, real-time transformations, monitoring, testing, discovery, and access controls—all while maintaining a single source of truth for your ML features.
