Version: 1.1

Building a Production AI Application with Tecton


Not yet a Tecton user?

Sign up at tecton.ai/explore for a free account to try this tutorial and explore Tecton's Web UI.

Need to get a real-time model up and running fast? In this tutorial, we'll build a fraud detection system step-by-step: from data ingestion and feature engineering, all the way to serving predictions in real time.

We'll do it all in Python without needing to assemble or maintain heavy infrastructure. By the end, you’ll have a working production-style pipeline that you can adapt to your own use cases.


What You'll Build​

  1. Data Connection: We'll connect to data on S3 to pull historical transaction events.
  2. Feature Development: We'll define and test batch features for fraud detection, right inside a notebook.
  3. Training Data: We'll generate training datasets, ensuring they're point-in-time correct, and train a simple logistic regression model.
  4. Production-Ready Features: We'll productionize our features by materializing them to Tecton's online store.
  5. Real-Time Inference: We'll retrieve features at low latency and run fraud predictions on new transactions.

What You'll Learn​

  • How to build batch features in Tecton
  • How to test and iterate on feature logic
  • How to generate training data without data leakage
  • How to deploy features to Tecton’s online and offline stores
  • How to serve your model in real-time using Tecton’s low-latency API

Expected time to complete: ~30 minutes.


Part 0: Getting Started​

1. Install Requirements​

You'll need Python >= 3.8. Install Tecton's SDK and a few other packages:

!pip install 'tecton[rift]==1.1.0' gcsfs s3fs scikit-learn -q

2. Log in to Tecton​

If you’re on Tecton’s free Explore tier, you can leave the URL as explore.tecton.ai. Otherwise, replace it with your organization's Tecton URL.

import tecton

tecton.login("explore.tecton.ai") # replace if needed

When prompted:

  1. Open the link in your browser.
  2. Copy your authentication token back into your notebook.
  3. Press Enter.

3. Set Up Basic Imports​

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta

tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

That’s it! Your environment is ready. Let's explore our data next.


Part 1: Exploring Data​

We have a historical transactions dataset stored in an S3 bucket. Let's take a quick peek.

import pandas as pd

transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})

display(transactions_df.tail(10))

You should see rows that include:

| | timestamp | user_id | transaction_id | merchant | merch_lat | merch_long | amount |
|---|---|---|---|---|---|---|---|
| 118280 | 2025-12-31 18:35:35.552987 | user_2417164600 | 7fc4ead916af497387724f04f03a240a | Summit Auto | 89.024620 | 33.026282 | 85.45 |
| 118281 | 2025-12-31 19:15:30.052654 | user_2898680572 | 8e76114f89a54b70aee5202d1b7f078e | Denny's | -34.317633 | -20.490684 | 342.00 |
| 118282 | 2025-12-31 19:24:50.740935 | user_4133774204 | 48e6177cd8034b2f9db5d899784708eb | Piazza Auto | -86.847150 | -143.865275 | 814.90 |
| 118283 | 2025-12-31 19:30:19.764557 | user_6971829885 | c222ae37ac694c3ea9e1901ae95d7d20 | Floor & Decor | -24.253155 | 104.160573 | 72.82 |
| 118284 | 2025-12-31 20:00:05.888725 | user_6348117987 | f4121a75237442f6a093559432c54d8a | MattressFirm | -3.704788 | -151.185462 | 1.68 |
| 118285 | 2025-12-31 20:52:49.646145 | user_7921570811 | 95f566f2dcb54e54b5ea51d06f3b0f4e | Rite Aid | 48.028960 | 172.359464 | 79.79 |
| 118286 | 2025-12-31 21:01:16.770868 | user_1939957235 | d1277a82bcca490f9169697daa639a6b | Trader Joe's | 74.087849 | 46.947425 | 70.51 |
| 118287 | 2025-12-31 21:25:14.221429 | user_3338884986 | f2ae481eda3a47118f73d1217665fe6f | Priority Auto | 21.295012 | 78.033348 | 6.45 |
| 118288 | 2025-12-31 22:03:06.505606 | user_2210887384 | 7580b1931b42411bb92cd42208af86e0 | Wall to Wall | 28.269364 | -168.851930 | 11.98 |
| 118289 | 2025-12-31 23:09:25.786744 | user_1997016327 | 730d4779334f43d0bba602472239993f | Food Giant | 78.179653 | -51.714236 | 92.29 |

Part 2: Defining and Testing Features​

We'll create features that measure a user's recent transaction behavior:

  • A user's average transaction amount over the past 1, 3, and 7 days
  • A user's total transaction count over the past 1, 3, and 7 days
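
These are standard time-windowed aggregations. For intuition, here is a minimal pandas sketch of what the 7-day metrics mean for a single cutoff time; the helper name and output columns are our own and purely illustrative, since Tecton will compute these incrementally for every user, window, and day.

import pandas as pd

# Illustrative only: the 7-day metrics for one point in time.
def user_metrics_7d(df: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    # Keep the trailing 7 days of transactions up to the cutoff.
    window = df[(df["timestamp"] > as_of - pd.Timedelta(days=7)) & (df["timestamp"] <= as_of)]
    return (
        window.groupby("user_id")["amount"]
        .agg(amount_mean_7d="mean", amount_count_7d="count")
        .reset_index()
    )

# e.g. user_metrics_7d(transactions_df, pd.Timestamp("2022-01-01"))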

2.1: Create a Batch Source and Feature View​

In Tecton, you declare feature logic through objects like BatchSource and BatchFeatureView. Let's define them in our notebook:

transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Our entity captures the concept of "user"
user = Entity(name="user", join_keys=[Field("user_id", String)])


@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    features=[
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
    ],
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]

2.2: Test Features with Historical Data​

Use get_features_in_range to see how these features compute historically:

start = datetime(2022, 1, 1)
end = datetime(2022, 2, 1)

df = user_transaction_metrics.get_features_in_range(start_time=start, end_time=end).to_pandas()
display(df.tail(10))

You’ll see metrics like:

| | user_id | amount_mean_1d_1d | amount_mean_3d_1d | amount_mean_7d_1d | amount_count_1d_1d | amount_count_3d_1d | amount_count_7d_1d | _valid_from | _valid_to |
|---|---|---|---|---|---|---|---|---|---|
| 1519 | user_7994770107 | NaN | 27.895000 | 190.393333 | 0 | 2 | 6 | 2022-01-07 00:00:00+00:00 | 2022-01-08 00:00:00+00:00 |
| 1520 | user_8041734544 | NaN | 843.430000 | 216.532000 | 0 | 1 | 5 | 2022-01-06 00:00:00+00:00 | 2022-01-07 00:00:00+00:00 |
| 1521 | user_8096819426 | NaN | 38.345000 | 147.498333 | 0 | 2 | 6 | 2022-01-02 00:00:00+00:00 | 2022-01-03 00:00:00+00:00 |
| 1522 | user_8096819426 | NaN | 27.130000 | 138.197143 | 0 | 3 | 7 | 2022-01-29 00:00:00+00:00 | 2022-01-30 00:00:00+00:00 |
| 1523 | user_8175816267 | NaN | 313.575000 | 224.093333 | 0 | 2 | 3 | 2022-01-26 00:00:00+00:00 | 2022-01-27 00:00:00+00:00 |
| 1524 | user_8468871048 | NaN | 6.125000 | 113.736667 | 0 | 2 | 9 | 2022-01-07 00:00:00+00:00 | 2022-01-08 00:00:00+00:00 |
| 1525 | user_9102789217 | NaN | 43.673333 | 38.336000 | 0 | 3 | 5 | 2022-01-21 00:00:00+00:00 | 2022-01-22 00:00:00+00:00 |
| 1526 | user_9417852028 | NaN | 1.955000 | 77.846667 | 0 | 2 | 6 | 2022-01-24 00:00:00+00:00 | 2022-01-25 00:00:00+00:00 |
| 1527 | user_9704575201 | NaN | 33.330000 | 75.414286 | 0 | 3 | 7 | 2022-01-01 00:00:00+00:00 | 2022-01-02 00:00:00+00:00 |
| 1528 | user_9619731767 | NaN | NaN | 273.812000 | 0 | 0 | 5 | 2022-01-15 00:00:00+00:00 | 2022-01-16 00:00:00+00:00 |

For more information about the output schema, see Offline Retrieval Methods and Feature Naming.

Everything looks good! Now let's build a training set.


Part 3: Generating Training Data​

We'll predict fraud using a label dataset that marks which transactions turned out to be fraudulent. Let’s load those labels:

training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})

display(training_labels.tail(10))

| | transaction_id | is_fraud |
|---|---|---|
| 99990 | 12a48ececaf9fdb7e5cd61dedbb73d1b | 0 |
| 99991 | 060ced776ce3efdc30e1517a48e0671d | 0 |
| 99992 | d545c3245bca873d0e3dcba9e1fc722e | 0 |
| 99993 | f57261485341e0e2688eb2e6593dfc5e | 0 |
| 99994 | bdf818c462bd35e90f2598761ca3eccd | 0 |
| 99995 | 3728a1ebb7110541e6e3ab39704fda9a | 0 |
| 99996 | 2b1bb22bb5ac768cdd1aa29139265de0 | 1 |
| 99997 | 0b56bb9091539d0938668a893428664a | 1 |
| 99998 | 7d46f87ced58994dc58dc5b19641fc46 | 1 |
| 99999 | afcd8c782b2d6b0b6c15c74bff122c5f | 1 |

This dataset maps transaction_id to is_fraud (0 or 1). Let's join it to our transactions_df so that each row has user and label info.

training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
    ["user_id", "timestamp", "amount", "is_fraud"]
]

display(training_events.tail(10))

| | user_id | timestamp | amount | is_fraud |
|---|---|---|---|---|
| 99990 | user_5476622522 | 2024-12-31 18:11:10.528279 | 98.92 | 0 |
| 99991 | user_3202479350 | 2024-12-31 18:14:30.978084 | 1.84 | 0 |
| 99992 | user_9315055943 | 2024-12-31 18:22:25.127352 | 39.41 | 0 |
| 99993 | user_2210887384 | 2024-12-31 19:14:17.889205 | 52.09 | 0 |
| 99994 | user_7921570811 | 2024-12-31 20:48:18.848095 | 11.86 | 0 |
| 99995 | user_3338884986 | 2024-12-31 21:49:56.180387 | 699.06 | 0 |
| 99996 | user_8816492034 | 2024-12-31 22:37:55.129696 | 2.18 | 1 |
| 99997 | user_8816492034 | 2024-12-31 23:30:23.640727 | 65.88 | 1 |
| 99998 | user_8816492034 | 2024-12-31 23:34:05.640727 | 0.95 | 1 |
| 99999 | user_8816492034 | 2024-12-31 23:34:43.640727 | 2.22 | 1 |
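
Before training, it's worth a quick look at how imbalanced the labels are; fraud is a small minority class here, which matters when reading the metrics later:

# Fraud is a minority class; keep the imbalance in mind when reading metrics.
print(training_events["is_fraud"].value_counts(normalize=True))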

3.1: Building a Feature Service​

To combine these features for training, we bundle them into a FeatureService:

from tecton import FeatureService

fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)

3.2: Generate a Point-in-Time Correct Training Set​

Tecton will automatically "time travel" to fetch the feature values valid at each event's timestamp. This ensures no leakage from the future.

training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas().fillna(0)

display(training_data.sample(5))

| | user_id | timestamp | is_fraud | amount | user_transaction_metrics__amount_mean_7d_1d | user_transaction_metrics__amount_mean_1d_1d | user_transaction_metrics__amount_count_3d_1d | user_transaction_metrics__amount_mean_3d_1d | user_transaction_metrics__amount_count_7d_1d | user_transaction_metrics__amount_count_1d_1d |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_1028747636 | 2021-01-03 08:42:43.668406 | 0 | 77.09 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
| 1 | user_1155940157 | 2021-01-21 03:27:42.566411 | 0 | 43.01 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
| 2 | user_1567708646 | 2021-01-20 13:57:14.832615 | 0 | 536.10 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
| 3 | user_1567708646 | 2021-01-21 18:13:41.535067 | 0 | 72.16 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
| 4 | user_1755385063 | 2021-01-05 04:19:08.782106 | 0 | 96.84 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |

Now we're ready to train a model!


Part 4: Training a Model​

We'll use scikit-learn's LogisticRegression for simplicity. You can replace this with any ML library.

from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = training_data.drop(["user_id", "timestamp", "amount"], axis=1)
X = df.drop("is_fraud", axis=1)
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

num_cols = X_train.select_dtypes(exclude=["object"]).columns.tolist()
cat_cols = X_train.select_dtypes(include=["object"]).columns.tolist()

num_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="N/A"), OneHotEncoder(handle_unknown="ignore", sparse_output=False)
)

full_pipe = ColumnTransformer([("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)])

model = make_pipeline(full_pipe, LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)

y_predict = model.predict(X_test)
print(metrics.classification_report(y_test, y_predict, zero_division=0))

              precision    recall  f1-score   support

           0       0.93      0.99      0.96     27076
           1       0.82      0.30      0.44      2924

    accuracy                           0.93     30000
   macro avg       0.87      0.65      0.70     30000
weighted avg       0.92      0.93      0.91     30000

You'll see precision, recall, and other metrics in the output. At this point, you could iterate on better features or hyperparameters; one quick check is sketched below. Once satisfied, let's deploy.
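
For example, accuracy alone is misleading on data this imbalanced, so a threshold-independent metric such as ROC AUC is a better gauge. A small sketch using the pipeline we just trained:

from sklearn.metrics import roc_auc_score

# ROC AUC is threshold-independent and more informative than raw accuracy
# on imbalanced labels like these.
y_scores = model.predict_proba(X_test)[:, 1]  # probability of the fraud class
print(f"ROC AUC: {roc_auc_score(y_test, y_scores):.3f}")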


Part 5: Productionizing Your Tecton Application​

In Tecton, production workflows revolve around a Feature Repository and Workspaces:

  1. Feature Repo: Code that defines your features, data sources, and feature services.
  2. Workspace: A project environment for your team or org. Applying your code to a Live Workspace automatically materializes data into Tecton's online and offline stores.
Using Explore (Free Tier)?

Skip ahead if you're on explore.tecton.ai; we’ve already set up a "prod" workspace for you. Otherwise, follow the steps below to create your own Tecton Workspace.

5.1: Create a Tecton Feature Repository​

Open a terminal (not inside the notebook) and run:

mkdir tecton-feature-repo
cd tecton-feature-repo
touch features.py
tecton init

5.2: Enable Materialization in features.py​

Copy your feature definitions into features.py, adding a few extra parameters to tell Tecton how to backfill and keep data fresh:

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate, FeatureService
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta


transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

user = Entity(name="user", join_keys=[Field("user_id", String)])


@batch_feature_view(
    description="User transaction metrics over 1, 3 and 7 days",
    sources=[transactions],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
    ],
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 1, 1),
    batch_schedule=timedelta(days=1),
)
def user_transaction_metrics(transactions):
    return transactions[["user_id", "timestamp", "amount"]]


fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service", features=[user_transaction_metrics]
)

5.3: Create and Apply to a Workspace​

Back in your terminal:

tecton login [your-org-name].tecton.ai
tecton workspace create [your-name]-quickstart --live
tecton apply
Using workspace "[your-name]-quickstart" on cluster https://explore.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Imported 1 Python module from the feature repository
⚠️ Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Initializing.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓↓↓

  + Create Batch Data Source
    name: transactions

  + Create Entity
    name: user

  + Create Transformation
    name: user_transaction_metrics
    description: User transaction metrics over 1, 3 and 7 days

  + Create Batch Feature View
    name: user_transaction_metrics
    description: User transaction metrics over 1, 3 and 7 days
    materialization: 11 backfills, 1 recurring batch job
    > backfill: 10 Backfill jobs 2020-01-01 00:00:00 UTC to 2023-08-16 00:00:00 UTC writing to the Offline Store
                1 Backfill job 2023-08-16 00:00:00 UTC to 2023-08-23 00:00:00 UTC writing to both the Online and Offline Store
    > incremental: 1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store

  + Create Feature Service
    name: fraud_detection_feature_service

↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Generated plan ID is 8d01ad78e3194a5dbd3f934f04d71564
View your plan in the Web UI: https://explore.tecton.ai/app/[your-name]-quickstart/plan-summary/8d01ad78e3194a5dbd3f934f04d71564
⚠️ Objects in plan contain warnings.

Note: Updates to Feature Services may take up to 60 seconds to be propagated to the real-time feature-serving endpoint.
Note: This workspace ([your-name]-quickstart) is a "Live" workspace. Applying this plan may result in new materialization jobs which will incur costs. Carefully examine the plan output before applying changes.
Are you sure you want to apply this plan to: "[your-name]-quickstart"? [y/N]> y
🎉 all done!

Tecton will:

  • Register your data source, entity, and feature view.
  • Kick off backfill jobs to populate your historical data from 2020 onward.
  • Schedule future jobs to keep your feature data fresh.
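
Once the plan is applied (and the backfills finish), you can pull the deployed objects back into a notebook with the SDK. A short sketch, assuming the "prod" workspace on Explore; swap in your own workspace name otherwise:

import tecton

# Fetch the deployed feature service from the live workspace.
ws = tecton.get_workspace("prod")  # or "[your-name]-quickstart"
fs = ws.get_feature_service("fraud_detection_feature_service")

# Offline retrieval now reads from the materialized offline store.
offline_df = fs.get_features_for_events(training_events).to_pandas()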

Part 6: Serving Features in Real-Time​

Once your backfill completes, you can fetch features with millisecond latency through Tecton's HTTP API or SDK. Let’s demo using the HTTP API.

6.1: Create a Service Account​

  1. Go to Settings > Service Accounts in the Tecton UI.
  2. Create a new service account and save its API key.
  3. Grant it "Consumer" access to your workspace.

6.2: Write a Helper Function to Fetch Features​

Replace your-api-key with your service account key, and if needed, adjust the workspace name (WORKSPACE_NAME) and account URL (ACCOUNT_URL).

import requests, json


def get_online_feature_data(user_id):
    TECTON_API_KEY = "your-api-key"  # replace with your API key
    WORKSPACE_NAME = "prod"  # replace if needed
    ACCOUNT_URL = "explore.tecton.ai"  # replace if needed

    headers = {"Authorization": "Tecton-key " + TECTON_API_KEY}

    request_data = json.dumps(
        {
            "params": {
                "feature_service_name": "fraud_detection_feature_service",
                "join_key_map": {"user_id": user_id},
                "metadata_options": {"include_names": True},
                "workspace_name": WORKSPACE_NAME,
            }
        }
    )

    online_feature_data = requests.post(
        url=f"https://{ACCOUNT_URL}/api/v1/feature-service/get-features",
        headers=headers,
        data=request_data,
    )

    return online_feature_data.json()

6.3: Fetch Features and Run Inference​

Fetch the feature values for a specific user:

user_id = "user_1990251765"
feature_data = get_online_feature_data(user_id)

if "result" not in feature_data:
print("Error: Check your API key or feature materialization status.")
else:
print(feature_data["result"])

You should see something like:

{
  'features': [None, 14.64, 12.296666666666667, None, '2', '3']
}
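
If you'd rather stay in Python, the SDK offers an equivalent lookup. A sketch, assuming a service account key has been configured for the SDK (e.g. via tecton.set_credentials):

import tecton

# SDK alternative to the raw HTTP call: look up the same online feature vector.
fs = tecton.get_workspace("prod").get_feature_service("fraud_detection_feature_service")
feature_vector = fs.get_online_features(join_keys={"user_id": "user_1990251765"})
print(feature_vector.to_dict())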

Run a Prediction​

We’ll reuse the trained logistic regression model from earlier. For simplicity, we’ll just run inference in this notebook.

import pandas as pd


def get_prediction_from_model(feature_data):
    columns = [f["name"].replace(".", "__") for f in feature_data["metadata"]["features"]]
    data = [feature_data["result"]["features"]]
    features = pd.DataFrame(data, columns=columns)[X.columns]
    return model.predict(features)[0]


prediction = get_prediction_from_model(feature_data)
print(prediction) # 0 = not fraud, 1 = fraud

6.4: Simple Decision Logic​

A real fraud detection system might call out to a rules engine or queue an alert for manual review. Here's a tiny function to decide pass/fail:

def evaluate_transaction(user_id):
    online_feature_data = get_online_feature_data(user_id)
    is_predicted_fraud = get_prediction_from_model(online_feature_data)

    if is_predicted_fraud == 0:
        return "Transaction accepted."
    else:
        return "Transaction denied."


evaluate_transaction("user_1990251765")
Transaction accepted.

Wrap-up​

Congratulations on building an end-to-end real-time AI application with Tecton! Let's recap:

What You Built​

  1. Feature Engineering – Batch features that track user spending patterns
  2. Training Data – A consistent, point-in-time correct dataset
  3. Model Training – A logistic regression model for fraud detection
  4. Productionization – Materializing features to Tecton's online store
  5. Low-Latency Serving – Quick predictions via Tecton's HTTP API

Key Concepts​

  • Batch Feature Views: Transform raw data into features for training and serving
  • Time Travel Joins: Automatically fetch historically correct feature values
  • Workspaces & Materialization: Productionize features with minimal overhead
  • Online Feature Retrieval: Millisecond-latency lookups via Tecton’s REST API

Next Steps​

Want to go further? Check out:

  • Building Streaming Features to learn how to process events as they happen.
  • Real-time Python transformations, advanced feature validation, or unit tests in your Tecton pipeline.
  • Monitoring best practices to keep tabs on data drift, feature freshness, and quality.

Tecton can handle much more: streaming data, real-time transformations, monitoring, testing, discovery, and access controls, all while maintaining a single source of truth for your ML features.
