Building a Production AI Application with Tecton
Sign up at tecton.ai/explore for a free account to try this tutorial and explore Tecton's Web UI.
Need to get a real-time model up and running fast? In this tutorial, we'll build a fraud detection system step-by-step: from data ingestion and feature engineering, all the way to serving predictions in real time.
We'll do it all in Python without needing to assemble or maintain heavy infrastructure. By the end, you'll have a working production-style pipeline that you can adapt to your own use cases.
What You'll Build
- Data Connection: We'll connect to data on S3 to pull historical transaction events.
- Feature Development: We'll define and test batch features for fraud detection, right inside a notebook.
- Training Data: We'll generate training datasets, ensuring they're point-in-time correct, and train a simple logistic regression model.
- Production-Ready Features: We'll productionize our features by materializing them to Tecton's online store.
- Real-Time Inference: We'll retrieve features at low latency and run fraud predictions on new transactions.
What You'll Learn
- How to build batch features in Tecton
- How to test and iterate on feature logic
- How to generate training data without data leakage
- How to deploy features to Tecton's online and offline stores
- How to serve your model in real-time using Tecton's low-latency API
Expected time to complete: ~30 minutes.
Part 0: Getting Started
1. Install Requirements
You'll need Python >= 3.8. Install Tecton's SDK and a few other packages:
!pip install 'tecton[rift]==1.1.0' gcsfs s3fs scikit-learn -q
2. Log in to Tecton
If you're on Tecton's free Explore tier, you can leave the URL as `explore.tecton.ai`. Otherwise, replace it with your organization's Tecton URL.
import tecton
tecton.login("explore.tecton.ai") # replace if needed
When prompted:
- Open the link in your browser.
- Copy your authentication token back into your notebook.
- Press Enter.
3. Set Up Basic Imports
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")
That's it! Your environment is ready. Let's explore our data next.
Part 1: Exploring Data
We have a historical transactions dataset stored in an S3 bucket. Let's take a quick peek.
import pandas as pd
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})
display(transactions_df.tail(10))
You should see rows that include:
index | timestamp | user_id | transaction_id | merchant | merch_lat | merch_long | amount |
---|---|---|---|---|---|---|---|
118280 | 2025-12-31 18:35:35.552987 | user_2417164600 | 7fc4ead916af497387724f04f03a240a | Summit Auto | 89.024620 | 33.026282 | 85.45 |
118281 | 2025-12-31 19:15:30.052654 | user_2898680572 | 8e76114f89a54b70aee5202d1b7f078e | Denny's | -34.317633 | -20.490684 | 342.00 |
118282 | 2025-12-31 19:24:50.740935 | user_4133774204 | 48e6177cd8034b2f9db5d899784708eb | Piazza Auto | -86.847150 | -143.865275 | 814.90 |
118283 | 2025-12-31 19:30:19.764557 | user_6971829885 | c222ae37ac694c3ea9e1901ae95d7d20 | Floor & Decor | -24.253155 | 104.160573 | 72.82 |
118284 | 2025-12-31 20:00:05.888725 | user_6348117987 | f4121a75237442f6a093559432c54d8a | MattressFirm | -3.704788 | -151.185462 | 1.68 |
118285 | 2025-12-31 20:52:49.646145 | user_7921570811 | 95f566f2dcb54e54b5ea51d06f3b0f4e | Rite Aid | 48.028960 | 172.359464 | 79.79 |
118286 | 2025-12-31 21:01:16.770868 | user_1939957235 | d1277a82bcca490f9169697daa639a6b | Trader Joe's | 74.087849 | 46.947425 | 70.51 |
118287 | 2025-12-31 21:25:14.221429 | user_3338884986 | f2ae481eda3a47118f73d1217665fe6f | Priority Auto | 21.295012 | 78.033348 | 6.45 |
118288 | 2025-12-31 22:03:06.505606 | user_2210887384 | 7580b1931b42411bb92cd42208af86e0 | Wall to Wall | 28.269364 | -168.851930 | 11.98 |
118289 | 2025-12-31 23:09:25.786744 | user_1997016327 | 730d4779334f43d0bba602472239993f | Food Giant | 78.179653 | -51.714236 | 92.29 |
Part 2: Defining and Testing Features
We'll create features that measure a user's recent transaction behavior:
- A user's average transaction amount over the past 1, 3, and 7 days
- A user's total transaction count over the past 1, 3, and 7 days
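To build intuition for what these aggregations produce, here is a rough pandas equivalent of the 7-day versions, computed per event rather than on Tecton's daily schedule (a sketch for illustration only; Tecton handles the computation, backfills, and serving for you):

```python
# Rough pandas equivalent of the 7-day mean and count per user.
# Note: Tecton materializes these on a daily aggregation_interval;
# this sketch computes them at every transaction instead.
rolling = (
    transactions_df.sort_values("timestamp")
    .set_index("timestamp")
    .groupby("user_id")["amount"]
    .rolling("7D")
)
amount_mean_7d = rolling.mean()
amount_count_7d = rolling.count()
```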
2.1: Create a Batch Source and Feature View
In Tecton, you declare feature logic through objects like `BatchSource` and `BatchFeatureView`. Let's define them in our notebook:
transactions = BatchSource(
name="transactions",
batch_config=FileConfig(
uri="s3://tecton.ai.public/tutorials/transactions.pq",
file_format="parquet",
timestamp_field="timestamp",
),
)
# Our entity captures the concept of "user"
user = Entity(name="user", join_keys=[Field("user_id", String)])
@batch_feature_view(
description="User transaction metrics over 1, 3 and 7 days",
sources=[transactions],
entities=[user],
mode="pandas",
aggregation_interval=timedelta(days=1),
timestamp_field="timestamp",
features=[
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
],
)
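# The function body only selects the raw columns; the Aggregate clauses
# in the decorator define the rolling computations.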
def user_transaction_metrics(transactions):
return transactions[["user_id", "timestamp", "amount"]]
2.2: Test Features with Historical Data
Use `get_features_in_range` to see how these features compute historically:
start = datetime(2022, 1, 1)
end = datetime(2022, 2, 1)
df = user_transaction_metrics.get_features_in_range(start_time=start, end_time=end).to_pandas()
display(df.tail(10))
You'll see metrics like the following (feature columns are named `<column>_<function>_<window>_<interval>`, e.g. `amount_mean_7d_1d`):
index | user_id | amount_mean_1d_1d | amount_mean_3d_1d | amount_mean_7d_1d | amount_count_1d_1d | amount_count_3d_1d | amount_count_7d_1d | _valid_from | _valid_to |
---|---|---|---|---|---|---|---|---|---|
1519 | user_7994770107 | NaN | 27.895000 | 190.393333 | 0 | 2 | 6 | 2022-01-07 00:00:00+00:00 | 2022-01-08 00:00:00+00:00 |
1520 | user_8041734544 | NaN | 843.430000 | 216.532000 | 0 | 1 | 5 | 2022-01-06 00:00:00+00:00 | 2022-01-07 00:00:00+00:00 |
1521 | user_8096819426 | NaN | 38.345000 | 147.498333 | 0 | 2 | 6 | 2022-01-02 00:00:00+00:00 | 2022-01-03 00:00:00+00:00 |
1522 | user_8096819426 | NaN | 27.130000 | 138.197143 | 0 | 3 | 7 | 2022-01-29 00:00:00+00:00 | 2022-01-30 00:00:00+00:00 |
1523 | user_8175816267 | NaN | 313.575000 | 224.093333 | 0 | 2 | 3 | 2022-01-26 00:00:00+00:00 | 2022-01-27 00:00:00+00:00 |
1524 | user_8468871048 | NaN | 6.125000 | 113.736667 | 0 | 2 | 9 | 2022-01-07 00:00:00+00:00 | 2022-01-08 00:00:00+00:00 |
1525 | user_9102789217 | NaN | 43.673333 | 38.336000 | 0 | 3 | 5 | 2022-01-21 00:00:00+00:00 | 2022-01-22 00:00:00+00:00 |
1526 | user_9417852028 | NaN | 1.955000 | 77.846667 | 0 | 2 | 6 | 2022-01-24 00:00:00+00:00 | 2022-01-25 00:00:00+00:00 |
1527 | user_9704575201 | NaN | 33.330000 | 75.414286 | 0 | 3 | 7 | 2022-01-01 00:00:00+00:00 | 2022-01-02 00:00:00+00:00 |
1528 | user_9619731767 | NaN | NaN | 273.812000 | 0 | 0 | 5 | 2022-01-15 00:00:00+00:00 | 2022-01-16 00:00:00+00:00 |
For more information about the output schema, see Offline Retrieval Methods and Feature Naming.
Everything looks good! Now let's build a training set.
Part 3: Generating Training Data
We'll predict fraud using a label dataset that marks which transactions turned out to be fraudulent. Let's load those labels:
training_labels = pd.read_parquet("s3://tecton.ai.public/tutorials/labels.pq", storage_options={"anon": True})
display(training_labels.tail(10))
index | transaction_id | is_fraud |
---|---|---|
99990 | 12a48ececaf9fdb7e5cd61dedbb73d1b | 0 |
99991 | 060ced776ce3efdc30e1517a48e0671d | 0 |
99992 | d545c3245bca873d0e3dcba9e1fc722e | 0 |
99993 | f57261485341e0e2688eb2e6593dfc5e | 0 |
99994 | bdf818c462bd35e90f2598761ca3eccd | 0 |
99995 | 3728a1ebb7110541e6e3ab39704fda9a | 0 |
99996 | 2b1bb22bb5ac768cdd1aa29139265de0 | 1 |
99997 | 0b56bb9091539d0938668a893428664a | 1 |
99998 | 7d46f87ced58994dc58dc5b19641fc46 | 1 |
99999 | afcd8c782b2d6b0b6c15c74bff122c5f | 1 |
This dataset maps `transaction_id` to `is_fraud` (0 or 1). Let's join it to our `transactions_df` so that each row has user and label info.
training_events = training_labels.merge(transactions_df, on=["transaction_id"], how="left")[
["user_id", "timestamp", "amount", "is_fraud"]
]
display(training_events.tail(10))
index | user_id | timestamp | amount | is_fraud |
---|---|---|---|---|
99990 | user_5476622522 | 2024-12-31 18:11:10.528279 | 98.92 | 0 |
99991 | user_3202479350 | 2024-12-31 18:14:30.978084 | 1.84 | 0 |
99992 | user_9315055943 | 2024-12-31 18:22:25.127352 | 39.41 | 0 |
99993 | user_2210887384 | 2024-12-31 19:14:17.889205 | 52.09 | 0 |
99994 | user_7921570811 | 2024-12-31 20:48:18.848095 | 11.86 | 0 |
99995 | user_3338884986 | 2024-12-31 21:49:56.180387 | 699.06 | 0 |
99996 | user_8816492034 | 2024-12-31 22:37:55.129696 | 2.18 | 1 |
99997 | user_8816492034 | 2024-12-31 23:30:23.640727 | 65.88 | 1 |
99998 | user_8816492034 | 2024-12-31 23:34:05.640727 | 0.95 | 1 |
99999 | user_8816492034 | 2024-12-31 23:34:43.640727 | 2.22 | 1 |
3.1: Building a Feature Service
To combine these features for training, we bundle them into a `FeatureService`:
from tecton import FeatureService
fraud_detection_feature_service = FeatureService(
name="fraud_detection_feature_service", features=[user_transaction_metrics]
)
3.2: Generate a Point-in-Time Correct Training Set
Tecton will automatically "time travel" to fetch the feature values valid at each event's timestamp. This ensures no leakage from the future.
training_data = fraud_detection_feature_service.get_features_for_events(training_events).to_pandas().fillna(0)
display(training_data.sample(5))
index | user_id | timestamp | is_fraud | amount | user_transaction_metrics__amount_mean_7d_1d | user_transaction_metrics__amount_mean_1d_1d | user_transaction_metrics__amount_count_3d_1d | user_transaction_metrics__amount_mean_3d_1d | user_transaction_metrics__amount_count_7d_1d | user_transaction_metrics__amount_count_1d_1d |
---|---|---|---|---|---|---|---|---|---|---|
0 | user_1028747636 | 2021-01-03 08:42:43.668406 | 0 | 77.09 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
1 | user_1155940157 | 2021-01-21 03:27:42.566411 | 0 | 43.01 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
2 | user_1567708646 | 2021-01-20 13:57:14.832615 | 0 | 536.1 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
3 | user_1567708646 | 2021-01-21 18:13:41.535067 | 0 | 72.16 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
4 | user_1755385063 | 2021-01-05 04:19:08.782106 | 0 | 96.84 | 0.0 | 0.0 | 0 | 0.0 | 0 | 0 |
Now we're ready to train a model!
Part 4: Training a Model
We'll use scikit-learn's `LogisticRegression` for simplicity. You can replace this with any ML library.
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
df = training_data.drop(["user_id", "timestamp", "amount"], axis=1)
X = df.drop("is_fraud", axis=1)
y = df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
num_cols = X_train.select_dtypes(exclude=["object"]).columns.tolist()
cat_cols = X_train.select_dtypes(include=["object"]).columns.tolist()
num_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_pipe = make_pipeline(
SimpleImputer(strategy="constant", fill_value="N/A"), OneHotEncoder(handle_unknown="ignore", sparse_output=False)
)
full_pipe = ColumnTransformer([("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)])
model = make_pipeline(full_pipe, LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)
y_predict = model.predict(X_test)
print(metrics.classification_report(y_test, y_predict, zero_division=0))
 | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.93 | 0.99 | 0.96 | 27076 |
1 | 0.82 | 0.30 | 0.44 | 2924 |
accuracy | | | 0.93 | 30000 |
macro avg | 0.87 | 0.65 | 0.70 | 30000 |
weighted avg | 0.92 | 0.93 | 0.91 | 30000 |
You'll see precision, recall, and other metrics in the output. At this point, you could iterate on better features or hyperparameters.
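Because fraud labels are heavily imbalanced, a threshold-independent metric can be a useful complement to the report above; here's a quick sketch using ROC AUC:

```python
from sklearn.metrics import roc_auc_score

# Score with the predicted probability of the positive (fraud) class.
y_scores = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, y_scores))
```

Once you're satisfied with the model, let's deploy.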
Part 5: Productionizing Your Tecton Application
In Tecton, production workflows revolve around a Feature Repository and Workspaces:
- Feature Repo: Code that defines your features, data sources, and feature services.
- Workspace: A project environment for your team or org. Applying your code to a Live Workspace automatically materializes data into Tecton's online and offline stores.
Skip ahead if you're on `explore.tecton.ai`; we've already set up a "prod" workspace for you. Otherwise, follow the steps below to create your own Tecton Workspace.
5.1: Create a Tecton Feature Repository
Open a terminal (not inside the notebook) and run:
mkdir tecton-feature-repo
cd tecton-feature-repo
touch features.py
tecton init
5.2: Enable Materialization in `features.py`
Copy your feature definitions into `features.py`, adding a few extra parameters to tell Tecton how to backfill and keep the data fresh:
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate, FeatureService
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta
transactions = BatchSource(
name="transactions",
batch_config=FileConfig(
uri="s3://tecton.ai.public/tutorials/transactions.pq",
file_format="parquet",
timestamp_field="timestamp",
),
)
user = Entity(name="user", join_keys=[Field("user_id", String)])
@batch_feature_view(
description="User transaction metrics over 1, 3 and 7 days",
sources=[transactions],
entities=[user],
mode="pandas",
timestamp_field="timestamp",
aggregation_interval=timedelta(days=1),
features=[
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
],
online=True,
offline=True,
feature_start_time=datetime(2020, 1, 1),
batch_schedule=timedelta(days=1),
)
def user_transaction_metrics(transactions):
return transactions[["user_id", "timestamp", "amount"]]
fraud_detection_feature_service = FeatureService(
name="fraud_detection_feature_service", features=[user_transaction_metrics]
)
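Before applying, you can preview exactly what Tecton will create by running `tecton plan` from the repo directory; it prints the plan without making any changes.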
5.3: Create and Apply to a Workspace
Back in your terminal:
tecton login [your-org-name].tecton.ai
tecton workspace create [your-name]-quickstart --live
tecton apply
Using workspace "[your-name]-quickstart" on cluster https://explore.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Imported 1 Python module from the feature repository
⚠️ Running Tests: No tests found.
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Initializing.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create Batch Data Source
name: transactions
+ Create Entity
name: user
+ Create Transformation
name: user_transaction_metrics
description: User transaction metrics over 1, 3 and 7 days
+ Create Batch Feature View
name: user_transaction_metrics
description: User transaction metrics over 1, 3 and 7 days
materialization: 11 backfills, 1 recurring batch job
> backfill: 10 Backfill jobs 2020-01-01 00:00:00 UTC to 2023-08-16 00:00:00 UTC writing to the Offline Store
1 Backfill job 2023-08-16 00:00:00 UTC to 2023-08-23 00:00:00 UTC writing to both the Online and Offline Store
> incremental: 1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store
+ Create Feature Service
name: fraud_detection_feature_service
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Generated plan ID is 8d01ad78e3194a5dbd3f934f04d71564
View your plan in the Web UI: https://explore.tecton.ai/app/[your-name]-quickstart/plan-summary/8d01ad78e3194a5dbd3f934f04d71564
⚠️ Objects in plan contain warnings.
Note: Updates to Feature Services may take up to 60 seconds to be propagated to the real-time feature-serving endpoint.
Note: This workspace ([your-name]-quickstart) is a "Live" workspace. Applying this plan may result in new materialization jobs which will incur costs. Carefully examine the plan output before applying changes.
Are you sure you want to apply this plan to: "[your-name]-quickstart"? [y/N]> y
🎉 all done!
Tecton will:
- Register your data source, entity, and feature view.
- Kick off backfill jobs to populate your historical data from 2020 onward.
- Schedule future jobs to keep your feature data fresh.
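Once the apply succeeds, you can sanity-check the deployed objects from your notebook; a quick sketch using the SDK's workspace accessors (swap in your own workspace name):

```python
# Look up the live workspace and confirm the feature service is registered.
ws = tecton.get_workspace("[your-name]-quickstart")  # "prod" on explore.tecton.ai
print(ws.list_feature_services())
fs = ws.get_feature_service("fraud_detection_feature_service")
```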
Part 6: Serving Features in Real-Time
Once your backfill completes, you can fetch features with millisecond latency through Tecton's HTTP API or SDK. Let's demo using the HTTP API.
6.1: Create a Service Account
- Go to Settings > Service Accounts in the Tecton UI.
- Create a new service account and save its API key.
- Grant it "Consumer" access to your workspace.
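A quick aside: rather than hardcoding the key as in the next step, you could keep it in an environment variable and read it at runtime (the variable name below is just a convention, not something Tecton requires):

```python
import os

# Set this in your shell first, e.g.:  export TECTON_API_KEY="your-api-key"
TECTON_API_KEY = os.environ.get("TECTON_API_KEY", "your-api-key")
```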
6.2: Write a Helper Function to Fetch Features
Replace `your-api-key` with your service account key and, if needed, adjust the workspace name (`WORKSPACE_NAME`) and account URL (`ACCOUNT_URL`).
import requests
def get_online_feature_data(user_id):
TECTON_API_KEY = "your-api-key" # replace with your API key
WORKSPACE_NAME = "prod" # replace if needed
ACCOUNT_URL = "explore.tecton.ai" # replace if needed
headers = {"Authorization": "Tecton-key " + TECTON_API_KEY}
request_data = f"""{{
"params": {{
"feature_service_name": "fraud_detection_feature_service",
"join_key_map": {{"user_id": "{user_id}"}},
"metadata_options": {{"include_names": true}},
"workspace_name": "{WORKSPACE_NAME}"
}}
}}"""
online_feature_data = requests.post(
url=f"https://{ACCOUNT_URL}/api/v1/feature-service/get-features",
headers=headers,
data=request_data,
)
return online_feature_data.json()
6.3: Fetch Features and Run Inference
Fetch the feature values for a specific user:
user_id = "user_1990251765"
feature_data = get_online_feature_data(user_id)
if "result" not in feature_data:
print("Error: Check your API key or feature materialization status.")
else:
print(feature_data["result"])
You should see something like:
{
'features': [None, 14.64, 12.296666666666667, None, '2', '3']
}
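As an alternative to raw HTTP, the same lookup works through the SDK, assuming your notebook session is authenticated (a sketch; handy for testing, while the HTTP API is what you'd call from a production service):

```python
# SDK equivalent of the HTTP call above.
ws = tecton.get_workspace("prod")  # replace if needed
fs = ws.get_feature_service("fraud_detection_feature_service")
vector = fs.get_online_features(join_keys={"user_id": "user_1990251765"})
print(vector.to_dict())
```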
Run a Prediction
We'll reuse the trained logistic regression model from earlier. For simplicity, we'll just run inference in this notebook.
import pandas as pd
def get_prediction_from_model(feature_data):
columns = [f["name"].replace(".", "__") for f in feature_data["metadata"]["features"]]
data = [feature_data["result"]["features"]]
features = pd.DataFrame(data, columns=columns)[X.columns]
return model.predict(features)[0]
prediction = get_prediction_from_model(feature_data)
print(prediction) # 0 = not fraud, 1 = fraud
6.4: Simple Decision Logic
A real fraud detection system might call out to a rules engine or queue an alert for manual review. Here's a tiny function to decide pass/fail:
def evaluate_transaction(user_id):
online_feature_data = get_online_feature_data(user_id)
is_predicted_fraud = get_prediction_from_model(online_feature_data)
if is_predicted_fraud == 0:
return "Transaction accepted."
else:
return "Transaction denied."
evaluate_transaction("user_1990251765")
Transaction accepted.
Wrap-up
Congratulations on building an end-to-end real-time AI application with Tecton! Let's recap:
What You Built
- Feature Engineering: Batch features that track user spending patterns
- Training Data: A consistent, point-in-time correct dataset
- Model Training: A logistic regression model for fraud detection
- Productionization: Materializing features to Tecton's online store
- Low-Latency Serving: Quick predictions via Tecton's HTTP API
Key Concepts
- Batch Feature Views: Transform raw data into features for training and serving
- Time Travel Joins: Automatically fetch historically correct feature values
- Workspaces & Materialization: Productionize features with minimal overhead
- Online Feature Retrieval: Millisecond-latency lookups via Tecton's REST API
Next Steps
Want to go further? Check out:
- Building Streaming Features to learn how to process events as they happen.
- Real-time Python transformations, advanced feature validation, or unit tests in your Tecton pipeline.
- Monitoring best practices to keep tabs on data drift, feature freshness, and quality.
Tecton can handle much more: streaming data, real-time transformations, monitoring, testing, discovery, and access controlsβall while maintaining a single source of truth for your ML features.