🚀 Building a Production AI Application with Tecton
Open this tutorial in Google Colab to get started with zero setup.
Sign up at explore.tecton.ai for a free account that lets you try out this tutorial and explore Tecton's Web UI.
Tecton helps you build and productionize real-time ML models by making it easy to define, test, and deploy features for training and serving.
Let's see how quickly we can build a real-time fraud detection model and bring it online.
In this tutorial we will:
- Connect to data on S3
- Define and test features
- Generate a training dataset and train a model
- Productionize our features for real-time serving
- Run real-time inference to predict fraudulent transactions
This tutorial is expected to take about 30 minutes (record time for building a real-time ML application!).
Most of this tutorial is intended to be run in a notebook. Some steps will explicitly note to run commands in your terminal.
⚙️ Install Pre-Reqs
First things first, let's install the Tecton SDK and the other libraries used by this tutorial (we recommend doing this in a virtual environment):
!pip install 'tecton[rift]==1.0.0' gcsfs s3fs scikit-learn -q
✅ Log in to Tecton
Next we will authenticate with your organization's Tecton account.
If you just signed up via explore.tecton.ai, you can leave this step as is. If your organization has its own Tecton account, replace explore.tecton.ai with your account URL.
Note: You need to press enter after pasting in your authentication code.
import tecton
tecton.login("explore.tecton.ai") # replace with your URL
Let's then run some basic imports and setup that we will use later in the tutorial.
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, Aggregate
from tecton.types import Field, String, Timestamp, Float64
from datetime import datetime, timedelta
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")
Now we're ready to build!
🔎 Examine raw data
First let's examine some historical transaction data that we have available on S3.
import pandas as pd
transactions_df = pd.read_parquet("s3://tecton.ai.public/tutorials/transactions.pq", storage_options={"anon": True})
display(transactions_df.head(5))
| | timestamp | user_id | transaction_id | merchant | merch_lat | merch_long | amount |
|---|---|---|---|---|---|---|---|
| 0 | 2021-01-01 00:12:17.950882 | user_7342348753 | df2d61fff650bc36569ab670587e63f1 | Lulu's | -69.424736 | -121.575701 | 732.27 |
| 1 | 2021-01-01 00:14:23.411801 | user_5436822157 | 496cb3f422558c4c38f314de0de0b1dd | Camelot Music | 31.786599 | 75.024895 | 56.14 |
| 2 | 2021-01-01 00:16:39.189817 | user_8080551036 | 36fade390801962b59d77450075b4f28 | Ernst | 50.742051 | 125.977939 | 514.87 |
| 3 | 2021-01-01 00:41:32.604106 | user_6906984756 | e2e2f26c39ecb634d3d28e7c009e93aa | EG Group | 56.306405 | -59.094746 | 43.85 |
| 4 | 2021-01-01 00:45:22.095249 | user_7171471634 | 1f6d4225dc6ae8d02f3674c687c0f1cf | Younkers | -81.890762 | 82.762924 | 50.74 |
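Before defining features, it can help to sanity-check the raw data: its event-time coverage, how many distinct users it contains, and the distribution of transaction amounts. A minimal sketch of such checks (shown on a small hypothetical sample so it runs standalone; in the notebook you would run them on the `transactions_df` loaded above):

```python
import pandas as pd

# Hypothetical mini-sample standing in for transactions_df loaded above;
# in the notebook, run these checks on the real DataFrame instead.
transactions_df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-01-01 00:12:17", "2021-01-03 08:30:00"]),
        "user_id": ["user_7342348753", "user_5436822157"],
        "amount": [732.27, 56.14],
    }
)

# Event-time coverage, user cardinality, and amount distribution.
print("time range:", transactions_df["timestamp"].min(), "->", transactions_df["timestamp"].max())
print("distinct users:", transactions_df["user_id"].nunique())
print(transactions_df["amount"].describe())
```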
👩‍💻 Define and test features locally
In our data, we see that there's information on users' transactions over time.
Let's use this data to create the following features:
- A user's average transaction amount over 1, 3, and 7 days.
- A user's total transaction count over 1, 3, and 7 days.
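To build intuition for what these features contain, here is a plain-pandas sketch of a trailing-window mean and count per user on a hypothetical mini dataset. (This is only an approximation for illustration: Tecton computes its aggregations incrementally over materialized tiles, so values at tile boundaries can differ from a continuous rolling window.)

```python
import pandas as pd

# Hypothetical mini dataset with the same shape as the tutorial's transactions.
df = pd.DataFrame(
    {
        "user_id": ["u1"] * 4,
        "timestamp": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05", "2021-01-07"]),
        "amount": [10.0, 20.0, 30.0, 40.0],
    }
)

# Trailing-window mean and count of `amount` per user over 1, 3, and 7 days.
df = df.sort_values("timestamp").set_index("timestamp")
windows = {}
for days in (1, 3, 7):
    rolled = df.groupby("user_id")["amount"].rolling(f"{days}D")
    windows[f"amount_mean_{days}d"] = rolled.mean()
    windows[f"amount_count_{days}d"] = rolled.count()

features = pd.DataFrame(windows).reset_index()
print(features)
```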
To build these features, we will define a "Batch Source" and "Batch Feature View" using Tecton's Feature Engineering Framework.
A Feature View is how we define our feature logic and give Tecton the information it needs to productionize, monitor, and manage features.
Tecton's development workflow allows you to build and test features, as well as generate training data entirely in a notebook! Let's try it out.
transactions = BatchSource(
name="transactions",
batch_config=FileConfig(
uri="s3://tecton.ai.public/tutorials/transactions.pq",
file_format="parquet",
timestamp_field="timestamp",
),
)
# An entity defines the concept we are modeling features for
# The join keys will be used to aggregate, join, and retrieve features
user = Entity(name="user", join_keys=[Field("user_id", String)])
# We use Pandas to transform the raw data and Tecton aggregations to efficiently and accurately compute metrics across raw events
# Feature View decorators contain a wide range of parameters for materializing, cataloging, and monitoring features
@batch_feature_view(
description="User transaction metrics over 1, 3 and 7 days",
sources=[transactions],
entities=[user],
mode="pandas",
aggregation_interval=timedelta(days=1),
timestamp_field="timestamp",
features=[
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(days=7)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=1)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=3)),
Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(days=7)),
],
)
def user_transaction_metrics(transactions):
return transactions[["user_id", "timestamp", "amount"]]
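Because the feature view uses `mode="pandas"`, the function body above is ordinary pandas: it simply projects the source down to the join key, the timestamp, and the column the `Aggregate` definitions consume. A standalone sketch of that projection (the function name and sample row are hypothetical; the `merchant` column is included only to show that unneeded columns get dropped):

```python
import pandas as pd

def project_for_metrics(transactions: pd.DataFrame) -> pd.DataFrame:
    # Same body as user_transaction_metrics above: keep only the columns
    # the aggregations need (join key, event timestamp, aggregated value).
    return transactions[["user_id", "timestamp", "amount"]]

raw = pd.DataFrame(
    {
        "user_id": ["user_7342348753"],
        "timestamp": pd.to_datetime(["2021-01-01 00:12:17"]),
        "amount": [732.27],
        "merchant": ["Lulu's"],  # dropped by the projection
    }
)
print(project_for_metrics(raw).columns.tolist())
```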