Building Streaming Features
Sign up at tecton.ai/explore for a free account that lets you try out this tutorial and explore Tecton's Web UI.
Real-time data can make all the difference for real-time models, but leveraging it can be quite the challenge.
With Tecton you can build millisecond-fresh features using plain Python and without any complex streaming infrastructure! Best of all, you can test it all locally and iterate in a notebook to quickly train better models that operate consistently online and offline.
This tutorial assumes some basic familiarity with Tecton. If you are new to Tecton, we recommend first checking out Building a Production AI Application with Tecton, which walks through an end-to-end journey of building a real-time ML application with Tecton.
Most of this tutorial is intended to be run in a notebook. Some steps will explicitly note to run commands in your terminal.
In this tutorial we will:
- Create a streaming data source
- Define and test streaming features
- Query data online and offline
Install Pre-Reqs
First things first, let's install the Tecton SDK and the other libraries used by this tutorial (we recommend doing this in a virtual environment):
!pip install 'tecton[rift]==1.0.0' gcsfs s3fs -q
Log in to Tecton
Next we will authenticate with your organization's Tecton account and import libraries we will need.
If you just signed up via explore.tecton.ai, you can leave this step as is. If your organization has its own Tecton account, replace explore.tecton.ai with your account URL.
Note: You need to press Enter after pasting in your authentication code.
import tecton
import pandas as pd
from datetime import datetime
from pprint import pprint
import random, string
tecton.login("explore.tecton.ai") # replace with your org's URL if needed
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")
Now we're ready to build!
Create a Stream Source for ingesting real-time data
First, let's define a local Stream Source that supports ingesting real-time data. Once productionized, this will give us an online HTTP endpoint to push events to in real time, which Tecton will then transform into features for online inference.
As part of our Stream Source, we also register a historical log of the stream via the batch_config parameter. Tecton uses this historical log for backfills and offline development.
Alternatively, you can have Tecton maintain this historical log for you! Simply add the log_offline=True parameter to the PushConfig and omit the batch_config. With this setup, Tecton will log all ingested events and use those to backfill any features that use this source.
from tecton import StreamSource, PushConfig, FileConfig
from tecton.types import Field, String, Timestamp, Float64

transactions_stream = StreamSource(
    name="transactions_stream",
    stream_config=PushConfig(),
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
)
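The schema above implies a specific shape for each event pushed to the source. As a rough illustration that does not use the Tecton SDK, the sketch below checks a plain Python dict against those three fields before it is sent anywhere. EXPECTED_FIELDS and validate_event are hypothetical helper names, and mapping String, Timestamp, and Float64 to str, datetime, and float is an assumption made for this example only.

```python
from datetime import datetime

# Hypothetical mapping from the schema's field names to Python types
# (assumed equivalents of String, Timestamp, and Float64).
EXPECTED_FIELDS = {
    "user_id": str,
    "timestamp": datetime,
    "amount": float,
}


def validate_event(event: dict) -> list:
    """Return a list of problems found in `event`; an empty list means it matches."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(event[name]).__name__}"
            )
    return problems


# Values taken from the first row of the sample transactions data.
good_event = {
    "user_id": "user_5560069050",
    "timestamp": datetime(2023, 5, 1, 0, 38, 11, 716917),
    "amount": 50.77,
}
# Missing timestamp, and amount passed as a string instead of a float.
bad_event = {"user_id": "user_5560069050", "amount": "50.77"}

print(validate_event(good_event))
print(validate_event(bad_event))
```

A check like this can catch malformed records early in a notebook, though it is not a substitute for whatever validation Tecton itself performs on ingested events.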
Test the new Stream Source
We can pull a range of offline data from a Stream Source's historical event log using get_dataframe().
start = datetime(2023, 5, 1)
end = datetime(2023, 8, 1)
df = transactions_stream.get_dataframe(start, end).to_pandas()
display(df.head(5))
index | timestamp | user_id | transaction_id | merchant | merch_lat | merch_long | amount |
---|---|---|---|---|---|---|---|
0 | 2023-05-01 00:38:11.716917 | user_5560069050 | 33dd0f5c6ece08ff84d10227e83a6936 | Mervyn's | "49.7518840" | "-140.759320" | 50.77 |
1 | 2023-05-01 00:45:03.031664 | user_8277121337 | 7b220dda23f7d8813062ad0f95c579c6 | Quality Stores | "-14.0234110" | "-121.107220" | 95.36 |
2 | 2023-05-01 00:53:40.791126 | user_4409718407 | ff22d24fef0164070cdae9771d8bf9c3 | Gottschalks | "3.7894785" | "53.168767" | 12.02 |
3 | 2023-05-01 01:18:38.115718 | user_6606710651 | e924834f85f588419f181e55cb61771d | Visionworks | "-14.0913895" | "-53.801625" | 866.82 |
4 | 2023-05-01 01:56:28.194464 | user_1200838555 | 38afd7d44a13c890ed5d3bbdf96d95a0 | Cook United | "82.5886295" | "-152.906522" | 0.04 |
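Before building features on this data, it can help to run a couple of quick sanity checks on the returned frame. The sketch below rebuilds two rows from the sample output in pandas rather than calling Tecton, confirms the timestamps fall inside the requested window, and converts the coordinate columns to floats. Two assumptions are made here: the window is treated as half-open, [start, end), and merch_lat and merch_long are treated as strings because they are displayed in quotes.

```python
from datetime import datetime

import pandas as pd

# Two rows copied from the sample output; merch_lat / merch_long are kept
# as strings because they are shown in quotes there (an assumption).
sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2023-05-01 00:38:11.716917", "2023-05-01 00:45:03.031664"]
        ),
        "user_id": ["user_5560069050", "user_8277121337"],
        "merch_lat": ["49.7518840", "-14.0234110"],
        "merch_long": ["-140.759320", "-121.107220"],
        "amount": [50.77, 95.36],
    }
)

start = datetime(2023, 5, 1)
end = datetime(2023, 8, 1)

# Assumption: treat the requested window as half-open, [start, end).
in_window = (sample["timestamp"] >= start) & (sample["timestamp"] < end)

# Convert the string coordinates to floats so they can be used numerically.
coords = sample[["merch_lat", "merch_long"]].apply(pd.to_numeric)

print(bool(in_window.all()))
print(coords.dtypes)
```

The same two checks can be applied to the full df returned by get_dataframe() by replacing sample with df.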