
🚀 Tecton Quickstart


Want to see what Tecton can do? In this 10-minute quickstart, you'll build a streaming feature pipeline that can process real-time transaction data and serve features with millisecond latency. Perfect for fraud detection, real-time recommendations, or any application that needs fresh feature values.

What You'll Build

We'll create a simple but powerful feature pipeline that:

  • Ingests streaming transaction data in real-time
  • Computes running totals over different time windows
  • Serves feature values with sub-second latency

What You'll Learn

  • How to define features in Tecton using Python
  • How to test features directly in your notebook
  • How to ingest real-time data and see immediate updates

No prior Tecton experience needed - if you're comfortable with Python, you can complete this tutorial. Ready to get started?

Let's start by setting up your environment! You'll need Python >= 3.8.

Prerequisites

First, let's get your environment set up with the following steps:

1. Install Tecton

!pip install 'tecton[rift]==1.1.0' gcsfs s3fs -q

2. Set Up Python Imports

import tecton
from tecton import *
from tecton.types import *
from datetime import datetime, timedelta
import pandas as pd

# Configure Tecton to use Rift for compute
tecton.conf.set("TECTON_OFFLINE_RETRIEVAL_COMPUTE_MODE", "rift")

3. Log in to Tecton

import tecton

tecton.login("explore.tecton.ai")

When you run this command, you'll be prompted to:

  1. Open a browser window
  2. Click a button to generate an authentication token
  3. Copy the token back into your notebook
  4. Press Enter to continue
Not yet a Tecton user? Sign up at tecton.ai/explore for a free account to try this tutorial.

That's it for setup! Your environment is ready for building features. Let's create your first streaming feature pipeline.

Part 1: Your First Streaming Feature

Let's build a feature that tracks user transaction amounts in real-time. This would be useful in a fraud detection use case, where you may want to analyze user spending to look for anomalies. First, we need to tell Tecton where our data will come from and how to handle it.

In Tecton, a StreamSource defines how to ingest real-time data. Think of it as a connection point for your streaming data that handles both:

  • Real-time data ingestion through an HTTP API
  • Historical data for testing and training from files or databases

Here's how we set one up:

transactions_stream = StreamSource(
    name="transactions_stream",
    # Configure real-time ingestion via HTTP API
    stream_config=PushConfig(),
    # Historical data for testing and backfills
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/transactions.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
    # Define what our data looks like
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amount", Float64)],
)

# Define our user entity - this tells Tecton how to identify unique users
user = Entity(name="user", join_keys=[Field("user_id", String)])

Now that we have our data source set up, let's create a feature that calculates running totals over time. We'll use a StreamFeatureView, which lets us:

  • Transform streaming data in real-time
  • Calculate aggregations over different time windows
  • Serve feature values with millisecond latency

For our example, we'll track transaction totals over three time windows - the last minute for very recent activity, the last hour for short-term patterns, and the last 30 days for long-term behavior:

@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    features=[
        # Last minute total (for very recent activity)
        Aggregate(input_column=Field("amount", Float64), function="sum", time_window=timedelta(minutes=1)),
        # Last hour total (for recent patterns)
        Aggregate(input_column=Field("amount", Float64), function="sum", time_window=timedelta(hours=1)),
        # Last 30 days total (for long-term patterns)
        Aggregate(input_column=Field("amount", Float64), function="sum", time_window=timedelta(days=30)),
    ],
)
def user_transaction_amount_totals(transactions_stream):
    # Select just the columns we need for our features
    return transactions_stream[["user_id", "timestamp", "amount"]]

One of the great things about Tecton is that we can test our features right away using historical data. This helps us validate that everything is working as expected before we start processing real-time data:

# Test the feature using historical data
df = (
    user_transaction_amount_totals.get_features_in_range(start_time=datetime(2022, 1, 1), end_time=datetime(2022, 2, 1))
    .to_pandas()
    .fillna(0)
)

# Look at the first few results
print("Sample feature values:")
print(df.head(3))

Example output:

Sample feature values:
           user_id  amount_sum_1m_continuous  amount_sum_1h_continuous  amount_sum_30d_continuous
0  user_2210887384                       0.0                      4.55                    3429.12
1  user_2417164600                       0.0                      1.97                    2048.18
2  user_9757807451                       0.0                     98.37                   12365.00

These results show that our feature is working! For each user, we can see their transaction totals over different time windows: immediate activity (the 1-minute window), recent patterns (the 1-hour window), and longer-term behavior (the 30-day window).

In the next section, we'll make this real-time by sending in live data and watching these features update instantly!

Part 2: Making it Real-Time

Now comes the exciting part - we'll see our features update in real-time! To do this, we need to:

  1. Set up a connection to Tecton's production environment
  2. Send in some transaction data
  3. Immediately retrieve our updated feature values

First, we need an API key to connect to Tecton's production environment. This key lets us both send data and retrieve feature values securely:

import random, string

# Replace with your API key from https://explore.tecton.ai/app/settings/accounts-and-access/service-accounts
tecton.login(tecton_url="explore.tecton.ai", tecton_api_key="your-api-key")

# Connect to our production workspace and get references to our feature objects
ws = tecton.get_workspace("prod")
ds = ws.get_data_source("transactions_stream")
fv = ws.get_feature_view("user_transaction_amount_totals")

# Generate a test user ID - in production, this would be your real user ID
user_id = "user_" + "".join(random.choices(string.digits, k=7))
print("Generated test user ID:", user_id)

Now we can simulate a real transaction by sending data to our Stream Source. When we do this, Tecton will:

  1. Validate the data matches our schema
  2. Process it through our feature transformations
  3. Update the feature values in real-time
  4. Make them available for immediate retrieval

Let's try it:

# Send in a new transaction
record = ds.ingest({"user_id": user_id, "timestamp": datetime.utcnow(), "amount": 100.00})
print("Ingested transaction:")
print(record)

# Immediately fetch the updated features - this happens in milliseconds!
features = fv.get_online_features(join_keys={"user_id": user_id}).to_dict()
print("\nUpdated feature values:")
print(features)

You'll see output like this:

Generated test user ID: user_7370526

Ingested transaction:
{'workspaceName': 'prod', 'ingestMetrics': {'featureViewIngestMetrics': [{'featureViewName': 'user_transaction_amount_totals', 'onlineRecordIngestCount': '1'}]}}

Updated feature values:
{'amount_sum_1h_continuous': 100.0, 'amount_sum_1m_continuous': 100.0, 'amount_sum_30d_continuous': 100.0}

Here's where it gets interesting - try running the ingest code multiple times with different amounts (there's a sketch after this list)! You'll see how the different time windows behave:

  • The 1-minute window shows the most recent activity - perfect for detecting sudden bursts of transactions
  • The 1-hour window accumulates transactions over a longer period - useful for identifying short-term patterns
  • The 30-day window maintains a longer-term view - great for understanding typical user behavior
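For example, a quick loop like this shows each window accumulating. This is a minimal sketch that reuses the ds, fv, and user_id objects from above; the amounts are arbitrary:

import time

# Send a few transactions with different amounts and watch the windows accumulate
for amount in [25.00, 50.00, 125.00]:
    ds.ingest({"user_id": user_id, "timestamp": datetime.utcnow(), "amount": amount})
    features = fv.get_online_features(join_keys={"user_id": user_id}).to_dict()
    print(f"After ingesting {amount}: {features}")
    time.sleep(1)  # brief pause so each event gets a distinct timestamp

If you wait a minute and fetch the features again, you should see the 1-minute total drop back toward zero as events age out of that window, while the 1-hour and 30-day totals hold steady.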

This kind of real-time feature computation is incredibly powerful for:

  • Fraud detection systems that need to spot unusual spending patterns immediately
  • Recommendation engines that adapt to user behavior in real-time
  • Any application where fresh feature values make a difference

Under the hood, Tecton is handling all the complexity of:

  • Streaming data ingestion
  • Real-time feature computation
  • Time window management
  • Low-latency feature serving

In a real application, you'd send this data directly to Tecton's HTTP API for the best performance. The .ingest() method we're using here is great for testing and development, but for production use cases, you'll want to use the HTTP API directly.
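For illustration only, a direct call could look roughly like the sketch below, using Python's requests library. The endpoint URL and request body layout here are assumptions for sketching purposes, not confirmed values; check the Stream Ingest API reference for your deployment's exact format:

import requests

# NOTE: the endpoint URL and payload layout below are illustrative assumptions;
# consult the Stream Ingest API docs for your cluster's exact format.
response = requests.post(
    "https://preview.explore.tecton.ai/ingest",  # hypothetical ingest endpoint
    headers={"Authorization": "Tecton-key your-api-key"},
    json={
        "workspace_name": "prod",
        "records": {
            "transactions_stream": [
                {"record": {"user_id": user_id, "timestamp": datetime.utcnow().isoformat(), "amount": 100.00}}
            ]
        },
    },
)
print(response.status_code, response.json())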

Let's wrap up and look at what you've accomplished!

Wrap-up

Congratulations! You've just built a real-time feature pipeline that can:

  • Ingest streaming transaction data
  • Calculate features over multiple time windows
  • Serve feature values with sub-second latency

What We Built

  • A streaming data source that accepts real-time events
  • A feature view that computes running totals over three time windows:
    • 1 minute (very recent activity)
    • 1 hour (recent patterns)
    • 30 days (long-term patterns)
  • A real-time serving endpoint for these features

Key Concepts You've Learned

  • How to define a StreamSource for real-time data
  • How to create time-windowed aggregation features
  • How to test features with historical data
  • How to send and receive real-time updates

Next Steps

  1. Explore More Features

    • Try different aggregation functions (mean, count, max) - see the sketch after this list
    • Add more time windows
    • Combine multiple features together
  2. Ready for Production? Try our "Building a Production AI Application" tutorial to learn:

    • How to set up monitoring
    • How to generate training data
    • Best practices for production deployment
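As a starting point for the first item, here's a hedged sketch of a second feature view using mean, count, and max aggregations over the same stream. The name user_transaction_amount_stats and the window choices are illustrative, not part of this tutorial's pipeline:

@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    features=[
        # Average transaction size over the last hour
        Aggregate(input_column=Field("amount", Float64), function="mean", time_window=timedelta(hours=1)),
        # Number of transactions in the last hour
        Aggregate(input_column=Field("amount", Float64), function="count", time_window=timedelta(hours=1)),
        # Largest single transaction in the last 30 days
        Aggregate(input_column=Field("amount", Float64), function="max", time_window=timedelta(days=30)),
    ],
)
def user_transaction_amount_stats(transactions_stream):
    # Same projection as before: just the columns the aggregations need
    return transactions_stream[["user_id", "timestamp", "amount"]]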

Want to use Tecton with your own data? Sign up for a free trial at tecton.ai/free-trial to get started!
