Version: Beta 🚧

Feature Types

Tecton provides a flexible framework for defining features that power ML or rule-based decisioning systems.

Tecton supports two tiers of feature definitions:

  1. Attribute features – the universal, do-anything building block

    Use an Attribute feature view whenever you can already express your transformation in Python, Spark, Pandas, or SQL. It works with any source and any logic.

  2. Specialized feature templates – tuned for common patterns

    • Time-window aggregations – rolling counts, sums, averages, etc.
    • Embeddings – vector outputs from embedding models
    • Model-generated features – structured outputs from PyTorch models
    • Sequence features – ordered event lists

    These templates deliver performance, storage and compute optimizations so you don't have to hand-roll them.

Entities and Join Keys

Entities represent the primary objects (such as users, items, or devices) for which features are computed. Each entity is associated with one or more join keys, which are used to join feature data with model input data.

Entities and join keys are defined in Tecton using the Entity class.

Example:

from tecton import Entity

user = Entity(name="user", join_keys=["user_id"])
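To build intuition for how join keys connect feature data to model input data, here is a minimal pandas sketch of a point-in-time join on user_id, outside of Tecton. The sample data is invented for illustration, and merge_asof stands in for Tecton's internal join logic:

```python
import pandas as pd

# Feature data, keyed by the entity's join key plus a timestamp.
features = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-15"]),
    "age": [30, 31, 45],
})

# Model input data (e.g. training events), keyed by the same join key.
events = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "timestamp": pd.to_datetime(["2024-02-10", "2024-01-20"]),
})

# Point-in-time join: each event row picks up the latest feature value
# at or before its timestamp, matched on the join key.
joined = pd.merge_asof(
    events.sort_values("timestamp"),
    features.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
)
print(joined[["user_id", "age"]])
```

Each event sees only feature values that existed at its timestamp, which is what prevents training/serving skew.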

Attribute Features – Your Default Option

Attribute features are single-value features. They are indexed by timestamp and join keys, and act as Tecton's Swiss-army knife:

  • Any transformation language: Spark SQL, Pandas, PySpark, or pure Python
  • Any data source: Batch, Stream, or Realtime
  • Any logic: simple casts, complex joins, third-party API calls; if it runs in your own code, it runs here

Unlike the specialized feature types below, Attribute feature views are not optimized for any particular pattern: Tecton simply executes your feature code and materializes the output.

Row-level transformation example

This example uses a BatchSource called users_batch with columns: user_id, age, and timestamp.

To generate a feature for user age:

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    features=[
        Attribute("age", Int64),
    ],
    timestamp_field="timestamp",
)
def user_age(users_df):
    return users_df[["user_id", "age", "timestamp"]]

SQL transformation example

This example uses a BatchSource called users_batch with columns: user_id, dob, and timestamp.

To generate a feature for user birth year using SQL:

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="spark_sql",
    features=[
        Attribute("birth_year", Int64),
    ],
    timestamp_field="timestamp",
)
def user_birth_year(users_batch):
    return f"""
        SELECT
            user_id,
            YEAR(dob) AS birth_year,
            timestamp
        FROM
            {users_batch}
        """

Python transformation example

This example uses a BatchSource called users_batch with columns: user_id, first_name, last_name, and timestamp.

To generate a feature for the user's full name:

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    features=[
        Attribute("full_name", String),
    ],
    timestamp_field="timestamp",
)
def user_full_name(users_df):
    users_df["full_name"] = users_df["first_name"] + " " + users_df["last_name"]
    return users_df[["user_id", "full_name", "timestamp"]]

Aggregation Features

Aggregation features summarize raw data over a specified time window. Common examples include counts, sums, averages, and distinct counts over user activity or transactions. Tecton computes these time-window aggregations with built-in storage and compute optimizations.

Aggregation features are typically defined in BatchFeatureViews or StreamFeatureViews using the Aggregate class.

Example:

This example uses a BatchSource called transactions_batch with 3 columns: user_id, transaction_id, and timestamp.

To generate a 30-day transaction count aggregation feature for each user:

from tecton import batch_feature_view, Aggregate
from tecton.types import Field, Int64
from datetime import timedelta


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("transaction_id", Int64),
            function="count",
            time_window=timedelta(days=30),
            name="user_transaction_count_30d",
        )
    ],
    timestamp_field="timestamp",
)
def user_transaction_aggregates(transactions_df):
    return transactions_df[["user_id", "transaction_id", "timestamp"]]
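For intuition, the value Tecton maintains for this feature is equivalent to the following plain-pandas sketch of a 30-day rolling count, evaluated at each event. The sample data is invented for illustration; Tecton itself computes this incrementally rather than rescanning the raw data:

```python
import pandas as pd

# Toy transaction log for a single user.
transactions = pd.DataFrame({
    "user_id": ["u1"] * 4,
    "transaction_id": [1, 2, 3, 4],
    "timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-20", "2024-02-05", "2024-03-20"]
    ),
})

# 30-day rolling count of transactions per user, evaluated at each event time.
counts = (
    transactions.set_index("timestamp")
    .groupby("user_id")["transaction_id"]
    .rolling("30D")
    .count()
)
print(counts)
```

Each row's count reflects only the transactions in the 30 days up to and including that row's timestamp, mirroring the time_window=timedelta(days=30) setting above.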

Embeddings

Embeddings are vector representations of complex data such as text, images, or categorical variables. They are often generated using deep learning models and are useful for capturing semantic meaning in a compact form.

Embeddings can be defined in both batch and stream feature views using the Embedding class.

Example:

This example uses a BatchSource called user_bio_source with 3 columns: user_id, timestamp, and bio_embedding.

To generate an embedding feature representing each user's bio as a 384-dimensional vector:

from tecton import batch_feature_view, Embedding
from tecton.types import Array, Float32


@batch_feature_view(
    sources=[user_bio_source],
    entities=[user],
    mode="pandas",
    features=[
        Embedding("bio_embedding", Array(Float32, 384)),
    ],
    timestamp_field="timestamp",
)
def user_bio_embedding(user_bio_source):
    # ... generate embedding ...
    return user_bio_source[["user_id", "timestamp", "bio_embedding"]]
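The elided step above must produce one fixed-length float vector per row. As a stand-in for a real embedding model, here is a toy, deterministic "embedding" that only illustrates the expected Array(Float32, 384) shape; the hashing scheme is purely illustrative and has no semantic meaning:

```python
import hashlib

DIM = 384  # must match the Array(Float32, 384) declared in the feature view

def toy_embed(text: str, dim: int = DIM) -> list:
    """Deterministic stand-in for a real embedding model:
    hash the text into `dim` float values in [0, 1]."""
    out = []
    i = 0
    while len(out) < dim:
        # Derive successive 32-byte digests and spread them into floats.
        digest = hashlib.sha256(f"{i}:{text}".encode()).digest()
        out.extend(b / 255.0 for b in digest)
        i += 1
    return out[:dim]

vec = toy_embed("Engineer who loves hiking and coffee.")
print(len(vec))  # 384
```

In practice this step would call an actual embedding model; the only contract the feature view cares about is the column name, element type, and vector length.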

Model-Generated Features

Model-generated features are features produced by running a machine learning model as part of the feature pipeline. These can include predictions, scores, or any output from a model that is useful as a feature for downstream models.

Model-generated features can be computed in transformation functions within Feature Views.

Example:

This feature view produces, for each user and message, both the original chat message and a sentiment score inferred by the "roberta-sentiment-v0" model, making these features available for both real-time and offline use.

from tecton import batch_feature_view, Attribute, Inference
from tecton.types import Field, String
from datetime import datetime, timedelta


@batch_feature_view(
    name="sentiment_bfv",
    mode="pandas",
    sources=[chat_history_ds],
    entities=[user],
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 5, 1),
    timestamp_field="timestamp",
    features=[
        Attribute("message", String),
        Inference(
            input_columns=[
                Field("message", String),
            ],
            model="roberta-sentiment-v0",
            name="user_sentiment",
        ),
    ],
    online=True,
    offline=True,
    environment="tecton-rift-ml-1.0.0",
)
def user_message_sentiment(chat_history_ds):
    return chat_history_ds[["user_id", "message", "timestamp"]]
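Conceptually, the Inference step applies a model row by row to the declared input columns and writes its output as a new feature column. A toy rule-based scorer (not the actual roberta-sentiment-v0 model, and not Tecton's runtime) makes that input/output shape concrete:

```python
# Toy stand-in for a sentiment model: maps a message to a score in [-1, 1].
POSITIVE = {"great", "love", "good", "thanks"}
NEGATIVE = {"bad", "hate", "terrible", "broken"}

def toy_sentiment(message: str) -> float:
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, score / max(len(words), 1)))

# Each input row gains a model-generated column, analogous to
# the "user_sentiment" feature produced by the Inference step.
rows = [
    {"user_id": "u1", "message": "love the new feature, thanks"},
    {"user_id": "u2", "message": "checkout is broken"},
]
for row in rows:
    row["user_sentiment"] = toy_sentiment(row["message"])
print(rows)
```

The real pipeline swaps the toy function for managed model inference, but the data contract is the same: declared input columns in, one named feature column out.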

Sequence Features

Sequence features represent chronological lists of recent events for each entity, such as logins, transactions, or user actions. They are especially useful for deep learning models like LSTMs and Transformers that learn patterns over time and require ordered, contextual input.

Sequence features in Tecton are implemented using the last(n) aggregation function inside a stream_feature_view, materializing an ordered list of recent raw events for each entity in real time.

Example:

This example uses a StreamSource called user_events_stream with 3 columns: user_id, event_time, and event_name.

To generate a sequence feature containing the last 100 event names for each user within a 1-hour window:

from tecton import stream_feature_view, Aggregate, StreamProcessingMode
from tecton.aggregation_functions import last
from tecton.types import String, Field
from datetime import timedelta


@stream_feature_view(
    source=user_events_stream,
    entities=[user],
    mode="spark_sql",
    timestamp_field="event_time",
    stream_processing_mode=StreamProcessingMode.CONTINUOUS,
    features=[
        Aggregate(
            function=last(n=100),
            input_column=Field("event_name", String),
            time_window=timedelta(hours=1),
            name="event_names",
        )
    ],
)
def user_event_sequence(user_events_stream):
    return f"""
        SELECT user_id, event_time, event_name
        FROM {user_events_stream}
        """
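For intuition, the last(n=100) aggregation over a 1-hour window corresponds to the following plain-Python sketch. The event data is invented for illustration; Tecton maintains this list incrementally as events arrive on the stream:

```python
from datetime import datetime, timedelta

def last_n_in_window(events, now, n=100, window=timedelta(hours=1)):
    """events: list of (event_time, event_name) pairs, assumed time-ordered.
    Returns the last `n` event names whose time falls in [now - window, now]."""
    in_window = [name for t, name in events if now - window <= t <= now]
    return in_window[-n:]

events = [
    (datetime(2024, 5, 1, 11, 0), "login"),
    (datetime(2024, 5, 1, 11, 40), "view_item"),
    (datetime(2024, 5, 1, 11, 55), "add_to_cart"),
]
print(last_n_in_window(events, now=datetime(2024, 5, 1, 12, 0), n=2))
# ['view_item', 'add_to_cart']
```

The resulting ordered list is exactly the shape sequence models such as LSTMs and Transformers expect as input.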

What's Next

Once you have defined your features and entities in Tecton, here are the typical next steps:

  • Read more about Feature Views: Learn about the three types of feature views: Batch, Stream, and Realtime.
  • Deploy Feature Views: Promote your feature definitions to production workspaces and ensure they are available for model training and inference.
  • Materialize Features: Set up materialization schedules to compute and store your features in online and offline stores for training and serving.
  • Test Feature Pipelines: Validate your feature logic and data quality using Tecton's testing tools and best practices.
  • Monitor Feature Health: Track feature freshness, data quality, and operational metrics to ensure reliable feature delivery.
