Feature Types
Tecton provides a flexible framework for defining features that power ML or rule-based decisioning systems.
Tecton supports two tiers of feature definitions:
- Attribute features – the universal, do-anything building block

  Use an Attribute feature view whenever you can already express your transformation in Python, Spark, Pandas, or SQL. It works with any source and any logic.
- Specialized feature templates – tuned for common patterns
  - Time-window aggregations – rolling counts, sums, averages, etc.
  - Embeddings – vector outputs from embedding models.
  - Model-generated features – structured outputs from PyTorch models.
  - Sequence features – ordered event lists.

These templates deliver performance, storage, and compute optimizations so you don't have to hand-roll them.
Entities and Join Keys
Entities represent the primary objects (such as users, items, or devices) for which features are computed. Each entity is associated with one or more join keys, which are used to join feature data with model input data.
Entities and join keys are defined in Tecton using the `Entity` class.

Example:

```python
from tecton import Entity

user = Entity(name="user", join_keys=["user_id"])
```
Attribute Features – Your Default Option
Attribute features are single-value features. They are indexed by timestamp and join keys, and act as Tecton's Swiss-army knife:
- Any transformation language: Spark SQL, Pandas, PySpark, or pure Python
- Any data source: Batch, Stream, or Realtime
- Any logic: simple casts, complex joins, third-party API calls; if it runs in your own code, it runs here
In contrast to all other feature types, they are not specialized. Tecton simply executes your feature code and materializes the output.
Row-level transformation example
This example uses a BatchSource called users_batch with columns: user_id,
age, and timestamp.
To generate a feature for user age:
```python
from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    features=[
        Attribute("age", Int64),
    ],
    timestamp_field="timestamp",
)
def user_age(users_df):
    return users_df[["user_id", "age", "timestamp"]]
```
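Because the decorated function body is ordinary pandas, its logic can be sanity-checked outside Tecton on a toy DataFrame (the data below is invented for illustration):

```python
import pandas as pd

# Invented stand-in for the users_batch source.
users_df = pd.DataFrame(
    {
        "user_id": ["u1", "u2"],
        "age": [34, 27],
        "timestamp": pd.to_datetime(["2024-05-01", "2024-05-02"]),
    }
)

# The same selection the feature view performs:
# join key, feature column, and timestamp.
result = users_df[["user_id", "age", "timestamp"]]
print(list(result.columns))  # ['user_id', 'age', 'timestamp']
```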
SQL transformation example
This example uses a BatchSource called users_batch with columns: user_id,
dob, and timestamp.
To generate a feature for user birth year using SQL:
```python
from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="spark_sql",
    features=[
        Attribute("birth_year", Int64),
    ],
    timestamp_field="timestamp",
)
def user_birth_year(users_batch):
    return f"""
        SELECT
            user_id,
            YEAR(dob) AS birth_year,
            timestamp
        FROM
            {users_batch}
        """
```
Python transformation example
This example uses a BatchSource called users_batch with columns: user_id,
first_name, last_name, and timestamp.
To generate a feature for the user's full name:
```python
from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    features=[
        Attribute("full_name", String),
    ],
    timestamp_field="timestamp",
)
def user_full_name(users_df):
    users_df["full_name"] = users_df["first_name"] + " " + users_df["last_name"]
    return users_df[["user_id", "full_name", "timestamp"]]
```
Aggregation Features

Aggregation features are computed by summarizing raw data over a specified time window, using Tecton's optimized time-window aggregation engine. Common examples include counts, sums, averages, and distinct counts over user activity or transactions.
Aggregation features are typically defined in BatchFeatureViews or StreamFeatureViews using the `Aggregate` class.
Example:
This example uses a BatchSource called transactions_batch with 3 columns:
user_id, transaction_id, and timestamp.
To generate a 30-day transaction count aggregation feature for each user:
```python
from datetime import timedelta

from tecton import batch_feature_view, Aggregate
from tecton.types import Field, Int64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    aggregation_interval=timedelta(days=1),
    features=[
        Aggregate(
            input_column=Field("transaction_id", Int64),
            function="count",
            time_window=timedelta(days=30),
            name="user_transaction_count_30d",
        )
    ],
    timestamp_field="timestamp",
)
def user_transaction_aggregates(transactions_df):
    return transactions_df[["user_id", "transaction_id", "timestamp"]]
```
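Tecton manages the rolling computation itself; the semantics of the example's 30-day count can be sketched in plain pandas (invented data, point-in-time bookkeeping simplified):

```python
import pandas as pd

# Invented toy transaction events.
transactions_df = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2"],
        "transaction_id": [1, 2, 3, 4],
        "timestamp": pd.to_datetime(
            ["2024-04-01", "2024-04-20", "2024-05-10", "2024-05-11"]
        ),
    }
)

# Count each user's transactions in the 30 days ending at an "as of" time.
as_of = pd.Timestamp("2024-05-12")
window_start = as_of - pd.Timedelta(days=30)
in_window = transactions_df[
    (transactions_df["timestamp"] > window_start)
    & (transactions_df["timestamp"] <= as_of)
]
counts = in_window.groupby("user_id")["transaction_id"].count()
print(counts.to_dict())  # {'u1': 2, 'u2': 1}
```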
Embeddings
Embeddings are vector representations of complex data such as text, images, or categorical variables. They are often generated using deep learning models and are useful for capturing semantic meaning in a compact form.
Embeddings can be defined in both batch and stream feature views using the `Embedding` class.
Example:
This example uses a BatchSource called user_bio_source with 3 columns:
user_id, timestamp, and bio_embedding.
To generate an embedding feature representing each user's bio as a 384-dimensional vector:
```python
from tecton import batch_feature_view, Embedding
from tecton.types import Array, Float32


@batch_feature_view(
    sources=[user_bio_source],
    entities=[user],
    mode="pandas",
    features=[
        Embedding("bio_embedding", Array(Float32, 384)),
    ],
    timestamp_field="timestamp",
)
def user_bio_embedding(user_bio_source):
    # ... generate embedding ...
    return user_bio_source[["user_id", "timestamp", "bio_embedding"]]
```
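The example elides how the `bio_embedding` column is produced. As a rough sketch of the shape of that step, here is a pandas transformation that fills the column with a deterministic pseudo-vector; the `embed` function is a hypothetical placeholder, and a real pipeline would call an actual embedding model instead:

```python
import hashlib

import pandas as pd

DIM = 8  # small stand-in for the 384-dimensional vector in the example


def embed(text: str) -> list[float]:
    # Hypothetical placeholder: derive a deterministic pseudo-vector from a
    # hash of the text. A real pipeline would invoke an embedding model here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]


# Invented toy rows standing in for user_bio_source.
bios = pd.DataFrame(
    {
        "user_id": ["u1"],
        "timestamp": pd.to_datetime(["2024-05-01"]),
        "bio": ["Likes hiking and jazz."],
    }
)
bios["bio_embedding"] = bios["bio"].apply(embed)
print(len(bios.loc[0, "bio_embedding"]))  # 8
```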
Model-Generated Features
Model-generated features are features produced by running a machine learning model as part of the feature pipeline. These can include predictions, scores, or any output from a model that is useful as a feature for downstream models.
Model-generated features can be computed in transformation functions within Feature Views.
Example:
This feature view produces, for each user and message, both the original chat message and a sentiment score inferred by the "roberta-sentiment-v0" model, making these features available for both real-time and offline use.
```python
from datetime import datetime, timedelta

from tecton import batch_feature_view, Attribute, Inference
from tecton.types import Field, String


@batch_feature_view(
    name="sentiment_bfv",
    mode="pandas",
    sources=[chat_history_ds],
    entities=[user],
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 5, 1),
    timestamp_field="timestamp",
    features=[
        Attribute("message", String),
        Inference(
            input_columns=[
                Field("message", String),
            ],
            model="roberta-sentiment-v0",
            name="user_sentiment",
        ),
    ],
    online=True,
    offline=True,
    environment="tecton-rift-ml-1.0.0",
)
def user_message_sentiment(chat_history_ds):
    return chat_history_ds[["user_id", "message", "timestamp"]]
```
Sequence Features
Sequence features represent chronological lists of recent events for each entity, such as logins, transactions, or user actions. They are especially useful for deep learning models like LSTMs and Transformers that learn patterns over time and require ordered, contextual input.
Sequence features in Tecton are implemented using the last-n aggregation function (`last(n)`) inside a `stream_feature_view`, materializing ordered lists of raw events for each entity in real time.
Example:
This example uses a StreamSource called user_events_stream with 3 columns:
user_id, event_time, and event_name.
To generate a sequence feature containing the last 100 event names for each user within a 1-hour window:
```python
from datetime import timedelta

from tecton import stream_feature_view, Aggregate, StreamProcessingMode
from tecton.aggregation_functions import last
from tecton.types import Field, String


@stream_feature_view(
    source=user_events_stream,
    entities=[user],
    mode="spark_sql",
    timestamp_field="event_time",
    stream_processing_mode=StreamProcessingMode.CONTINUOUS,
    features=[
        Aggregate(
            function=last(n=100),
            input_column=Field("event_name", String),
            time_window=timedelta(hours=1),
            name="event_names",
        )
    ],
)
def user_event_sequence(user_events_stream):
    return f"""
        SELECT user_id, event_name, event_time
        FROM {user_events_stream}
        """
```
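Tecton maintains the ordered list incrementally as events arrive; the semantics of a last-n aggregation over a 1-hour window can be sketched in plain pandas (invented toy data):

```python
import pandas as pd

# Invented toy event stream.
events = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2"],
        "event_time": pd.to_datetime(
            ["2024-05-01 10:01", "2024-05-01 10:20",
             "2024-05-01 10:40", "2024-05-01 10:05"]
        ),
        "event_name": ["login", "search", "purchase", "login"],
    }
)

# Keep only events inside the 1-hour window ending at an "as of" time.
as_of = pd.Timestamp("2024-05-01 11:00")
window = events[events["event_time"] > as_of - pd.Timedelta(hours=1)]

# Last n event names per user, in chronological order.
n = 100
sequences = (
    window.sort_values("event_time")
    .groupby("user_id")["event_name"]
    .apply(lambda s: list(s.tail(n)))
)
print(sequences.to_dict())
# {'u1': ['login', 'search', 'purchase'], 'u2': ['login']}
```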
What's Next
Once you have defined your features and entities in Tecton, here are the typical next steps:
- Read more about Feature Views: Learn about the three types of feature views: Batch, Stream, and Realtime.
- Deploy Feature Views: Promote your feature definitions to production workspaces and ensure they are available for model training and inference.
- Materialize Features: Set up materialization schedules to compute and store your features in online and offline stores for training and serving.
- Test Feature Pipelines: Validate your feature logic and data quality using Tecton's testing tools and best practices.
- Monitor Feature Health: Track feature freshness, data quality, and operational metrics to ensure reliable feature delivery.