Version: 1.2

Attribute Features

Attribute Features are the most flexible and universal feature type in Tecton. They allow you to define features using a variety of transformation languages, including Python, SQL, and Spark.

Overview

Attribute Features can be created using three main transformation approaches, each suited to different use cases:

  • Python transformations: Defined using Python or Pandas code, these features allow for flexible, programmatic transformations and are ideal for complex logic or for leveraging external libraries.
  • SQL transformations: Defined using SQL queries, these features leverage the power of SQL engines (such as Spark SQL or Snowflake SQL) for efficient column selection, transformation, and conditional logic.
  • Row-Level transformations: These features focus on per-row operations such as filtering, projection, or simple transformations, and can be implemented using Python (Pandas), Spark SQL, or Snowflake SQL.

When to Use Attribute Features

Use Attribute Features when:

  • You want to define a feature directly from raw or filtered data without aggregation.
  • You need to apply row-level transformations such as filtering or conditional logic.
  • You want to use external libraries or more complex logic not suited to SQL-only pipelines.
  • You need flexibility across multiple compute engines and transformation types.

Example: Python String Manipulation

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("full_name", String),
    ],
    timestamp_field="timestamp",
)
def user_full_name(users_df):
    users_df["full_name"] = users_df["first_name"] + " " + users_df["last_name"]
    return users_df[["user_id", "full_name", "timestamp"]]

Example: SQL Simple Column Selection

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("email", String),
    ],
    timestamp_field="timestamp",
)
def user_email(users_batch):
    return f"""
        SELECT
            user_id,
            email,
            timestamp
        FROM {users_batch}
        """

Example: Row-Level Filtering with Python (Pandas)

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    # Only keep transactions with amount > 100
    filtered = transactions_batch[transactions_batch["amt"] > 100]
    return filtered[["user_id", "amt", "timestamp"]]

Modes and Compute Engines

In Tecton, the mode parameter in a Feature View definition specifies the transformation language (such as Python, Pandas, or SQL) and thereby determines which compute engine (Rift or Spark) executes your transformation. You do not select the compute engine directly; instead, you select a mode, and Tecton runs your transformation on the engine that supports that mode.
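As a rough illustration, this mode-to-engine pairing can be sketched as a simple lookup. The dict below is not part of the Tecton API: the SQL rows follow the pairings stated on this page (mode="spark_sql" for the Spark engine, mode="snowflake_sql" for Rift), while the "pandas" and "python" rows are assumptions for illustration only.

```python
# Illustrative sketch only -- Tecton resolves the engine internally;
# this mapping is NOT a Tecton API.
MODE_TO_ENGINE = {
    "pandas": "Rift",          # assumption, for illustration
    "python": "Rift",          # assumption, for illustration
    "snowflake_sql": "Rift",   # stated on this page
    "spark_sql": "Spark",      # stated on this page
}


def engine_for(mode: str) -> str:
    """Return the compute engine assumed to back the given mode."""
    try:
        return MODE_TO_ENGINE[mode]
    except KeyError:
        raise ValueError(f"Unsupported mode: {mode!r}")
```

The point of the sketch is that the engine is a consequence of the mode you pick, never a separate knob you turn.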

Requirements for Attribute Features

All Attribute Features must return:

  • Join keys for the relevant entities
  • A timestamp column (to support time travel and materialization)
  • The feature column(s) you defined

Ensure that your transformation logic preserves these columns in the final output.
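A quick way to reason about this requirement: the output columns must be a superset of the join keys, the timestamp field, and the declared feature names. The helper below is a hypothetical sketch (not part of the Tecton API) that computes which required columns, if any, are missing.

```python
# Illustrative helper, not a Tecton API: report required columns
# missing from a transformation's output.
def check_required_columns(output_columns, join_keys, timestamp_field, feature_names):
    """Return the set of required columns absent from output_columns."""
    required = set(join_keys) | {timestamp_field} | set(feature_names)
    return required - set(output_columns)


# Example: the user_full_name view above must keep user_id,
# full_name, and timestamp in its output.
missing = check_required_columns(
    output_columns=["user_id", "full_name", "timestamp"],
    join_keys=["user_id"],
    timestamp_field="timestamp",
    feature_names=["full_name"],
)
assert not missing  # empty set: all required columns are present
```

Dropping any of these columns (for example, projecting away the timestamp) will cause the Feature View to be invalid.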

Additional Example Transforms

Using Python or Pandas

Python transform Attribute Features use the mode="pandas" or mode="python" option in a Feature View. These features are defined by writing Python functions that transform input dataframes into the desired feature values.

Example: Numerical Transformation

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("age_squared", Int64),
    ],
    timestamp_field="timestamp",
)
def user_age_squared(users_df):
    users_df["age_squared"] = users_df["age"] ** 2
    return users_df[["user_id", "age_squared", "timestamp"]]

Example: Conditional Logic

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("age_group", String),
    ],
    timestamp_field="timestamp",
)
def user_age_group(users_df):
    users_df["age_group"] = users_df["age"].apply(lambda x: "adult" if x >= 18 else "minor")
    return users_df[["user_id", "age_group", "timestamp"]]

Example: Using External Libraries

import hashlib
from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("email_hash", String),
    ],
    timestamp_field="timestamp",
)
def user_email_hash(users_df):
    users_df["email_hash"] = users_df["email"].apply(lambda x: hashlib.sha256(x.encode()).hexdigest())
    return users_df[["user_id", "email_hash", "timestamp"]]

Using SQL

SQL transform Attribute Features are Attribute Features defined using SQL queries. They allow you to express feature logic using SQL, leveraging the power and expressiveness of SQL engines such as Spark SQL and Snowflake SQL. These features are typically used for simple column selection, transformations, or conditional logic that can be efficiently expressed in SQL.

You define a SQL transform Attribute Feature by specifying the appropriate SQL mode in your Feature View (e.g., mode="spark_sql" for the Spark compute engine, mode="snowflake_sql" for the Rift compute engine) and returning a SQL query string from the transformation function.

Example: Conditional Logic (Spark SQL)

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[credit_scores_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("user_has_good_credit", Int64),
    ],
    timestamp_field="date",
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IF(credit_score > 670, 1, 0) AS user_has_good_credit,
            date AS timestamp
        FROM {credit_scores}
        """

Example: SQL Attribute Feature with Join and Aggregation

This example performs a join and an aggregation, returning both the user's email address and their total transaction amount over the last 30 days.

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64, String


@batch_feature_view(
    sources=[transactions_batch, users_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("total_amt_last_30d", Float64),
        Attribute("user_email", String),
    ],
    timestamp_field="timestamp",
)
def user_total_amt_and_email(transactions_batch, users_batch):
    return f"""
        SELECT
            t.user_id,
            u.email AS user_email,
            SUM(t.amt) AS total_amt_last_30d,
            MAX(t.timestamp) AS timestamp  -- latest timestamp, so each user yields one row
        FROM {transactions_batch} t
        JOIN {users_batch} u
            ON t.user_id = u.user_id
        WHERE t.timestamp >= DATE_SUB(CURRENT_DATE(), 30)
        GROUP BY t.user_id, u.email
        """

Notes

  • SQL transform Attribute Features are supported in Batch Feature Views and, for Spark SQL, also in Stream Feature Views.
  • The SQL query must return all required columns: join keys, timestamp, and feature columns.
  • The SQL dialect must match the compute engine (Spark SQL for Spark, Snowflake SQL for Rift).

Row-Level Transforms

Row-Level transform Attribute Features allow you to define features that are computed for each row of your input data, typically using filtering, projection, or transformation logic. These features are especially useful when you need to select, modify, or filter individual records before any aggregation or further processing.

Row-level transformations can be implemented using Python (Pandas), Spark SQL, or Snowflake SQL, depending on your compute engine and use case.

When to Use Row-Level Transformations

  • When you need to filter out records based on specific criteria.
  • When you want to project (select) or rename columns.
  • When you need to apply simple transformations to each row before aggregation.

Example: Row-Level Filtering with Spark SQL

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    return f"""
        SELECT
            user_id,
            amt,
            timestamp
        FROM {transactions_batch}
        WHERE amt > 100
        """

Example: Row-Level Transformation with Snowflake SQL

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    return f"""
        SELECT
            user_id,
            amt,
            timestamp
        FROM {transactions_batch}
        WHERE amt > 100
        """

Notes

  • Row-level transformations are supported in both Batch and Stream Feature Views (with supported modes).
