Version: 1.2

Attribute Features

Attribute Features are the most flexible and universal feature type in Tecton. They allow you to define features using a variety of transformation languages, including Python, SQL, and Spark.

Overview

Attribute Features can be created using three main transformation approaches, each suited to different use cases:

  • Python transformations: Defined using Python or Pandas code, these features allow for flexible, programmatic transformations and are ideal for complex logic or for leveraging external libraries.
  • SQL transformations: Defined using SQL queries, these features leverage the power of SQL engines (such as Spark SQL or Snowflake SQL) for efficient column selection, transformation, and conditional logic.
  • Row-Level transformations: These features focus on per-row operations such as filtering, projection, or simple transformations, and can be implemented using Python (Pandas), Spark SQL, or Snowflake SQL.

When to Use Attribute Features

Use Attribute Features when:

  • You want to define a feature directly from raw or filtered data without aggregation.
  • You need to apply row-level transformations such as filtering or conditional logic.
  • You want to use external libraries or more complex logic not suited to SQL-only pipelines.
  • You need flexibility across multiple compute engines and transformation types.

Example: Python String Manipulation

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("full_name", String),
    ],
    timestamp_field="timestamp",
)
def user_full_name(users_df):
    users_df["full_name"] = users_df["first_name"] + " " + users_df["last_name"]
    return users_df[["user_id", "full_name", "timestamp"]]

Example: SQL Simple Column Selection

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("email", String),
    ],
    timestamp_field="timestamp",
)
def user_email(users_batch):
    return f"""
        SELECT
            user_id,
            email,
            timestamp
        FROM {users_batch}
        """

Example: Row-Level Filtering with Python (Pandas)

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    # Only keep transactions with amount > 100
    filtered = transactions_batch[transactions_batch["amt"] > 100]
    return filtered[["user_id", "amt", "timestamp"]]

Modes and Compute Engines

In Tecton, the mode parameter in a Feature View definition specifies the transformation language (such as Python, Pandas, or SQL) and thereby determines which compute engine (Rift or Spark) executes your transformation. You do not select the compute engine directly; instead, you select a mode, and Tecton runs your transformation on the engine that supports that mode.
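As a rough illustration, this mode-to-engine pairing can be sketched as a simple lookup. The dict below is not part of the Tecton API: the SQL rows follow the pairings stated on this page (mode="spark_sql" for the Spark engine, mode="snowflake_sql" for Rift), while the "pandas" and "python" rows are assumptions for illustration only.

```python
# Illustrative sketch only -- Tecton resolves the engine internally;
# this mapping is NOT a Tecton API.
MODE_TO_ENGINE = {
    "pandas": "Rift",          # assumption, for illustration
    "python": "Rift",          # assumption, for illustration
    "snowflake_sql": "Rift",   # stated on this page
    "spark_sql": "Spark",      # stated on this page
}


def engine_for(mode: str) -> str:
    """Return the compute engine assumed to back the given mode."""
    try:
        return MODE_TO_ENGINE[mode]
    except KeyError:
        raise ValueError(f"Unsupported mode: {mode!r}")
```

The point of the sketch is that the engine is a consequence of the mode you pick, never a separate knob you turn.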

Requirements for Attribute Features

All Attribute Features must return:

  • Join keys for the relevant entities
  • A timestamp column (to support time travel and materialization)
  • The feature column(s) you defined

Ensure that your transformation logic preserves these columns in the final output.
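A quick way to reason about this requirement: the output columns must be a superset of the join keys, the timestamp field, and the declared feature names. The helper below is a hypothetical sketch (not part of the Tecton API) that computes which required columns, if any, are missing.

```python
# Illustrative helper, not a Tecton API: report required columns
# missing from a transformation's output.
def check_required_columns(output_columns, join_keys, timestamp_field, feature_names):
    """Return the set of required columns absent from output_columns."""
    required = set(join_keys) | {timestamp_field} | set(feature_names)
    return required - set(output_columns)


# Example: the user_full_name view above must keep user_id,
# full_name, and timestamp in its output.
missing = check_required_columns(
    output_columns=["user_id", "full_name", "timestamp"],
    join_keys=["user_id"],
    timestamp_field="timestamp",
    feature_names=["full_name"],
)
assert not missing  # empty set: all required columns are present
```

Dropping any of these columns (for example, projecting away the timestamp) will cause the Feature View to be invalid.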

Additional Example Transforms

Using Python or Pandas

Python transform Attribute Features use the mode="pandas" or mode="python" option in a Feature View. These features are defined by writing Python functions that transform input dataframes into the desired feature values.

Example: Numerical Transformation

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("age_squared", Int64),
    ],
    timestamp_field="timestamp",
)
def user_age_squared(users_df):
    users_df["age_squared"] = users_df["age"] ** 2
    return users_df[["user_id", "age_squared", "timestamp"]]

Example: Conditional Logic

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("age_group", String),
    ],
    timestamp_field="timestamp",
)
def user_age_group(users_df):
    users_df["age_group"] = users_df["age"].apply(lambda x: "adult" if x >= 18 else "minor")
    return users_df[["user_id", "age_group", "timestamp"]]

Example: Using External Libraries

import hashlib
from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import String


@batch_feature_view(
    sources=[users_batch],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("email_hash", String),
    ],
    timestamp_field="timestamp",
)
def user_email_hash(users_df):
    users_df["email_hash"] = users_df["email"].apply(lambda x: hashlib.sha256(x.encode()).hexdigest())
    return users_df[["user_id", "email_hash", "timestamp"]]

Using SQL

SQL transform Attribute Features are Attribute Features defined using SQL queries. They allow you to express feature logic using SQL, leveraging the power and expressiveness of SQL engines such as Spark SQL and Snowflake SQL. These features are typically used for simple column selection, transformations, or conditional logic that can be efficiently expressed in SQL.

You define a SQL transform Attribute Feature by specifying the appropriate SQL mode in your Feature View (e.g., mode="spark_sql" for the Spark compute engine, mode="snowflake_sql" for the Rift compute engine) and returning a SQL query string from the transformation function.

Example: Conditional Logic (Spark SQL)

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Int64


@batch_feature_view(
    sources=[credit_scores_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("user_has_good_credit", Int64),
    ],
    timestamp_field="date",
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IF(credit_score > 670, 1, 0) AS user_has_good_credit,
            date AS timestamp
        FROM {credit_scores}
        """

Example: SQL Attribute Feature with Join and Aggregation

This example performs a join and an aggregation, returning both the user's email address and their total transaction amount over the last 30 days.

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64, String


@batch_feature_view(
    sources=[transactions_batch, users_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("total_amt_last_30d", Float64),
        Attribute("user_email", String),
    ],
    timestamp_field="timestamp",
)
def user_total_amt_and_email(transactions_batch, users_batch):
    return f"""
        SELECT
            t.user_id,
            u.email AS user_email,
            SUM(t.amt) AS total_amt_last_30d,
            MAX(t.timestamp) AS timestamp  -- latest timestamp, so each user yields one row
        FROM {transactions_batch} t
        JOIN {users_batch} u
            ON t.user_id = u.user_id
        WHERE t.timestamp >= DATE_SUB(CURRENT_DATE(), 30)
        GROUP BY t.user_id, u.email
        """

Notes

  • SQL transform Attribute Features are supported in Batch Feature Views and, for Spark SQL, also in Stream Feature Views.
  • The SQL query must return all required columns: join keys, timestamp, and feature columns.
  • The SQL dialect must match the compute engine (Spark SQL for Spark, Snowflake SQL for Rift).

Row-Level Transforms

Row-Level transform Attribute Features allow you to define features that are computed for each row of your input data, typically using filtering, projection, or transformation logic. These features are especially useful when you need to select, modify, or filter individual records before any aggregation or further processing.

Row-level transformations can be implemented using Python (Pandas), Spark SQL, or Snowflake SQL, depending on your compute engine and use case.

When to Use Row-Level Transformations

  • When you need to filter out records based on specific criteria.
  • When you want to project (select) or rename columns.
  • When you need to apply simple transformations to each row before aggregation.

Example: Row-Level Filtering with Spark SQL

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    return f"""
        SELECT
            user_id,
            amt,
            timestamp
        FROM {transactions_batch}
        WHERE amt > 100
        """

Example: Row-Level Transformation with Snowflake SQL

from datetime import timedelta

from tecton import batch_feature_view, Attribute
from tecton.types import Float64


@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="snowflake_sql",
    batch_schedule=timedelta(days=1),
    features=[
        Attribute("amt", Float64),
    ],
    timestamp_field="timestamp",
)
def user_large_transactions(transactions_batch):
    return f"""
        SELECT
            user_id,
            amt,
            timestamp
        FROM {transactions_batch}
        WHERE amt > 100
        """

Notes

  • Row-level transformations are supported in both Batch and Stream Feature Views (with supported modes).
