
Tecton Concepts

Overview

Tecton is a feature platform that helps you build, deploy, and manage machine learning (ML) features at scale. With Tecton, you can transform raw data into features, embeddings, and prompts for AI applications—all within a single, consistent framework. Whether you need batch features for training or real-time feature updates for online prediction, Tecton simplifies feature engineering and production operations so you can focus on building and deploying your ML models.

At the heart of every Tecton pipeline is the Feature View.

Feature Views: The Core of Tecton

A Feature View is Tecton's primary building block. It defines how raw data is transformed into ML-ready features. With just a few configuration parameters and a single CLI command (tecton apply), Tecton orchestrates everything from backfilling historical data to managing live pipelines for real-time inference. Key built-in capabilities include:

  • Backfilling historical data (to fill offline stores and enable point-in-time correct training sets)
  • Orchestration of scheduling and job management
  • Performance optimizations like incremental updates
  • Monitoring for data quality and pipeline health

An example Feature View definition for batch data:

@batch_feature_view(
    sources=[my_batch_source],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    features=[
        Attribute(name="date_of_birth", dtype=Timestamp),
        Attribute(name="amt", dtype=Float64),
    ],
    ...,
)
def user_features(my_batch_source):
    return my_batch_source[["user_id", "timestamp", "date_of_birth", "amt"]]

There are three types of Feature Views:

  1. Batch Feature Views transform offline data from batch sources (such as data lakes, data warehouses, or Hive tables) at scheduled intervals, publishing features to the online and/or offline store.
  2. Stream Feature Views transform data from Stream Sources (like Kafka or Kinesis) in near-real-time and publish features to the online store. They also run the same transformations offline against the Stream Source's historical log for backfilling, testing, or materialization to the offline store.
  3. Realtime Feature Views compute features at request time from real-time data or other Feature Views, enabling on-the-fly feature computation when pre-computation isn't feasible (see the sketch after this list).
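
For concreteness, here's a minimal sketch of a Realtime Feature View that flags high-value transactions from request-time data. The RequestSource schema, the names, and the threshold are illustrative assumptions, not part of the batch example above:

from tecton import realtime_feature_view, RequestSource, Attribute
from tecton.types import Field, Float64, Bool

# Request-time input supplied by the caller at inference time (hypothetical schema).
transaction_request = RequestSource(schema=[Field("amount", Float64)])

@realtime_feature_view(
    sources=[transaction_request],
    mode="python",
    features=[Attribute(name="is_high_value", dtype=Bool)],
)
def transaction_is_high_value(transaction_request):
    # Computed on the fly at request time; nothing is pre-materialized.
    return {"is_high_value": transaction_request["amount"] > 1000}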

Feature View Components

Feature Views are composed of several elements:

Sources

Feature Views read data from Data Sources, which encapsulate how Tecton accesses your raw data (e.g., a Hive table, a Kafka topic, or a Kinesis stream). Tecton applies the same data structure (e.g., field names, types, and organization) to your raw data whether it's accessed in an offline batch context or in an online real-time serving context. This ensures that the features you compute behave the same way in both environments, eliminating mismatches or errors that could arise if the data were interpreted differently.

transactions_stream = StreamSource(
    name="transactions_stream",
    # streaming input, used for near-real-time materialization
    stream_config=KinesisConfig(stream_name="<stream name>", timestamp_field="timestamp"),
    # batch source holding the stream's historical log, used for backfills
    batch_config=HiveConfig(database="demo_fraud_v2", table="transactions"),
)

Entities

Entities serve two key functions: they define the join keys needed for feature retrieval, and they provide a way to model features around common business concepts (e.g., user, merchant, product).

user = Entity(name="user", join_keys=[Field("user_id", String)])

Features

Features are the data inputs your machine learning models learn from during training and use to make predictions. Feature Views support multiple feature types:

  • Attribute: Direct column values from the transformed data.

    Attribute(name="date_of_birth", input_column=Field("date_of_birth", Timestamp))

  • Aggregate: Time-windowed aggregations (e.g., counts, sums, means) that Tecton computes incrementally and consistently, so your models see fresh, identical values whether you're training offline or serving online.

    Aggregate(
        name="7_day_transaction_average",
        input_column=Field("amount", Int64),
        function="mean",
        time_window=TimeWindow(window_size=timedelta(days=7)),
    )

  • Embedding: Generates embeddings from an input column using a specified model.

    Embedding(
        name="text_embedding",
        input_column=Field("description", String),
        model="sentence-transformers/all-MiniLM-L6-v2",
    )

  • Inference: Uses an ML model to compute features from input columns.

    Inference(
        input_columns=[Field("feature1", String), Field("feature2", Int64)],
        model="my_custom_model",
        name="inference_feature",
    )

  • Calculation: Efficient row-wise transformations evaluated at retrieval time.

    Calculation(name="is_active_customer", expr="transactions.recent_transaction_count > 0")

Timestamp

The timestamp field is the time column Tecton uses for time travel and for computing time-windowed aggregate features. It ensures that features are computed from the correct point-in-time data snapshots.

Transformation Function

Feature Views support flexible transformation functions using Python, Pandas, SQL, PySpark, and Snowflake SQL. These transformations are applied to the raw data before features are computed.

@batch_feature_view(
    sources=[user_info],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    features=[Attribute(name="date_of_birth", dtype=Timestamp)],
)
def user_features(user_info):
    return user_info[["user_id", "timestamp", "date_of_birth"]]
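
The same feature could be expressed in a SQL mode instead of Pandas. A minimal sketch, assuming a Spark-backed environment where mode="spark_sql" is available; the function name is illustrative, and the source is referenced as a templated table name in the returned SQL string:

@batch_feature_view(
    sources=[user_info],
    entities=[user],
    mode="spark_sql",
    timestamp_field="timestamp",
    features=[Attribute(name="date_of_birth", dtype=Timestamp)],
)
def user_features_sql(user_info):
    # In SQL modes, the function returns a query string rather than a DataFrame.
    return f"SELECT user_id, timestamp, date_of_birth FROM {user_info}"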

Next we’ll serve those features to models via Feature Services.

Feature Services

Feature Services group features from one or more Feature Views for training and serving, and provide high-performance online feature retrieval at scale.

recommendation_feature_service = FeatureService(
    name="recommendation_feature_service", features=[user_features, product_features]
)

Feature Services handle:

  • Online Feature Retrieval: Low-latency access to features for inference.
  • Batch Training Datasets: Point-in-time correct feature joins for training.
  • Feature Logging: Track feature values for monitoring and debugging.
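
As an illustration, here's how a Feature Service might be queried from the Python SDK. This is a sketch under assumptions: a Live Workspace named "prod", configured credentials, and a labeled events DataFrame events_df:

import tecton

ws = tecton.get_workspace("prod")
fs = ws.get_feature_service("recommendation_feature_service")

# Low-latency online retrieval for a single entity (requires a Live Workspace).
online_features = fs.get_online_features(join_keys={"user_id": "u_123"})

# Point-in-time correct features joined to labeled training events.
training_df = fs.get_features_for_events(events_df).to_pandas()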

Now that we've established how features are defined, orchestrated, and served, let's explore how Workspaces enable you to manage these resources across different environments—from development sandboxes to production-ready pipelines.

Workspaces

Workspaces are isolated environments for managing, configuring, and deploying Tecton feature repositories. Teams can create, clone, deploy, and even destroy workspaces independently, enabling rapid development cycles within a single Tecton deployment.

When you run the tecton apply command—either locally via the Tecton CLI or through a CI/CD pipeline—Tecton reads the Object Definitions from your Tecton Repository and updates the Workspace Configuration. This Workspace Configuration represents the live set of feature pipelines, materializations, and services active in the environment.
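
A typical flow from the CLI might look like the following; the workspace names are illustrative:

tecton workspace create my-dev-workspace   # create a Development Workspace
tecton workspace create prod --live        # create a Live Workspace
tecton apply                               # apply repo definitions to the selected workspace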

Types of Workspaces

Workspaces are either Live or Development:

  1. Live Workspaces
    • Purpose: Production and staging environments.
    • Capabilities: In a Live Workspace, Tecton generates real-time endpoints to serve features in production. Your feature definitions are fully materialized (online and/or offline) based on the materialization configuration (online=True and/or offline=True) in your Feature Views, as shown in the sketch after this list.
    • Considerations: Because they connect to real production data and infrastructure, Live Workspaces can incur significant resource costs. They’re meant for high-availability, low-latency feature serving.
  2. Development Workspaces
    • Purpose: Development and testing without incurring heavy infrastructure usage.
    • Capabilities: Development Workspaces do not connect to the production environment and do not automatically materialize data. You can still discover, fetch, and test features or embeddings (for example, in a Jupyter notebook) without generating real-time or production-grade pipelines.
    • Considerations: They are cost-effective environments for iterative prototyping and debugging before moving features to a staging or production (Live) Workspace.
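
As a minimal sketch, a Feature View intended for a Live Workspace might enable materialization like this; the schedule, start time, and names are assumptions for illustration:

from datetime import datetime, timedelta

@batch_feature_view(
    sources=[my_batch_source],
    entities=[user],
    mode="pandas",
    timestamp_field="timestamp",
    features=[Attribute(name="date_of_birth", dtype=Timestamp)],
    online=True,   # materialize to the online store for low-latency serving
    offline=True,  # materialize to the offline store for training datasets
    batch_schedule=timedelta(days=1),          # assumed daily refresh
    feature_start_time=datetime(2024, 1, 1),   # assumed backfill start
)
def user_features(my_batch_source):
    return my_batch_source[["user_id", "timestamp", "date_of_birth"]]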

With this understanding of Tecton and how Workspaces function, you'll be better equipped to design your feature pipelines, iterate on them in a Development Workspace, and then confidently promote them to a Live Workspace for production use.

Next Steps: Build a Pipeline

You've seen how Tecton structures feature engineering—from defining Feature Views and Entities to testing data flows and serving features online.

Here's how you can get started building your first pipeline:

  1. Set up a Tecton Workspace (development or live).
  2. Set up your development environment.
  3. Define your Data Sources and Entities in Python.
  4. Create Feature Views (batch, stream, or realtime) with your transformations.
  5. Apply your changes (tecton apply) to orchestrate pipelines.
  6. Build Feature Services to serve features to your models in production.
