Version: 1.2

Feature Design Patterns

The Tecton Framework simplifies the implementation of machine learning features. This page aims to help translate your feature ideas into the Tecton framework by providing examples of common feature design patterns.

Features built on the Tecton platform typically fit into the following categories:

  • Dimension features are the single latest value of a field for an entity.
  • Aggregation features are metrics calculated by aggregating over a series of events.
  • Real-time features are calculated based on data that is available at request time. These features are similar to Dimension features in that they typically include simple projection and filtering, but cannot be pre-computed based on a batch or stream data source.
  • Derived features are built with more advanced feature engineering, combining and post-processing basic features.

Dimension features

Dimension features are most commonly implemented as a Batch Feature View that projects and filters the source table. If dimension updates are available on a stream source, a Stream Feature View lets you update feature values more quickly.

| Pattern | Description | Example |
| --- | --- | --- |
| Entity property | A "property" of a single entity that is updated in place, commonly derived from a dimension table. | User Date of Birth |
| Upstream feature pipelines | Final feature values that are calculated upstream of your Tecton Data Source. Even if the original feature calculation is more complex, once ingested into Tecton they are represented as a simple dimension feature. | |
| ML Model Outputs | Outputs of one model are commonly used as a feature for another. Each output of the model represents the latest value for the user. | User Embeddings |
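A dimension feature can be sketched in plain Python. Given a hypothetical log of updates as (entity ID, update time, value) tuples, only the most recent value per entity is kept. This illustrates the idea only and is not Tecton's API. The function name and sample data are assumptions.

```python
from datetime import datetime


def latest_dimension_values(updates):
    """Return {entity_id: value}, keeping only the most recent update per entity.

    `updates` is a hypothetical iterable of (entity_id, updated_at, value) tuples.
    """
    latest = {}  # entity_id -> (updated_at, value)
    for entity_id, updated_at, value in updates:
        if entity_id not in latest or updated_at > latest[entity_id][0]:
            latest[entity_id] = (updated_at, value)
    return {entity_id: value for entity_id, (_, value) in latest.items()}


# Hypothetical user home-country updates; the newer "u1" row replaces the older one.
updates = [
    ("u1", datetime(2024, 1, 1), "US"),
    ("u2", datetime(2024, 1, 3), "CA"),
    ("u1", datetime(2024, 2, 1), "MX"),
]
print(latest_dimension_values(updates))  # {'u1': 'MX', 'u2': 'CA'}
```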

Aggregation features

The Tecton Aggregation Engine makes it simple to develop and productionize Aggregation features.

Aggregations can be defined for either Batch or Stream Feature Views, depending on what source data you have available for your feature.

| Pattern | Description | Example |
| --- | --- | --- |
| Time-windowed Aggregations | Aggregations, such as count distinct or mean, over events during a trailing time period, such as the last 2 hours or last 90 days. | User Transaction Metrics |
| Lifetime Aggregations | Aggregations over the full data history. | User Lifetime Transactions |
| Secondary Key Aggregations | Time-windowed aggregations that are grouped over a secondary key in addition to the entity. | User clicks per Ad ID |
| Event History | List of previous events for an entity. Most commonly used to build Derived features, as described below. | List of page views |
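The following brute-force sketch in plain Python shows what three of these patterns compute. It uses a hypothetical list of (user_id, event_time, amount) events and covers a trailing-window count and mean, a lifetime aggregation (no window), and an event history. It is not how the Tecton Aggregation Engine computes aggregations. It only illustrates the meaning of the resulting values, and the function names are assumptions.

```python
from datetime import datetime, timedelta


def aggregate(events, user_id, as_of, window=None):
    """Count and mean of `amount` for one user over (as_of - window, as_of].

    window=None gives a lifetime aggregation over the full event history.
    """
    amounts = [
        amount
        for uid, event_time, amount in events
        if uid == user_id
        and event_time <= as_of
        and (window is None or event_time > as_of - window)
    ]
    count = len(amounts)
    return {"count": count, "mean": sum(amounts) / count if count else None}


def event_history(events, user_id, as_of, n):
    """The last n amounts for a user up to as_of, most recent first."""
    past = [e for e in events if e[0] == user_id and e[1] <= as_of]
    past.sort(key=lambda e: e[1], reverse=True)
    return [amount for _, _, amount in past[:n]]


# Hypothetical transaction events.
events = [
    ("u1", datetime(2023, 6, 1), 10.0),
    ("u1", datetime(2024, 1, 15), 90.0),
    ("u1", datetime(2024, 3, 1, 9), 20.0),
    ("u1", datetime(2024, 3, 1, 11), 40.0),
    ("u2", datetime(2024, 3, 1, 10), 5.0),
]
as_of = datetime(2024, 3, 1, 12)
print(aggregate(events, "u1", as_of, timedelta(hours=4)))  # {'count': 2, 'mean': 30.0}
print(aggregate(events, "u1", as_of, timedelta(days=90)))  # {'count': 3, 'mean': 50.0}
print(aggregate(events, "u1", as_of))                      # {'count': 4, 'mean': 40.0}
print(event_history(events, "u1", as_of, 2))               # [40.0, 20.0]
```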

Real-time features

Real-time features are computed at inference time, enabling models to combine signals from materialized feature views with request-time data. You should use these when:

  • You need to incorporate request data that isn't available ahead of time (e.g., transaction amount, device ID).
  • You must compare live inputs (e.g., current transaction amount) against historical patterns (e.g., user's average transaction amount).
  • Precomputing all possibilities would be impractical or impossible.

When designing real-time features, it's important to distinguish between their inputs and their processing options.

Inputs: Real-time features can use:
  • Real-time context: context metadata, such as the request timestamp.
  • Materialized Batch or Stream Feature View data: precomputed online or offline values retrieved at inference time (e.g., a user's 30-day average spend).
  • Request sources: real-time signals passed in with the request by the calling application or service (e.g., transaction amount).

Processing options: Real-time features can be implemented using one of three modes:
  • Calculation: use Tecton's built-in SQL-like operators to define simple operations.
  • Python mode: write lightweight custom functions in Python.
  • Pandas mode: write batch-style transformations using Pandas DataFrames.

For better performance, choose Calculation when you can express the logic with Tecton's built-in operators. Choose Python or Pandas mode when you need full flexibility.

Choosing the right design:

  • Use a request source if your feature depends solely on request-time inputs.
  • Combine request context with materialized feature data when comparing live data to historical aggregates.
  • Use Calculation mode when the transformation is a simple operation that fits Tecton's built-in functions.
  • Use Python mode for simple row-by-row transformations.
  • Use Pandas mode when you need batch operations across multiple fields or complex DataFrame logic.

Common real-time feature patterns:

| Pattern | Description | Example |
| --- | --- | --- |
| Request-derived features | Derive features from request payload fields. | Country of the transaction based on lat/long input |
| Rule-based features | Apply heuristics or threshold-based logic. | User Transaction Amount Is Above Threshold |
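A rule-based real-time feature typically combines a request-time value with a precomputed one. Below is a plain-Python function written in the spirit of Python mode. It does not use Tecton's decorators or types. The input names (`amount`, `amount_mean_30d`) and the 3x multiplier are hypothetical.

```python
def amount_is_above_threshold(request, user_metrics, multiplier=3.0):
    """Flag a transaction whose amount exceeds `multiplier` times the user's 30-day mean.

    request:      hypothetical request-time payload, e.g. {"amount": 250.0}
    user_metrics: hypothetical values precomputed by a Batch/Stream Feature View and
                  retrieved at inference time, e.g. {"amount_mean_30d": 50.0}
    """
    amount = request["amount"]
    mean_30d = user_metrics.get("amount_mean_30d")
    if mean_30d is None:
        # No history for this user: this sketch chooses not to flag the transaction.
        return {"amount_is_high": False, "amount_to_mean_ratio": None}
    ratio = amount / mean_30d if mean_30d else None  # avoid dividing by a zero mean
    return {"amount_is_high": amount > multiplier * mean_30d, "amount_to_mean_ratio": ratio}


print(amount_is_above_threshold({"amount": 250.0}, {"amount_mean_30d": 50.0}))
# {'amount_is_high': True, 'amount_to_mean_ratio': 5.0}
```

Returning `None` for the ratio, rather than raising an error, keeps the function safe to call for users with no transaction history. How to treat missing history is a modeling decision.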

Derived features

Derived features are calculated at request time from features retrieved from multiple feature views, optionally combined with request context. Realtime Feature Views can be used to calculate derived features because they can operate on request context data as well as on the outputs of Batch or Stream Feature Views.

See using Feature View dependencies in Realtime Feature Views for how to combine multiple input feature views and request data in a single Realtime Feature View.

| Pattern | Description | Example |
| --- | --- | --- |
| Single Entity | Combine features related to a single entity, but typically originating from different data sources. | User historical click-through-rate (ad clicks / ad impressions) |
| Multiple Entity | Combine features from separate entities to calculate relative comparisons or interactions between entities. | Distance between sender and recipient |
| Request vs. Metric or Dimension | Compare request context data to metric or dimension features. | User age (current_time - date of birth) |
| Fitted | Model-specific transformation of other feature types in order to improve model performance. Because these features are fitted to the training data set for a specific model, they are typically implemented in model code rather than in the Tecton repository. | One-hot encoding, Binning, Normalization |
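Two of these patterns can be sketched as plain-Python functions: a single-entity click-through rate and a request-vs-dimension user age. The inputs stand in for values that a Realtime Feature View would receive from other feature views and the request context. The function names are hypothetical, and this is not Tecton's API.

```python
from datetime import date


def click_through_rate(ad_clicks, ad_impressions):
    """Single-entity derived feature: combines two features about the same user."""
    return ad_clicks / ad_impressions if ad_impressions else None


def age_in_years(date_of_birth, current_date):
    """Request vs. dimension: the request date compared with a stored date of birth."""
    years = current_date.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (current_date.month, current_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(click_through_rate(5, 200))                          # 0.025
print(age_in_years(date(1990, 6, 15), date(2024, 6, 14)))  # 33
```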

Optimizing costs for Multiple Entity Features

When a derived feature is calculated from features about multiple entities, calculating it at request time lets Tecton avoid the cost of a combinatorial explosion in the entity space.

Example 1: Distance between sender and recipient home address

Assume we have a dimension table that contains the lat/long of the home address for every user, and our application has 10 million registered users.

The naive implementation would do a full cross-join on the user dimension table and calculate the distance between each pair of users. However, this requires storing 10 million × 10 million = 100 trillion feature values. Computing and storing all these feature values is prohibitively expensive.

Instead, build a single Feature View for user home addresses, and then use a Realtime Feature View to calculate the distance at request time.
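The request-time approach can be sketched in plain Python. A per-user lookup (one row per user) stands in for the home-address Feature View, and the distance is computed only for the pair named in the request. The source does not specify a distance formula. This sketch assumes the haversine great-circle distance, and the user IDs and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical stand-in for the home-address Feature View: one row per user,
# not one row per (sender, recipient) pair.
user_home = {
    "alice": (37.7749, -122.4194),  # San Francisco
    "bob": (34.0522, -118.2437),    # Los Angeles
}


def sender_recipient_distance_km(sender_id, recipient_id, home_lookup):
    """Computed at request time from two single-entity lookups."""
    return haversine_km(*home_lookup[sender_id], *home_lookup[recipient_id])


print(sender_recipient_distance_km("alice", "bob", user_home))
```

Storage grows with the number of users (10 million rows) rather than the number of pairs, and each request pays only for two lookups and one distance computation.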

Example 2: User - Product page views for recommendations

In the case of Search and Recommendation systems, Secondary Key Aggregations allow for more efficient feature caching and retrieval by storing interaction data under a single entity key.

The naive implementation would calculate the time-window aggregation under a [user, product] compound key. However, scoring 1,000 candidate products for a recommendation then requires querying the compound key 1,000 times. Further, the key space becomes prohibitively large to cache.

With Secondary Key Aggregations, you instead calculate, under the user key, the view count for every product the user has viewed in the past. In most cases, the number of products per user is manageable for a single aggregation. You then join the Secondary Key Aggregation results to your individual product feature vectors.
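The shape of this approach can be sketched in plain Python. One {product_id: count} map is stored under the user key, and a single lookup of that map is then joined to every candidate product. The event format and function names are hypothetical, the time window is omitted for brevity, and this is not Tecton's aggregation API.

```python
from collections import Counter


def product_view_counts(page_views, user_id):
    """Secondary key aggregation: one {product_id: count} map stored under the user key.

    `page_views` is a hypothetical list of (user_id, product_id) events; the time
    window is omitted here for brevity.
    """
    return dict(Counter(product_id for uid, product_id in page_views if uid == user_id))


def join_to_candidates(view_counts, candidate_products):
    """Attach the user's view count to each candidate, defaulting to 0 if never viewed.

    Only one lookup (the user key) is needed, however many candidates are scored.
    """
    return {product_id: view_counts.get(product_id, 0) for product_id in candidate_products}


page_views = [("u1", "p1"), ("u1", "p2"), ("u1", "p1"), ("u2", "p3")]
counts = product_view_counts(page_views, "u1")         # {'p1': 2, 'p2': 1}
print(join_to_candidates(counts, ["p1", "p3", "p2"]))  # {'p1': 2, 'p3': 0, 'p2': 1}
```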

Sample Feature Repositories

To see these design patterns in action for both Rift and Spark, explore the full sample Tecton feature repositories.
