Version: Beta 🚧

Feature Design Patterns

The Tecton Framework simplifies the implementation of machine learning features. This page aims to help translate your feature ideas into the Tecton framework by providing examples of common feature design patterns.

Features built on the Tecton platform typically fit into the following categories:

  • Dimension features are the single latest value of a field for an entity.
  • Aggregation features are metrics calculated by aggregating over a series of events.
  • Real-time features are calculated from data that is available only at request time. Like Dimension features, they typically involve simple projection and filtering, but they cannot be precomputed from a batch or stream data source.
  • Derived features are produced by advanced feature engineering techniques that combine and post-process basic features.

Dimension features

Dimension features are most commonly implemented as a Batch Feature View that performs projection and filtering over the source table. If dimension updates are also available from a stream source, a Stream Feature View can update feature values with lower latency.

  • Entity property: A “property” of a single entity that is updated in place, commonly derived from a dimension table. Example: User Date of Birth
  • Upstream feature pipelines: Final feature values that are calculated upstream of your Tecton Data Source. Even if the original feature calculation is more complex, once ingested into Tecton these values are represented as a simple dimension feature.
  • ML model outputs: The outputs of one model are commonly used as features for another. Each model output represents the latest value for the user. Example: User Embeddings
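To make "the single latest value of a field for an entity" concrete, here is a minimal plain-Python sketch of the projection and filtering a dimension Batch Feature View typically performs. It is not Tecton's API: the `UserRow` shape and the `latest_dimension_values` helper are hypothetical, and it assumes each source row carries an `updated_at` timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UserRow:
    # Hypothetical row shape for a user dimension table.
    user_id: str
    date_of_birth: Optional[str]
    updated_at: datetime


def latest_dimension_values(rows: Iterable[UserRow]) -> Dict[str, str]:
    """Return the latest non-null date_of_birth per user.

    Filtering drops rows without a value; projection keeps only the
    feature column, keyed by the entity (user_id).
    """
    latest: Dict[str, UserRow] = {}
    for row in rows:
        if row.date_of_birth is None:  # filtering
            continue
        current = latest.get(row.user_id)
        if current is None or row.updated_at > current.updated_at:
            latest[row.user_id] = row
    return {uid: row.date_of_birth for uid, row in latest.items()}  # projection
```

Because only one value per entity survives, the stored feature grows with the number of entities, not with the number of updates.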

Aggregation features

The Tecton Aggregation Engine makes it simple to develop and productionize Aggregation features.

Aggregations can be defined for either Batch or Stream Feature Views, depending on what source data you have available for your feature.

  • Time-windowed aggregations: Aggregations, such as count distinct or mean, over events during a trailing time period, such as the last 2 hours or the last 90 days. Example: User Transaction Metrics
  • Lifetime aggregations: Aggregations over the full data history. Example: User Lifetime Transactions
  • Secondary key aggregations: Time-windowed aggregations that are grouped by a secondary key in addition to the entity. Example: User clicks per Ad ID
  • Event history: A list of previous events for an entity, most commonly used to build Derived features, as described below. Example: List of page views
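The semantics of a time-windowed aggregation, and of a lifetime aggregation as its unbounded special case, can be sketched in plain Python. This recomputes from raw events for clarity; it does not reflect how the Tecton Aggregation Engine computes results internally. The event tuple shape and the `user_transaction_metrics` name are hypothetical, and the window is assumed to be half-open, `(as_of - window, as_of]`.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical transaction event: (user_id, event_time, amount).
Event = Tuple[str, datetime, float]


def user_transaction_metrics(
    events: List[Event],
    user_id: str,
    as_of: datetime,
    window: Optional[timedelta],
) -> Dict[str, Optional[float]]:
    """Count and mean of a user's amounts in the window (as_of - window, as_of].

    Passing window=None aggregates over the full history (a lifetime aggregation).
    """
    start = as_of - window if window is not None else datetime.min
    amounts = [
        amount
        for uid, event_time, amount in events
        if uid == user_id and start < event_time <= as_of
    ]
    mean = sum(amounts) / len(amounts) if amounts else None
    return {"count": len(amounts), "mean": mean}
```

For example, `user_transaction_metrics(events, "u1", now, timedelta(hours=2))` yields a "last 2 hours" metric, while passing `None` yields the lifetime variant.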

Real-time features

Tecton’s On-Demand Feature Views are simple Python feature transformations that are executed in real time based on data provided in the request context.

  • Request context transformation: Derive features based on analysis of the request payload. Example: Country of the transaction based on lat/long input
  • Rules: Apply heuristics to request data. Example: User Transaction Amount Is Above Threshold
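A rule feature such as "transaction amount is above threshold" reduces to a small function of the request payload. The sketch below is plain Python, not a Tecton On-Demand Feature View definition: the dict-in, dict-out shape, the `amount` field name, and the `AMOUNT_THRESHOLD` value are all assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical threshold, chosen for illustration only.
AMOUNT_THRESHOLD = 1000.0


def transaction_amount_is_above_threshold(request: Dict[str, float]) -> Dict[str, bool]:
    """Rule feature computed only from the request payload.

    Nothing here can be precomputed: the input exists only at request time.
    """
    return {"amount_is_above_threshold": request["amount"] > AMOUNT_THRESHOLD}
```

The comparison is strict, so an amount exactly equal to the threshold is not flagged; whether that boundary is correct is a product decision.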

Derived features

Derived features are calculated at request time from features retrieved from multiple Feature Views, optionally combined with request context data. On-Demand Feature Views can be used to calculate derived features because they can operate on request context data as well as on the outputs of Batch or Stream Feature Views.

See “Using Feature View dependencies in On-Demand Feature Views” for how to combine multiple input Feature Views and request data in a single On-Demand Feature View.

  • Single entity: Combine features related to a single entity that typically originate from different data sources. Example: User historical click-through rate (ad clicks / ad impressions)
  • Multiple entity: Combine features from separate entities to calculate relative comparisons or interactions between entities. Example: Distance between sender and recipient
  • Request vs. metric or dimension: Compare request context data to metric or dimension features. Example: User age (current_time - date of birth)
  • Fitted: Model-specific transformations of other feature types that improve model performance. Because these features are fitted to the training data set of a specific model, they are typically implemented in model code rather than in the Tecton repository. Examples: One-hot encoding, Binning, Normalization
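Two of the derived-feature patterns above, single-entity (click-through rate) and request vs. dimension (user age), can be sketched as plain Python that combines precomputed feature values with request data. This illustrates the logic only, not Tecton's API; the input dict keys (`ad_clicks`, `ad_impressions`, `date_of_birth`, `request_date`) and all function names are hypothetical.

```python
from datetime import date
from typing import Dict, Optional


def user_click_through_rate(clicks: int, impressions: int) -> Optional[float]:
    """Single-entity derived feature: ad clicks / ad impressions.

    Returns None when there are no impressions, avoiding division by zero.
    """
    if impressions == 0:
        return None
    return clicks / impressions


def user_age_years(date_of_birth: date, request_date: date) -> int:
    """Request vs. dimension feature: whole years between DOB and the request date."""
    had_birthday = (request_date.month, request_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return request_date.year - date_of_birth.year - (0 if had_birthday else 1)


def compute_derived_features(
    click_features: Dict[str, int],
    impression_features: Dict[str, int],
    profile_features: Dict[str, date],
    request: Dict[str, date],
) -> Dict[str, object]:
    """Combine outputs of three hypothetical Feature Views with request data."""
    return {
        "user_ctr": user_click_through_rate(
            click_features["ad_clicks"], impression_features["ad_impressions"]
        ),
        "user_age": user_age_years(
            profile_features["date_of_birth"], request["request_date"]
        ),
    }
```

Computing the ratio at request time, instead of storing it, lets clicks and impressions come from different sources that update on different schedules.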

Optimizing costs for Multiple Entity Features

By calculating derived features at request time, Tecton helps avoid the cost of a combinatorial explosion in the entity space when derived features are calculated based on features about multiple entities.

Example 1: Distance between sender and recipient home addresses

Assume we have a dimension table that contains the lat/long of the home address for every user, and our application has 10 million registered users.

The naive implementation would do a full cross-join on the user dimension table and calculate the distance between each pair of users. However, this requires storing (10 million)² = 100 trillion feature values! Computing and storing all of these feature values is prohibitively expensive.

Instead, build a single Feature View for the user home address, and then use an On-Demand Feature View to calculate the distance at request time. This stores only 10 million feature values, one per user.
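The request-time calculation can be sketched in plain Python: look up two rows from a per-user home-location table, then compute the distance. This is not Tecton code. The `homes` mapping stands in for the output of the home address Feature View, the function names are hypothetical, and the haversine great-circle formula with a mean Earth radius of 6371 km is one reasonable choice of distance, not necessarily the one the original example uses.

```python
import math
from typing import Dict, Tuple

# Hypothetical output of the single home-address Feature View:
# one (lat, long) pair per user, so N rows instead of N**2 pairs.
HomeLocations = Dict[str, Tuple[float, float]]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, long) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sender_recipient_distance_km(
    homes: HomeLocations, sender_id: str, recipient_id: str
) -> float:
    """Request-time derived feature: two key lookups, then one computation."""
    return haversine_km(homes[sender_id], homes[recipient_id])
```

Each request costs two lookups and a few floating-point operations, which is far cheaper than materializing every possible pair ahead of time.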

Example 2: User - Product page views for recommendations

In the case of Search and Recommendation systems, Secondary Key Aggregations allow for more efficient feature caching and retrieval by storing interaction data under a single entity key.

The naive implementation would calculate the time-windowed aggregation under a [user, product] compound key. However, scoring 1000 candidate products for a single recommendation then requires 1000 separate lookups, one per compound key. Furthermore, the key space becomes too large to cache.

With Secondary Key Aggregations, you instead calculate, under the user key alone, the view count for every product the user has viewed in the past. In most cases, the number of products per user is small enough to fit in a single aggregation. Then join the Secondary Key Aggregation results to the feature vectors of the individual candidate products.
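The difference from the compound-key approach can be sketched in plain Python: aggregate a user's recent page views into one per-product count map (a single row under the user key), then join it to the candidate products. This is not Tecton's API or storage layout; the event tuple shape, the function names, and the half-open window `(as_of - window, as_of]` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical page-view event: (user_id, product_id, event_time).
PageView = Tuple[str, str, datetime]


def product_view_counts(
    views: Iterable[PageView], user_id: str, as_of: datetime, window: timedelta
) -> Dict[str, int]:
    """Secondary key aggregation: one row per user, holding a count per product."""
    start = as_of - window
    return dict(
        Counter(
            product_id
            for uid, product_id, event_time in views
            if uid == user_id and start < event_time <= as_of
        )
    )


def join_to_candidates(
    user_counts: Dict[str, int],
    candidate_features: Dict[str, Dict[str, float]],
) -> List[Dict[str, object]]:
    """Attach the user's view count to each candidate product's feature vector.

    One lookup under the user key replaces one lookup per [user, product] key;
    products the user never viewed get a count of 0.
    """
    return [
        {**features, "product_id": product_id, "user_view_count": user_counts.get(product_id, 0)}
        for product_id, features in candidate_features.items()
    ]
```

With 1000 candidates, the serving path fetches one count map for the user and then performs in-memory dictionary lookups, instead of issuing 1000 key-value reads.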

Sample Feature Repositories

To see these design patterns in action, explore the full sample Tecton feature repositories.
