Version: 1.2

Entities and Join Keys

An Entity is a Tecton abstraction over a set of primary keys used for looking up feature data. An Entity represents a real-world "thing" that has data associated with it. Examples include Customer, Transaction, Product, and Product Category.

Entities are defined using the Entity class.

On the backend, an Entity's main purpose is to configure the join keys that will be used to retrieve Features for a feature view.

In Tecton, every Feature is associated with one or more Entities. For example:

  • A customer's lifetime transaction count is associated with just one Entity: Customer.
  • A customer's lifetime count of purchases within a product category is associated with two Entities: Customer and Product Category.

Entities provide a way to:

  • Organize Features. An Entity can belong to any number of Features, and a Feature can be associated with any number of Entities. A Feature associated with a Customer Entity, for example, can be described as a feature of, or derived from, a Customer.
  • Prevent duplication. When creating an Entity, Feature Store users must agree on what to call it. For example, a commercial interaction with an e-commerce provider could be called a Transaction or a Purchase. Assume you decide on the term Transaction for this Entity. Once the Entity is created, all Features having to do with commercial interactions with an e-commerce provider must include the Transaction Entity.
  • Join Features that are associated with the same Entity. In Tecton, each Entity defines a consistent set of join keys (described below) that relates Features based on that Entity. Tecton stores these keys as attributes of the Entity and enforces their integrity.
  • Discover associated Features. Features that share Entities represent different information about that Entity. Use Tecton's Web UI to filter for Features of an Entity of interest.

Entity Workflow

To use an entity:

  1. Define the entity and its join keys.
  2. Reference the entity in one or more Feature Views.
  3. Use matching join keys when fetching feature data for inference or training.
  4. Ensure all upstream and downstream schemas (Feature Views, request data, training sets) use the same types and key names.

Tecton uses the entity to join features with event data while preserving point-in-time correctness.
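To make "point-in-time correctness" concrete, here is a minimal plain-Python sketch of the idea. It is not Tecton's implementation: the user_id join key, the transaction_count feature, the sample data, and the point_in_time_lookup helper are all assumptions made for illustration. For each event, the lookup matches on the join key and returns the latest feature value recorded at or before the event's timestamp, so no future data leaks into a training row.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history keyed by the join key "user_id".
# Each value is a time-sorted list of (feature_timestamp, transaction_count).
feature_history = {
    "u1": [(datetime(2024, 1, 1), 3), (datetime(2024, 1, 10), 7)],
    "u2": [(datetime(2024, 1, 5), 1)],
}


def point_in_time_lookup(history, join_key, event_time):
    """Return the latest feature value at or before event_time, else None."""
    rows = history.get(join_key, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, event_time)
    return rows[idx - 1][1] if idx > 0 else None


# Hypothetical events; each carries the join key and an event timestamp.
events = [
    {"user_id": "u1", "timestamp": datetime(2024, 1, 5)},
    {"user_id": "u1", "timestamp": datetime(2024, 1, 15)},
    {"user_id": "u2", "timestamp": datetime(2024, 1, 2)},
]

# Attach the feature value that was known at each event's timestamp.
training_rows = [
    {
        **event,
        "transaction_count": point_in_time_lookup(
            feature_history, event["user_id"], event["timestamp"]
        ),
    }
    for event in events
]
```

The third event precedes any recorded feature value for u2, so its feature is None rather than a value taken from the future.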

Basic Example

user = Entity(name="user", join_keys=[Field("user_id", String)])

What's Next

Once you've defined your Entities and Join Keys, use them to power Tecton's core workflows:

  • Connect to Data Sources: Learn how to associate Entities with incoming data in Defining Feature Views
  • Understand Point-in-Time Joins: Read about Training Set Generation and how Entities support accurate joins
  • Handle Schema Changes Safely: Visit the Upgrade Guide for managing join key migration
  • Explore Related Concepts: See how Entities relate to Feature Services and feature reuse

How To Use Entities and Join Keys

Define an Entity

Define an Entity using the Entity class. Entity objects are configured by the following attributes:

  • name is a unique identifier for the Entity class. For example: Customer or Transaction.
  • join_keys are the names of the primary key columns that uniquely identify an Entity instance. All Features that share an Entity identify that Entity instance using the same primary key column(s). For example, if column user_id identifies a Customer, then all Features derived from Customer refer to customers with a user_id primary key.

More information can be found in the Entity API reference.

Example: Entities with One Join Key

This example defines two Entities, Customer and Transaction:

from tecton import Entity
from tecton.types import Field, String

customer = Entity(name="Customer", join_keys=[Field("customer_id", String)])

transaction = Entity(name="Transaction", join_keys=[Field("transaction_id", String)])

The Entity object defines the join_key columns. This definition is independent of a data source definition or Feature transformation.

You might need to define two different Features calculated from two raw data sources that use different primary key column names. This is one of the reasons to use Entities. For example, assume one raw data source uses customer_id as the primary key column and another uses customer_identifier. You can unify them under a single key in your transformations, for example with SELECT customer_identifier AS customer_id. Both Features are then associated with the Customer Entity, and both use the same customer_id join key to look up Customer instances.
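The renaming step can also be sketched outside SQL. The following plain-Python example is illustrative only: the row data, the order_total and ticket_count columns, and the rename_key helper are hypothetical and not part of Tecton. It shows that once both sources expose the same customer_id column, their values can be combined per Customer instance.

```python
# Hypothetical raw rows from two sources that name the customer key differently.
orders_rows = [{"customer_id": "c1", "order_total": 40.0}]
support_rows = [{"customer_identifier": "c1", "ticket_count": 2}]


def rename_key(rows, old_name, new_name):
    """Return copies of rows with the column old_name renamed to new_name."""
    return [
        {(new_name if col == old_name else col): val for col, val in row.items()}
        for row in rows
    ]


# Equivalent in spirit to: SELECT customer_identifier AS customer_id
support_rows = rename_key(support_rows, "customer_identifier", "customer_id")

# With a shared join key, values from both sources line up per customer.
features_by_customer = {}
for row in orders_rows + support_rows:
    values = {col: val for col, val in row.items() if col != "customer_id"}
    features_by_customer.setdefault(row["customer_id"], {}).update(values)
```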

Example: Entity with Two Join Keys

This example defines one Entity, Company Employee, defined by two join keys:

employee = Entity(name="Company Employee", join_keys=[Field("company_id", String), Field("employee_id", String)])

When you define an Entity with two join keys, each Entity instance is uniquely identified by the combination of both keys. Any Feature associated with the Company Employee Entity uses both company_id and employee_id to look up Company Employee instances.
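As a rough illustration of why both keys are needed, the plain-Python sketch below stores a hypothetical tenure_years value keyed by the (company_id, employee_id) pair. The data and the lookup_tenure helper are assumptions for illustration, not Tecton APIs. The same employee_id can exist at two companies, so neither key alone identifies an instance.

```python
# Hypothetical feature values keyed by the composite (company_id, employee_id).
tenure_years = {
    ("acme", "e1"): 4,
    ("globex", "e1"): 9,  # same employee_id, different company
}


def lookup_tenure(company_id, employee_id):
    """Look up a value using both join keys; return None if no instance matches."""
    return tenure_years.get((company_id, employee_id))


acme_e1 = lookup_tenure("acme", "e1")
globex_e1 = lookup_tenure("globex", "e1")
```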

Using Entities in Feature Views

In this example, my_feature_view defines Features derived from Transactions, so it must include the transaction Entity in its definition.

transaction = Entity(name="Transaction", join_keys=[Field("transaction_id", String)])


@batch_feature_view(
    entities=[transaction],
    # ...
)
def my_feature_view(input_data):
    pass

Join Keys in Feature Services

When you define a Feature Service that includes multiple feature views, queries to that feature service must supply the join keys for every entity used by those feature views.

For example, you might have two entities:

customer = Entity(name="Customer", join_keys=[Field("customer_id", String)])

transaction = Entity(name="Transaction", join_keys=[Field("transaction_id", String)])

You might define two feature views: customer_feature_view with entities = [customer] and customer_transaction_feature_view with entities = [customer, transaction]. You can then define a feature service that includes both of these feature views:

fs = FeatureService(name="my_fs", features=[customer_feature_view, customer_transaction_feature_view])

Your feature service will need the union of all join keys from the entities in the feature views it includes. In this case, the feature service will need both customer_id and transaction_id to query.
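The union rule can be sketched with plain Python stand-ins. EntitySketch, FeatureViewSketch, and required_join_keys below are hypothetical names introduced for illustration. They are not Tecton classes and do not reflect how Tecton computes this internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySketch:
    """Hypothetical stand-in for an entity: a name plus join key column names."""
    name: str
    join_keys: tuple


@dataclass(frozen=True)
class FeatureViewSketch:
    """Hypothetical stand-in for a feature view: a name plus its entities."""
    name: str
    entities: tuple


def required_join_keys(feature_views):
    """Return the union of join keys across feature views, in first-seen order."""
    keys = []
    for feature_view in feature_views:
        for entity in feature_view.entities:
            for key in entity.join_keys:
                if key not in keys:
                    keys.append(key)
    return keys


customer = EntitySketch("Customer", ("customer_id",))
transaction = EntitySketch("Transaction", ("transaction_id",))

customer_fv = FeatureViewSketch("customer_feature_view", (customer,))
customer_transaction_fv = FeatureViewSketch(
    "customer_transaction_feature_view", (customer, transaction)
)

service_keys = required_join_keys([customer_fv, customer_transaction_fv])
```

Here service_keys contains customer_id once, even though both feature views use the Customer entity.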

Additionally, you can rebind join keys in a feature service definition as shown in the example below:

transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)

This feature service looks up two different instances of the same User Entity by binding the user_id join key to different request keys: sender_id and recipient_id. A single query can then return user features for both the sender and the recipient of a transaction.
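Conceptually, a join key map redirects which request key supplies the entity's join key value. The plain-Python sketch below mimics that behavior. The user_feature_table data, the txn_count_7d feature name, the dotted output naming, and the fetch_with_key_map helper are all assumptions for illustration and do not represent Tecton's API or response format.

```python
# Hypothetical feature table keyed by the User entity's join key, user_id.
user_feature_table = {
    "u1": {"txn_count_7d": 12},
    "u2": {"txn_count_7d": 3},
}


def fetch_with_key_map(feature_table, entity_key, namespace, join_key_map, request):
    """Look up features using the request key that the entity's join key maps to."""
    # Fall back to the entity's own key name when no remapping is given.
    request_key = join_key_map.get(entity_key, entity_key)
    row = feature_table.get(request[request_key], {})
    return {f"{namespace}.{name}": value for name, value in row.items()}


# One request carries two User instances under different key names.
request = {"sender_id": "u1", "recipient_id": "u2"}

result = {
    **fetch_with_key_map(
        user_feature_table, "user_id", "sender_features",
        {"user_id": "sender_id"}, request,
    ),
    **fetch_with_key_map(
        user_feature_table, "user_id", "recipient_features",
        {"user_id": "recipient_id"}, request,
    ),
}
```

Giving each remapped copy its own name keeps the sender's and recipient's values from colliding in the result.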
