An entity is an object or concept that can be modeled and that has features associated with it. Examples include Customer, Transaction, Product, and Product Category.
In Tecton, every Feature is associated with one or more Entities. For example:
- A customer's lifetime transaction count is associated with just one Entity: Customer.
- A lifetime count of the purchases a customer has made within a product category is associated with two Entities: Customer and Product Category.
Entity vs. Entity Instance. Entity describes the entire business domain (for example Customer). Entity Instance describes a concrete instance, such as "Alice Smith with the internal user_id of abcd1234".
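The difference between a Feature tied to one Entity and a Feature tied to two can be sketched in plain Python. This is not Tecton code. The rows and the column names customer_id and category_id are assumptions made up for illustration. Each distinct key value (or pair of values) corresponds to one Entity Instance.

```python
from collections import Counter

# Hypothetical transaction rows; the column names are assumptions.
transactions = [
    {"customer_id": "abcd1234", "category_id": "books"},
    {"customer_id": "abcd1234", "category_id": "books"},
    {"customer_id": "abcd1234", "category_id": "toys"},
    {"customer_id": "efgh5678", "category_id": "books"},
]

# One Entity (Customer): one feature value per customer_id.
count_by_customer = Counter(t["customer_id"] for t in transactions)

# Two Entities (Customer, Product Category): one feature value per
# (customer_id, category_id) pair.
count_by_customer_and_category = Counter(
    (t["customer_id"], t["category_id"]) for t in transactions
)

print(count_by_customer["abcd1234"])
print(count_by_customer_and_category[("abcd1234", "books")])
```

The first Feature is keyed by a single column, the second by a pair of columns, which mirrors how a Feature lists one or two Entities.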
Entities provide a way to:
- Organize Features. An Entity can belong to any number of Features, and a Feature can be associated with any number of Entities. A Feature associated with a Customer Entity, for example, can be described as being a feature derived from or of a Customer.
- Prevent duplication. When creating an Entity, Feature Store users must agree on what to call it. For example, a commercial interaction with an e-commerce provider could be called a Transaction or a Purchase. Assume you decide on the term Transaction for this Entity. Once the Entity is created, all Features having to do with commercial interactions with an e-commerce provider must include the Transaction Entity.
- Join Features that are associated with the same Entity. In Tecton, each Entity has regularized join keys (described below) that relate Features based on that Entity. Tecton stores these keys as attributes of the Entity and enforces their integrity.
- Discover associated Features. Features that share Entities represent different information about that Entity. Use Tecton's Web UI to filter for Features of an Entity of interest.
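As a rough illustration of why a shared key makes joining straightforward, the sketch below combines two sets of feature values that are both keyed by the same Customer key. It is plain Python rather than Tecton's join mechanism, and the feature names, values, and the join_features helper are hypothetical.

```python
# Hypothetical feature values, each keyed by the Customer join key.
lifetime_transaction_count = {"abcd1234": 42, "efgh5678": 7}
days_since_signup = {"abcd1234": 365, "ijkl9012": 12}


def join_features(join_key_values, **feature_tables):
    """Build one feature row per Entity Instance; missing values become None."""
    return {
        key: {name: table.get(key) for name, table in feature_tables.items()}
        for key in join_key_values
    }


rows = join_features(
    ["abcd1234", "efgh5678"],
    lifetime_transaction_count=lifetime_transaction_count,
    days_since_signup=days_since_signup,
)
print(rows["abcd1234"])
print(rows["efgh5678"])
```

Because both tables agree on what the key means, no per-feature matching logic is needed.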
Defining an Entity
Define an Entity using the Entity class.
Attributes of an Entity
Entity objects are defined by the following attributes:
- name is a unique identifier for the Entity class. For example: Customer or Transaction.
- join_keys are the names of the primary key columns that uniquely identify an Entity Instance. All Features that share an Entity identify that Entity Instance using the same primary key column. For example, if user_id identifies a Customer, then all Features derived from Customer refer to customers with a user_id.
See the Entity reference for detailed descriptions of Entity attributes.
Example: Creating Customer and Transaction Entities
This sample code defines two Entities, Customer and Transaction:
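    from tecton import Entity

    # Customer instances are identified by the customer_id column.
    customer = Entity(
        name="Customer",
        join_keys=["customer_id"]
    )

    # Transaction instances are identified by the transaction_id column.
    transaction = Entity(
        name="Transaction",
        join_keys=["transaction_id"]
    )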
Each Entity object defines its join key columns. This definition is independent of any data source or Feature transformation.
You might need to define two different Features calculated from two raw data sources that use different primary key columns. This is no problem, and is in fact one of the reasons to use Entities. For example, assume one raw data source uses customer_id as the primary key column and another uses customer_identifier. (You could name the join key either of these, or something altogether different. Assume you name the join key cust_id.) Both Features are associated with the Customer Entity, and both use the Customer join key to look up Customer instances.
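One way to picture this is a small renaming step that maps each source's primary key column onto the shared join key. The sketch below is plain Python, not Tecton's transformation API. The rows, the feature columns, and the with_join_key helper are hypothetical, and cust_id is the join key name assumed in the example above.

```python
JOIN_KEY = "cust_id"  # join key name assumed in the example above

# Hypothetical rows from two raw sources with different primary key columns.
source_a = [{"customer_id": "abcd1234", "lifetime_transaction_count": 42}]
source_b = [{"customer_identifier": "abcd1234", "days_since_signup": 365}]


def with_join_key(rows, source_key):
    """Return copies of rows with the source's key column renamed to JOIN_KEY."""
    renamed = []
    for row in rows:
        row = dict(row)  # copy so the raw rows are left untouched
        row[JOIN_KEY] = row.pop(source_key)
        renamed.append(row)
    return renamed


feature_a = with_join_key(source_a, "customer_id")
feature_b = with_join_key(source_b, "customer_identifier")
print(feature_a[0][JOIN_KEY] == feature_b[0][JOIN_KEY])
```

After the rename, both feature sets expose the same key column, so they can be matched on Customer instances regardless of what each raw source called that column.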
See Transformations for more about Transformations.
Using an Entity
As discussed above, include Entities when you define a Feature. For example:
    @batch_feature_view(
        mode='spark_sql',
        entities=[e.transaction],
        ...
    )
    def my_feature_view(input_data):
        ...
In this example, my_feature_view defines Features derived from Transactions, so it includes the transaction Entity in its definition.
Any other Feature View having to do with Transactions must also include the transaction Entity, regardless of the raw data source from which the transaction data originates.
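A toy index shows what this convention buys: when every Transaction-related Feature View lists the same Entity, the views can be grouped by Entity no matter where their data comes from. This is a plain-Python sketch, not Tecton's registry or Web UI, and the Feature View names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping: Feature View name -> names of the Entities it includes.
feature_view_entities = {
    "transaction_amount_from_warehouse": ["Transaction"],
    "transaction_risk_from_stream": ["Transaction"],
    "customer_lifetime_transaction_count": ["Customer"],
}

# Invert the mapping so Feature Views can be looked up by Entity.
views_by_entity = defaultdict(list)
for view_name, entity_names in feature_view_entities.items():
    for entity_name in entity_names:
        views_by_entity[entity_name].append(view_name)

print(sorted(views_by_entity["Transaction"]))
```

The two Transaction views come from different hypothetical sources, yet both appear under the same Entity because they share it in their definitions.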