This section covers how to define and manage features in Tecton using the feature engineering framework. Tecton allows you to define features in SQL, PySpark, SnowPark, or Python and subsequently handles the orchestration and maintenance of your data pipelines, including batch, streaming, and real-time pipelines.
Feature pipelines are composed using Tecton's declarative framework. The key concepts of the framework are:
- Data Sources: A Data Source defines a connection to a batch, stream, push, or request data source (request sources supply request-time parameters). Data Sources are used as inputs to feature pipelines, known as "Feature Views" in Tecton.
- Feature Views: Feature Views take in data sources as inputs, or in some cases other Feature Views, and define a transformation to compute one or more features. Feature Views also provide Tecton with additional information such as metadata and orchestration, serving, and monitoring configurations. There are three types of Feature Views, each designed to support a common data flow pattern.
- Entities: An Entity is a business concept that can be modeled and that has features associated with it, e.g. User, Ad, Product, or Product Category. In Tecton, every Feature View is associated with one or more entities.
- Feature Services: A Feature Service represents a set of features that power a model. Typically, there is one Feature Service for each version of a model. Feature Services provide convenient endpoints for fetching training data through the Tecton SDK or fetching real-time feature vectors from Tecton's REST API.
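Putting the four concepts together, a feature pipeline might look like the following minimal sketch. The table name, feature logic, and object names are hypothetical, and the exact parameter names vary by Tecton SDK version, so treat this as an illustration of the shape of a definition rather than a copy-paste recipe:

```python
from datetime import datetime, timedelta

from tecton import (
    BatchSource,
    Entity,
    FeatureService,
    HiveConfig,
    batch_feature_view,
)

# Data Source: a connection to a (hypothetical) batch table.
transactions = BatchSource(
    name="transactions",
    batch_config=HiveConfig(
        database="prod",
        table="transactions",
        timestamp_field="timestamp",
    ),
)

# Entity: the business concept the features describe.
user = Entity(name="user", join_keys=["user_id"])

# Feature View: a SQL transformation over the Data Source.
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2023, 1, 1),
)
def user_transaction_count(transactions):
    return f"""
        SELECT user_id, COUNT(*) AS transaction_count, timestamp
        FROM {transactions}
        GROUP BY user_id, timestamp
    """

# Feature Service: the set of features that powers one model version.
fraud_detection_service = FeatureService(
    name="fraud_detection_service_v1",
    features=[user_transaction_count],
)
```

Note that these definitions are declarative: they describe the desired pipeline, and Tecton handles orchestrating the backfills, scheduled jobs, and serving infrastructure.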
In practice, composing pipelines with Tecton means connecting Data Sources to Feature Views to Feature Services.
Tecton objects are declared in Python. We recommend managing these source files in Git, using the repository as the source of truth for your feature pipelines.
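A typical Git-based workflow might look like the sketch below. The repository name and path are hypothetical; `tecton plan` and `tecton apply` are the Tecton CLI commands for previewing and applying changes to a workspace:

```shell
# Clone the repo that holds your feature definitions (hypothetical repo).
git clone git@github.com:your-org/feature-repo.git
cd feature-repo

# Preview the changes Tecton would make to the connected workspace.
tecton plan

# Apply the declared objects (Data Sources, Feature Views, etc.).
tecton apply
```

Because the Python files fully describe the pipelines, code review and rollback work the same way they do for any other source code.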