Tecton Framework Overview
Tecton makes building operational ML data flows and consuming ML data as easy as possible. Tecton's framework has two APIs: a Declarative API for composing data pipelines, and an interactive Read API for consuming features.
Tecton's Read APIs are used to access feature values online for model serving or offline for model training.
The end-to-end example illustrated here can be found in our GitHub sample repo.
Defining Feature Pipelines
Tecton's framework is designed to let you express ML data flows. There are five important Tecton objects:
- Data Sources: Data sources define a connection to a batch, stream, or request data source (i.e. request-time parameters) and are used as inputs to feature pipelines, known as "Feature Views" in Tecton.
- Feature Views: Feature Views take in data sources as inputs, or in some cases other Feature Views, and define a pipeline of transformations to compute one or more features. Feature Views also provide Tecton with additional information such as metadata and orchestration, serving, and monitoring configurations. There are many types of Feature Views, each designed to support a common data flow pattern.
- Transformations: Each Feature View has a single pipeline of transformations that define the computation of one or more features. Transformations can be modularized and stitched together into a pipeline.
- Entities: An Entity is an object or concept that can be modeled and that has features associated with it. Examples include User, Ad, Product, and Product Category. In Tecton, every Feature View is associated with one or more entities.
- Feature Services: A Feature Service represents a set of features that power a model. Typically there is one Feature Service for each version of a model. Feature Services provide convenient endpoints for fetching training data through the Tecton SDK or fetching real-time feature vectors from Tecton's REST API.
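As a rough sketch, the objects above might be wired together as follows. This is illustrative declarative configuration, not runnable on its own: it requires the Tecton SDK and a connected workspace, the names (`transactions`, `user_transaction_counts`, `fraud_detection_v1`, the S3 path) are hypothetical, and exact parameters vary across Tecton SDK versions.

```python
from datetime import datetime, timedelta

from tecton import BatchSource, Entity, FeatureService, FileConfig, batch_feature_view

# Data Source: a connection to raw batch data (hypothetical S3 path).
transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://example-bucket/transactions.parquet",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Entity: the object or concept the features describe.
user = Entity(name="user", join_keys=["user_id"])

# Feature View: a pipeline of transformations over the Data Source,
# plus orchestration configuration (schedule, backfill start).
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2023, 1, 1),
)
def user_transaction_counts(transactions):
    return f"""
        SELECT user_id, timestamp, COUNT(*) AS transaction_count
        FROM {transactions}
        GROUP BY user_id, timestamp
    """

# Feature Service: the set of features powering one model version.
fraud_detection_v1 = FeatureService(
    name="fraud_detection_v1",
    features=[user_transaction_counts],
)
```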
In practice, composing pipelines with Tecton means connecting
Data Sources to
Feature Views to
Feature Services.
Tecton objects are declared in Python. We recommend managing your source files using Git as a source of truth for your feature pipelines.
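With definitions in a Git repository, changes are typically reviewed via pull request and then applied with the Tecton CLI. A sketch of one such workflow (branch and workspace names are placeholders):

```shell
git checkout -b add-user-features
# ...edit feature definitions, commit, and open a pull request...

tecton login
tecton workspace select prod
tecton plan    # preview the changes Tecton would make
tecton apply   # apply the reviewed definitions to the workspace
```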
Operational Feature Pipelines
Once feature data pipelines are defined, Tecton orchestrates the operational tasks required to run these data pipelines and serve features. This includes:
- Materialization: orchestrating transformations and writing computed feature values to Tecton's online and offline stores
- Point-in-time correctness: ensuring future signals do not leak into training datasets, which preserves training accuracy and avoids training/serving skew
- Monitoring: tracking your data pipelines and triggering alerts when incidents occur
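Point-in-time correctness boils down to an "as-of" join: each training row sees only the latest feature value recorded at or before its event time. A minimal stdlib sketch of the idea, with hypothetical data (this is not Tecton's implementation, just the principle it enforces):

```python
from bisect import bisect_right
from datetime import datetime

# Feature values recorded over time for one user, sorted by timestamp.
# Hypothetical data for illustration.
feature_history = [
    (datetime(2023, 1, 1), 2),
    (datetime(2023, 1, 5), 7),
    (datetime(2023, 1, 9), 11),
]

def point_in_time_value(history, event_time):
    """Return the latest feature value recorded at or before event_time.

    Values recorded after event_time are excluded, so future signals
    never leak into a training row.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)
    if idx == 0:
        return None  # no feature value known yet at event_time
    return history[idx - 1][1]

# A training event on Jan 6 sees the Jan 5 value (7),
# not the future Jan 9 value (11).
print(point_in_time_value(feature_history, datetime(2023, 1, 6)))  # -> 7
```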