Tecton Concepts and Frameworks
Tecton makes building operational ML data flows and consuming ML data as easy as possible. Using Tecton involves two main sets of APIs, one for composing data flows, and the other for consuming data.
- Declarative Framework APIs: compose your feature pipelines from Tecton objects like Feature Views, Data Sources, and Feature Services using Tecton's declarative framework.
- Read APIs: access feature values through Tecton's read APIs for online serving or offline model training.
A minimal end-to-end example is illustrated here, and can be found in our sample repo on GitHub.
Defining Data Flows with Tecton's Framework
Tecton's declarative framework is designed for expressing ML data flows. There are five main Tecton objects.
- Data Sources: Data sources define a connection to a batch, stream, or request data source (i.e. request-time parameters) and are used as inputs to feature pipelines, known as "Feature Views" in Tecton.
- Feature Views: Feature Views take in data sources as inputs, or in some cases other Feature Views, and define a pipeline of transformations to compute one or more features. Feature Views also provide Tecton with additional information such as metadata and orchestration, serving, and monitoring configurations. There are many types of Feature Views, each designed to support a common data flow pattern.
- Transformations: Each Feature View has a single pipeline of transformations that define the computation of one or more features. Transformations can be modularized and stitched together into a pipeline.
- Entities: An Entity is an object or concept that can be modeled and that has features associated with it. Examples include User, Ad, Product, and Product Category. In Tecton, every Feature View is associated with one or more entities.
- Feature Services: A Feature Service represents a set of features that power a model. Typically there is one Feature Service for each version of a model. Feature Services provide convenient endpoints for fetching training data through the Tecton SDK or fetching real-time feature vectors from Tecton's REST API.
In practice, composing pipelines with Tecton means connecting Data Sources to Feature Views to Feature Services.
All of these objects are declared in Python. We recommend managing your source files using Git, as it will be the source of truth for the Feature Store we spin up on your behalf.
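As an illustration, a minimal feature repo file might look like the sketch below. This is modeled on Tecton's documented declarative style, but treat it as a sketch: exact class names and parameters vary across SDK versions, and the entity, source path, and schedule here are hypothetical.

```python
from datetime import timedelta
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FeatureService

# Entity: the concept the features describe, keyed by user_id.
user = Entity(name="user", join_keys=["user_id"])

# Data Source: a connection to raw transaction data (hypothetical S3 path).
transactions = BatchSource(
    name="transactions",
    batch_config=FileConfig(
        uri="s3://my-bucket/transactions.parquet",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

# Feature View: a transformation pipeline from the source, plus
# orchestration config (here, recomputed daily).
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
)
def user_transaction_counts(transactions):
    return f"""
        SELECT user_id, COUNT(*) AS transaction_count, MAX(timestamp) AS timestamp
        FROM {transactions}
        GROUP BY user_id
    """

# Feature Service: the set of features powering one model version.
fraud_detection_service = FeatureService(
    name="fraud_detection_service",
    features=[user_transaction_counts],
)
```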
From Definitions to Operations
With your feature data flows properly defined, Tecton takes care of all of the operational concerns involved in actually running these data flows and serving the features, including:
- Materialization: orchestration of all transformations, and saving computed feature values in Tecton's online and offline stores
- Low latency serving: orchestrating feature computation and caching to minimize serving latency
- Point-in-time correctness: ensuring future signals do not inappropriately leak into training datasets, which preserves the accuracy of the trained model and avoids skew between training and serving data
- Monitoring: showing you the status of your data flow pipelines and alerting on any upstream outages.
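To make point-in-time correctness concrete, here is a minimal pure-Python sketch of the idea (not Tecton's implementation): for each labeled training event, only the most recent feature value known *at or before* the event time may be joined in, never a later one.

```python
from datetime import datetime

def point_in_time_join(label_events, feature_values):
    """For each labeled event, attach the latest feature value whose
    timestamp is at or before the event time -- never a future value."""
    training_rows = []
    for event in label_events:
        eligible = [
            fv for fv in feature_values
            if fv["user_id"] == event["user_id"] and fv["ts"] <= event["ts"]
        ]
        latest = max(eligible, key=lambda fv: fv["ts"], default=None)
        training_rows.append({
            **event,
            "transaction_count": latest["transaction_count"] if latest else None,
        })
    return training_rows

# Hypothetical data: the Jan 5 value did not exist yet at the Jan 3 event.
features = [
    {"user_id": 1, "ts": datetime(2023, 1, 1), "transaction_count": 3},
    {"user_id": 1, "ts": datetime(2023, 1, 5), "transaction_count": 9},
]
events = [{"user_id": 1, "ts": datetime(2023, 1, 3), "label": 1}]

rows = point_in_time_join(events, features)
# Only the value known at event time (3) is joined; 9 would be leakage.
```

Tecton performs this join for you when generating training datasets, so you never have to filter out future feature values by hand.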
Consuming Feature Data through Feature Service Endpoints
Depending on the usage scenario, you will use different parts of the consumption API for fetching feature data.
- Serving Online (Guide) – when your application needs to get up-to-date feature values in real time in production.
  - `api/v1/feature-service/get-features` as a REST API request
  - `get_feature_vector` from a Feature Service (via the Python SDK)
- Training Offline (Guide) – when you need to train models with historical data.
  - `get_historical_features` from a Feature Service (via the Python SDK)
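For the online path, the sketch below builds the JSON body for a `get-features` request. The workspace, service name, and join key values are hypothetical, and the exact request schema may differ by Tecton version, so check your cluster's API reference before relying on it.

```python
import json

# Hypothetical identifiers -- substitute your own workspace, service, and keys.
payload = {
    "params": {
        "workspace_name": "prod",
        "feature_service_name": "fraud_detection_service",
        "join_key_map": {"user_id": "user_123"},
    }
}

body = json.dumps(payload)
# POST this body to https://<your-cluster>.tecton.ai/api/v1/feature-service/get-features
# with your Tecton API key in the Authorization header.
```

The response contains the feature vector for the given join keys, ready to pass to your model at inference time.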
That's the core of it. Compose your data flows with the Definitions framework, then read feature values from the Feature Store we spin up on your behalf.