1.0.0
With Tecton 1.0, we’re excited to announce new capabilities along two major fronts:
GenAI. We are applying our 5+ years of experience helping enterprises run their AI applications in production to the GenAI world. With Tecton 1.0, you can now manage, enrich, and serve your prompts in production; cost-efficiently generate, store, and serve embeddings; and provide your LLM with additional context in the form of features as tools and knowledge.
Core platform. We are continuing to evolve our core platform, improving performance and cost efficiency at scale. New capabilities include Remote Dataset Generation, Compaction for Streaming Time Window Aggregation features, and improved controls for realtime compute and feature serving infrastructure. 1.0 also includes capabilities such as Plan Integration Tests, which further simplify our developer experience, and Model Generated Features, which let our customers make the most of all of their data. And there's much more!
To get started with Tecton 1.0, please refer to the upgrade guide.
Embeddings
Tecton enables Modelers to seamlessly manage embedding generation, storage, and serving. Tecton solves the most challenging aspects of productionizing embeddings with out-of-the-box capabilities:
- top open source embedding models
- support for proprietary embedding models
- optimized compute resource management
- data pipeline orchestration
- efficient, scalable storage and retrieval
- easy, reproducible model experimentation
- and more
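For instance, an embedding feature can be declared directly in a Feature View's features list (a minimal sketch: the Embedding parameters shown here, such as column, column_dtype, and model, are our best-guess usage, and the model name is illustrative):
from tecton import batch_feature_view, Embedding
from tecton.types import String

@batch_feature_view(
    # ... sources, entities, mode, and other arguments omitted
    features=[
        Embedding(
            name="product_description_embedding",
            column="product_description",  # text column to embed
            column_dtype=String,
            model="sentence-transformers/all-MiniLM-L6-v2",  # open source model; illustrative
        ),
    ],
    timestamp_field="timestamp",
)
def product_embeddings(products):
    return f"SELECT product_id, product_description, timestamp FROM {products}"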
Prompt Lifecycle Management
Tecton's built-in Prompt Management brings standardization, version control, and DevOps best practices to mission-critical prompts. It enables organizations to discover, share, and safely reuse prompts across use cases.
Prompts alone aren't enough to create great LLM responses. Prompts need context in the form of features in order to provide the best responses. Tight integration between classical Tecton features and prompts makes it easier to enrich prompts with great context, and even to generate historical versions of prompts to aid in fine-tuning large language models.
This advancement not only enhances the quality and consistency of AI-generated content but also accelerates the development and deployment of LLM-powered applications in production environments.
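As a rough illustration of enriching a prompt with feature values (a hypothetical sketch: the prompt-specific API isn't shown in these notes, so this reuses the Realtime Feature View machinery described below, and all names are illustrative):
from tecton import realtime_feature_view, Attribute
from tecton.types import String

@realtime_feature_view(
    mode="python",
    sources=[user_spending_metrics],  # hypothetical upstream Feature View
    features=[Attribute("system_prompt", String)],
)
def support_agent_prompt(user_spending_metrics):
    # Enrich a versioned prompt template with fresh feature values.
    spend = user_spending_metrics["amount_sum_30d"]
    return {
        "system_prompt": f"You are a support agent. The user's 30-day spend is ${spend:.2f}.",
    }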
Model Generated Features
Model-generated features are a powerful technique for creating high-quality context, boosting the performance of predictive or generative AI systems. Tecton provides a seamless and efficient way to use custom models for context generation. A few examples of model-generated features are:
- Custom embeddings: Transforming product descriptions and categories into dense vector representations, enabling more accurate recommendation systems.
- Text classification: Performing sentiment analysis on user posts.
- Image analysis: Extracting signals such as product color from images.
- Named entity recognition: Identifying and categorizing named entities (e.g., person names, organizations, locations) in unstructured text data.
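As a sketch of the text classification example above (hypothetical: the Inference feature type is mentioned under Updated Feature View Definitions below, but its exact parameters aren't shown in these notes, so the ones used here are assumptions):
from tecton import batch_feature_view, Inference
from tecton.types import Field, String

@batch_feature_view(
    # ... sources, entities, and other arguments omitted
    features=[
        Inference(
            name="post_sentiment_score",
            model="sentiment_classifier_v2",  # assumed: a custom model registered with Tecton
            input_columns=[Field("post_text", String)],  # assumed parameter name
        ),
    ],
    timestamp_field="timestamp",
)
def post_sentiment(posts):
    return f"SELECT user_id, post_text, timestamp FROM {posts}"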
Realtime Compute and Serving Isolation
Tecton 1.0 introduces first-class objects for provisioning realtime infrastructure: Feature Server Groups and Transform Server Groups.
A Feature Server Group is a cluster of autoscaling live serving nodes that returns feature vectors via Tecton’s HTTP API. A Transform Server Group is a cluster of live compute nodes that computes Realtime Feature View values.
Feature Server Groups and Transform Server Groups are “isolated”, meaning that they only serve or compute features within a pre-defined scope such as a workspace.
Isolation provides the following benefits:
- Eliminates cross-use-case disruption caused by shared serving and compute infrastructure, preventing resource contention between one team's test cases and another's production traffic. Isolation ensures that each use case’s operations remain independent and performant.
- Facilitates granular resource and cost management through provisioning controls.
- Enables better cost attribution of online serving and compute costs.
- Improves security posture by restricting network flow between different use cases.
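As a hypothetical sketch of provisioning isolated infrastructure for a single workspace (the class and parameter names below are assumptions, not the confirmed 1.0 API; see the Tecton docs for the real declarations):
from tecton import FeatureServerGroup, TransformServerGroup  # assumed imports

# Serving nodes dedicated to one workspace's HTTP API traffic.
fraud_serving = FeatureServerGroup(
    name="fraud-serving",
    min_nodes=2,   # assumed autoscaling floor
    max_nodes=10,  # assumed autoscaling ceiling
)

# Compute nodes dedicated to the same workspace's Realtime Feature Views.
fraud_compute = TransformServerGroup(
    name="fraud-compute",
    min_nodes=1,
    max_nodes=4,
)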
Updated Feature View Definitions
In 1.0, Feature Views accept a features parameter instead of schema or aggregations. The features parameter will also support Embeddings and Inferences.
# Before
@batch_feature_view(
    # ...
    schema=[
        Field("user_id", String),
        Field("value", Int64),
        Field("timestamp", Timestamp),
    ],
)
def feature_view(input):
    return f"""
    SELECT user_id, value, timestamp FROM {input}
    """

# After
@batch_feature_view(
    # ...
    features=[
        Attribute(
            name="value",
            dtype=Int64,
            description="A short blurb about my feature.",
            tags={"release": "development"},
        ),
    ],
    timestamp_field="timestamp",
)
def feature_view(input):
    return f"""
    SELECT user_id, value, timestamp FROM {input}
    """
Features (Aggregate, Attribute, Embedding, Inference) now accept descriptions and tags, which improves organization and facilitates discovery of Tecton features within large workspaces. Further, workspaces now expose properties to easily fetch Feature Views, Entities, Data Sources, and other FCOs.
FilteredSource by Default
Past versions of Tecton have offered FilteredSource as a way to pre-filter data sources within a materialization job. This utility has improved performance, reduced costs, and proven to be the best option for most use cases. In 1.0, we've changed Feature Views to filter their sources by default.
We have also adopted a builder pattern for source filtering options. Instead of importing FilteredSource, you can now call .unfiltered() or .select_range(start_time, end_time) directly on a DataSource object.
# Before
@batch_feature_view(
    sources=[
        users,
        FilteredSource(transactions),
    ],
)
def bfv(users, transactions):
    pass

# After
@batch_feature_view(
    sources=[
        users.unfiltered(),
        transactions,  # Filtered by default
    ],
)
def bfv(users, transactions):
    pass
Plan Integration Tests
Plan integration tests are initiated and run asynchronously during plan/apply with the --integration-test flag (e.g., tecton plan --integration-test). They attempt to materialize data to fully test the materialization pipeline without actually writing any data to any store. These integration tests run independently of the Tecton validation cluster.
Remote Dataset Generation
You can now generate evaluation datasets for model training or fine-tuning from any environment. All you need is the tecton Python package. You can start remote Dataset jobs using start_dataset_job, and the output will be easily accessible via a Tecton Dataset.
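For example (a sketch: start_dataset_job's exact call site and signature are assumptions here, and the workspace, feature view, and dataset names are illustrative):
from datetime import datetime
import tecton

tecton.login(tecton_url="https://yourcluster.tecton.ai")  # note the tecton_url rename below

ws = tecton.get_workspace("prod")
fv = ws.get_feature_view("user_transaction_features")

# Kick off a remote Dataset job; it runs on Tecton-managed compute,
# not on this machine.
fv.get_features_in_range(
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 2, 1),
).start_dataset_job(dataset_name="user_txn_eval_jan")

# Once the job completes, read the result back as a Tecton Dataset.
df = ws.get_dataset("user_txn_eval_jan").to_pandas()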
Realtime Feature View Improvements
On Demand Feature Views have been renamed to Realtime Feature Views and augmented with RealtimeContext objects. RealtimeContext allows users to access a request_timestamp that is consistent across both online and offline query paths. Further, the Realtime Feature View architecture is now more performant and reliable.
# Before
@on_demand_feature_view(
    mode="pandas",
    sources=[transaction_request],
    schema=[Field("amount", Int64)],
)
def my_feature_view(request_df: pandas.DataFrame):
    pass

# After
from tecton_core.realtime_context import RealtimeContext

@realtime_feature_view(
    mode="pandas",
    sources=[transaction_request],
    features=[Attribute("amount", Int64)],
)
def my_feature_view(request_df: pandas.DataFrame, context: RealtimeContext):
    pass
Compaction Updates
Compaction for Stream Feature Views using Tecton Aggregations now supports more aggregation window types: in 1.0, users can use time window aggregations in addition to lifetime aggregations. Compaction significantly improves the performance of online retrieval and lowers the cost of materialization. See our docs for more information on Compaction.
In addition, 1.0 contains offline query performance improvements for Stream Feature Views using lifetime aggregations.
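For instance, a compacted Stream Feature View can now use a time window aggregation (a sketch: the compaction_enabled flag name is an assumption, and other arguments are omitted):
from datetime import timedelta
from tecton import stream_feature_view, Aggregate
from tecton.types import Field, Int64

@stream_feature_view(
    # ... source, entities, mode, and other arguments omitted
    compaction_enabled=True,  # assumed parameter name for enabling Compaction
    features=[
        Aggregate(
            input_column=Field("amount", Int64),
            function="sum",
            time_window=timedelta(days=7),  # time window aggregation, newly supported with Compaction
        ),
    ],
    timestamp_field="timestamp",
)
def user_weekly_spend(transactions):
    return f"SELECT user_id, amount, timestamp FROM {transactions}"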
Timestamp Features
We've added support for a long-overlooked data type: timestamps. Formerly, Tecton supported timestamp values only for the record timestamp. In 1.0, customers may define timestamp-valued features as well.
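For example (a minimal sketch using the 1.0 features parameter described above; names are illustrative):
from tecton import batch_feature_view, Attribute
from tecton.types import Timestamp

@batch_feature_view(
    # ... sources, entities, and other arguments omitted
    features=[
        # A feature whose value is itself a timestamp, new in 1.0.
        Attribute(name="last_login_at", dtype=Timestamp),
    ],
    timestamp_field="timestamp",
)
def user_login_features(logins):
    return f"SELECT user_id, last_login_at, timestamp FROM {logins}"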
Offline queries for Rift in Pandas mode return datetime64[us, UTC] types, aligning the granularity and timezone behavior with the rest of the system. Users can opt out of this behavior by setting the TECTON_STRIP_TIMEZONE_FROM_FEATURE_VALUES value to True.
MaterializationContext#start_time and MaterializationContext#end_time are now datetime.datetime instead of pendulum.DateTime.
Aggregation Leading Edge Parameter
Tecton 1.0 introduces a new optional parameter for Stream Feature Views: aggregation_leading_edge. With this parameter, users can set the aggregation strategy for processing stream events:
- aggregation_leading_edge=AggregationLeadingEdge.LATEST_EVENT_TIME: The aggregation window's leading edge is the timestamp of the latest event that has been processed.
- aggregation_leading_edge=AggregationLeadingEdge.WALL_CLOCK_TIME: The aggregation window's leading edge is the current request timestamp. This option reduces read costs by almost half.
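For example (a sketch, assuming AggregationLeadingEdge is importable from the top-level tecton package; other arguments omitted):
from tecton import stream_feature_view, AggregationLeadingEdge

@stream_feature_view(
    # ... source, entities, features, and other arguments omitted
    aggregation_leading_edge=AggregationLeadingEdge.WALL_CLOCK_TIME,
)
def user_event_counts(events):
    return f"SELECT user_id, event_type, timestamp FROM {events}"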
Caching for Feature Tables
Feature Tables now support caching. Caching in Feature Tables is enabled via the cache_config parameter, which works the same as caching for Feature Views. See the docs for more info.
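For example (a sketch: assumes CacheConfig with a max_age_seconds parameter, mirroring Feature View caching, and an Entity named user defined elsewhere):
from tecton import FeatureTable, Attribute, CacheConfig
from tecton.types import String

user_profile_table = FeatureTable(
    name="user_profile_table",
    entities=[user],  # assumes an Entity defined elsewhere
    features=[
        Attribute(name="user_segment", dtype=String),
    ],
    timestamp_field="timestamp",
    online=True,
    cache_config=CacheConfig(max_age_seconds=600),  # parameter name assumed
)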
Updated column order of get_features_in_range
_valid_from now appears before _valid_to in the output of get_features_in_range.
Before: get_features_in_range.to_pandas() returns columns in the order [join_key, feature_1, feature_2, ..., _valid_to, _valid_from].
After: get_features_in_range.to_pandas() returns columns in the order [join_key, feature_1, feature_2, ..., _valid_from, _valid_to].
Parameter Renames and Removals
- In tecton.login(), the url parameter has been renamed to tecton_url
- MaterializationContext#feature_start_time and MaterializationContext#feature_end_time have been replaced by MaterializationContext#start_time and MaterializationContext#end_time, respectively
- DataSource#columns has been removed
- Workspace#get_all has been removed