0.5.0

October 3, 2022

New features

Materialization jobs can be manually triggered

With the Materialization API, you can manually trigger materialization via an API call. The Materialization API can be used in the Tecton SDK and in Airflow, through the Tecton Airflow provider.

Feature Output Streams

Feature View Output Streams enable your application to subscribe to the outputs of streaming feature pipelines. Your application accesses these outputs via a stream sink. Feature View Output Streams are designed to be used for asynchronous predictions, where model inference is triggered by newly arriving feature data.
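The asynchronous prediction pattern described above can be sketched as a consumer loop. This is a minimal illustration, not Tecton code: an in-memory queue stands in for the stream sink, and the record fields (user_id, txn_count_1h), the predict function, and run_async_predictions are hypothetical names chosen for the example.

```python
import queue


def run_async_predictions(feature_stream, predict, max_records):
    """Score each feature record as soon as it arrives on the stream."""
    predictions = []
    for _ in range(max_records):
        record = feature_stream.get()  # blocks until a new record arrives
        predictions.append((record["user_id"], predict(record)))
    return predictions


# Hypothetical stand-in for a stream sink: an in-memory queue.
feature_stream = queue.Queue()
feature_stream.put({"user_id": "u1", "txn_count_1h": 3})
feature_stream.put({"user_id": "u2", "txn_count_1h": 12})


# Hypothetical model: flag users with more than 10 transactions in the last hour.
def predict(record):
    return record["txn_count_1h"] > 10


results = run_async_predictions(feature_stream, predict, max_records=2)
```

In a real deployment the loop would read from the configured stream sink and run until stopped; the bounded loop here only keeps the sketch terminating.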

The Tecton SDK can be used in any Python environment to retrieve features

Using the Tecton SDK with AWS Athena removes the requirement that you use a Databricks notebook or an AWS EMR notebook to retrieve features from Tecton’s offline store.

When using the Tecton SDK with AWS Athena, you can retrieve features from Tecton’s offline store in any Python environment that has access to AWS (e.g., your local laptop, a Jupyter notebook, or Kubeflow pipelines).

Data Source Functions, for increased flexibility in working with Data Sources

When defining a BatchSource or StreamSource object, you set the batch_config or stream_config parameter, respectively. The value of these configs can be the name of an object (such as HiveConfig or KafkaConfig) or a Data Source Function.

Compared to using an object, a Data Source Function gives you more flexibility in connecting to an underlying data source and specifying logic for transforming the data retrieved from the underlying data source. However, using an object is recommended if you do not require the additional flexibility offered by a Data Source Function.

Rematerialization can be suppressed, to reduce infrastructure costs

After refactoring a Python function or migrating an upstream Data Source, you can run tecton plan or tecton apply with the --suppress-recreates flag to suppress rematerialization. When rematerialization is suppressed, feature values are not recalculated.

You should only use the --suppress-recreates flag when you are confident that changes to a Tecton repo will not affect feature values.
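One way to build that confidence after a refactor is to compare the old and new versions of a function on representative inputs before suppressing rematerialization. The sketch below is an illustration under assumptions: amount_bucket_old, amount_bucket_new, and the sample values are hypothetical and do not come from Tecton.

```python
def amount_bucket_old(amount):
    """Original implementation of a hypothetical feature function."""
    if amount < 10:
        return "small"
    elif amount < 100:
        return "medium"
    else:
        return "large"


# Refactored implementation: same thresholds, expressed as a lookup table.
_BUCKETS = ((10, "small"), (100, "medium"))


def amount_bucket_new(amount):
    for upper_bound, label in _BUCKETS:
        if amount < upper_bound:
            return label
    return "large"


# Compare both versions on sample inputs, including the boundary values.
sample_amounts = [0, 9.99, 10, 55, 99.99, 100, 2500]
mismatches = [
    a for a in sample_amounts if amount_bucket_old(a) != amount_bucket_new(a)
]
```

An empty mismatches list on a sample does not prove the two functions are equivalent, so the sample should cover boundary values and any nulls or unusual inputs the feature may see.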

Struct Type Features in On-Demand Feature Views

You can include a Struct data type in the output schema of an On-Demand Feature View (ODFV). A Struct can contain multiple fields with mixed data types.

A Struct can be nested within other complex types. For example, you can have a Struct within a Struct, or an array of Structs.

Using a Struct in the output schema of an ODFV allows you to easily parse the ODFV's output when it contains multiple feature values.
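The shape of a Struct-valued output can be sketched in plain Python. This example assumes that a Struct is represented as a dictionary keyed by field name; the Tecton decorator and schema declaration are deliberately omitted, and the function name and all field names are hypothetical.

```python
def transaction_summary(request):
    """Return one Struct-valued feature holding several related values."""
    amount = request["amount"]
    return {
        "summary": {  # Struct with mixed field types
            "amount_usd": float(amount),
            "is_large": amount > 1000,
            "merchant": {  # Struct nested within a Struct
                "name": request["merchant_name"],
                "category": request["merchant_category"],
            },
            "tags": [  # array of Structs
                {"key": "channel", "value": request["channel"]},
            ],
        }
    }


request = {
    "amount": 1500,
    "merchant_name": "Acme",
    "merchant_category": "retail",
    "channel": "web",
}
summary = transaction_summary(request)["summary"]

# Fields are read by name instead of being unpacked from a flat list of values.
merchant_name = summary["merchant"]["name"]
first_tag_value = summary["tags"][0]["value"]
```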

Improvements and bug fixes

`to_dict` support on SDK methods returning tabular `Displayable` objects

All SDK methods returning a table now return a Displayable object with a to_dict() method. The following methods have been updated:

materialization_status()
summary()
deletion_status()
get_feature_freshness() (see Note below)

Note

get_feature_freshness no longer supports the to_dict parameter. Calls to the method can be updated by changing tecton.get_feature_freshness(to_dict=True) to tecton.get_feature_freshness().to_dict().

Alert email must now be set if `monitor_freshness` = `True`

To monitor Feature Views, you must set the alert_email parameter whenever monitor_freshness = True. This ensures that alert emails are sent for the intended Feature Views. See Alerts for more information.

get_historical_features() performance improvements on Spark

get_historical_features() has been updated with a more performant point-in-time join. This join results in faster feature value retrieval when both of the following are true:

The call to get_historical_features() contains a spine.
get_historical_features() returns feature values from non-aggregate Feature Views, custom aggregate Feature Views, or Feature Services that contain either of these two types of Feature Views.
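For readers unfamiliar with the term, a point-in-time join attaches to each spine row the latest feature value known at that row's timestamp. The sketch below shows the idea in plain Python; it is not Tecton's implementation, and the column names (user_id, timestamp, value) and the inclusive "at or before" comparison are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def point_in_time_join(spine, feature_rows):
    """For each spine row, pick the latest feature value at or before its timestamp."""
    # Group feature rows by key, ordered by timestamp.
    by_key = defaultdict(list)
    for row in sorted(feature_rows, key=lambda r: r["timestamp"]):
        by_key[row["user_id"]].append(row)

    joined = []
    for event in spine:
        rows = by_key.get(event["user_id"], [])
        timestamps = [r["timestamp"] for r in rows]
        # Index of the first feature row strictly after the event time.
        idx = bisect_right(timestamps, event["timestamp"])
        value = rows[idx - 1]["value"] if idx > 0 else None
        joined.append({**event, "value": value})
    return joined


feature_rows = [
    {"user_id": "u1", "timestamp": datetime(2022, 10, 1), "value": 10},
    {"user_id": "u1", "timestamp": datetime(2022, 10, 3), "value": 30},
    {"user_id": "u2", "timestamp": datetime(2022, 10, 2), "value": 5},
]
spine = [
    {"user_id": "u1", "timestamp": datetime(2022, 10, 2)},
    {"user_id": "u1", "timestamp": datetime(2022, 10, 4)},
    {"user_id": "u2", "timestamp": datetime(2022, 10, 1)},
]

joined = point_in_time_join(spine, feature_rows)
joined_values = [row["value"] for row in joined]
```

The third spine row gets no value because no feature row for u2 exists at or before its timestamp, which is the behavior that prevents future data from leaking into training sets.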

Batch Feature View skew reduction

To reduce online/offline skew, get_historical_features() now uses the _effective_timestamp (calculated internally) to retrieve feature values. The _effective_timestamp is the earliest time the feature will be available in the online store for inference. The _effective_timestamp column is automatically added to all feature records returned by calls to get_historical_features() that do not include a spine.
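The effect of filtering on _effective_timestamp rather than on the event time can be illustrated with a single record. The timestamps below are invented for the example, the timestamp column name and the visible_values helper are hypothetical, and how Tecton computes _effective_timestamp internally is not shown.

```python
from datetime import datetime

# One feature record: the event happened at 23:00, but (by assumption) the value
# only becomes available in the online store two hours later.
feature_records = [
    {
        "user_id": "u1",
        "timestamp": datetime(2022, 10, 1, 23, 0),
        "_effective_timestamp": datetime(2022, 10, 2, 1, 0),
        "value": 7,
    },
]


def visible_values(records, as_of, time_column):
    """Return the values whose chosen time column is at or before as_of."""
    return [r["value"] for r in records if r[time_column] <= as_of]


as_of = datetime(2022, 10, 2, 0, 0)

# Filtering on event time returns a value that online inference could not have seen yet.
by_event_time = visible_values(feature_records, as_of, "timestamp")

# Filtering on the effective timestamp matches what the online store would have served.
by_effective_time = visible_values(feature_records, as_of, "_effective_timestamp")
```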

Improved support for nulls in On-Demand Feature Views

On-Demand Feature Views now have improved support for nulls. On-Demand Feature Views that use Pandas still require some special handling for nulls; see the documentation.

Upgrading to 0.5

0.5 will no longer support compat definitions. Follow the instructions below to upgrade to 0.5 based on your current version. You will NOT need to re-materialize data to upgrade your objects.

Note

In 0.5, you must set an alert email for Feature Views with monitoring enabled. If the alert email is missing, you may see an error blocking your apply. When upgrading from 0.3 or 0.4 in compatibility mode, configure the alert email while upgrading your Feature Views. No other semantic changes can be made when upgrading.

When upgrading to 0.5, you will see updates to your Feature View's batch_trigger like the following as a result of the new Materialization API. These changes have no effect and will only occur the first time you run tecton apply with Tecton 0.5.

  ~ Update BatchDataSource
    name:            transactions_batch
    description:     Batch Data Source for transactions stream
    batch_trigger:   BATCH_TRIGGER_TYPE_UNKNOWN -> BATCH_TRIGGER_TYPE_SCHEDULED

From 0.4 non-compat:

You can move to 0.5 CLI without making any changes!

From 0.4 in compatibility mode (tecton.compat):

You can move to 0.5 CLI directly if you upgrade all of your definitions to 0.4 definitions using this upgrade guide in one tecton apply.
To upgrade definitions incrementally, i.e. in multiple tecton apply steps:

1.) Upgrade objects to 0.4 definitions using 0.4 CLI with this guide.

2.) Once all your objects are in 0.4 definitions you can move to 0.5 CLI.

From 0.3:

You can move to 0.5 CLI directly if you upgrade all of your definitions to 0.4 definitions using this upgrade guide in one tecton apply.
To upgrade incrementally, i.e. in multiple tecton apply steps:

1.) You must first upgrade to 0.4 CLI with objects in compatibility mode. Follow these instructions.

2.) Upgrade your objects from 0.4 compat to 0.4 definitions using these instructions.

3.) Once all your objects are 0.4 definitions, you can move to 0.5 CLI.
