0.6.0
New Features
Notebook-Driven Development
Notebook-driven development increases the speed of iteration on Tecton feature pipelines by enabling you to author and test feature pipelines directly in your notebook. With notebook-driven development, you can:
- Define any Tecton Object, such as Entities, Data Sources, Feature Views, and Feature Services, in a notebook
- Immediately validate and interact with Tecton Objects, such as previewing data
- Create training datasets by combining features already created in a Workspace with new features directly authored in your notebook
To get started with notebook-driven development, see the Tecton Development Workflow.
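For a flavor of the workflow, here is a minimal sketch. The table, warehouse, and object names are hypothetical, and it assumes the 0.6 pattern of calling .validate() on objects defined in a notebook and previewing a data source with get_dataframe().
import tecton

# A Data Source defined directly in a notebook cell (names are hypothetical).
transactions = tecton.BatchDataSource(
    name="transactions",
    batch_config=tecton.SnowflakeConfig(
        warehouse="dev", table="transactions", timestamp_field="timestamp"
    ),
)

transactions.validate()  # validate immediately, without `tecton apply`
transactions.get_dataframe().to_pandas().head()  # preview the underlying data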
First-n, First-n Distinct, and Last-n Aggregation Functions
You can now use Tecton’s Aggregation Engine to do first(n), first_distinct(n), and last(n) aggregations, in addition to the existing last_distinct(n) aggregation.
This family of aggregations is especially powerful when combined with On-Demand Feature Views. For example, to create a feature that captures whether a user is making repeated transactions, you could use the last(n) function on their prior transaction amounts and compare them to the current transaction value.
Note that as part of this change, the default feature names for the existing last_distinct(n) aggregations have changed. Please see the upgrade guide for details.
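As a minimal sketch, a stream feature view using the new last(n) aggregation might look like the following. It assumes last is importable from tecton.aggregation_functions as in the 0.6 SDK; the source, entity, and column names are hypothetical.
from datetime import timedelta

from tecton import Aggregation, FilteredSource, stream_feature_view
from tecton.aggregation_functions import last

# `transactions_stream` and `user` are assumed to be defined elsewhere in the repo.
@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(minutes=10),
    aggregations=[
        # the last 10 transaction amounts per user over a 1-hour window
        Aggregation(column="amount", function=last(10), time_window=timedelta(hours=1)),
    ],
)
def recent_transaction_amounts(transactions):
    return f"SELECT user_id, amount, timestamp FROM {transactions}"
An On-Demand Feature View could then compare these prior amounts to the current transaction at request time, as in the example described above.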
Faster data ingestion for Stream Feature Views
Tecton’s Continuous Processing Mode is now available for all Stream Feature Views; previously, the option was only available when using built-in aggregations.
By using Continuous Processing Mode for Stream Feature Views without aggregations, typical feature ingestion time improves from 1 minute to single-digit seconds.
To get started with Continuous Processing Mode, see Stream Processing Mode.
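As a hedged sketch, enabling continuous processing on a stream feature view in 0.6 might look like the following. The stream_processing_mode parameter follows the 0.6 SDK; the source and entity names are hypothetical.
from tecton import FilteredSource, StreamProcessingMode, stream_feature_view

@stream_feature_view(
    source=FilteredSource(transactions_stream),  # hypothetical stream source
    entities=[user],                             # hypothetical entity
    mode="spark_sql",
    stream_processing_mode=StreamProcessingMode.CONTINUOUS,  # ingest in single-digit seconds
)
def latest_transaction(transactions):
    return f"SELECT user_id, amount, timestamp FROM {transactions}"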
Tecton Access Control and Service Account CLI Commands
The new tecton access-control and tecton service-account commands provide new options for managing Tecton Access Controls through the CLI. You can now view, assign, and remove roles directly from your terminal.
For example, you can use the new commands to create a new Service Account and grant it the ability to request features from our prod workspace.
tecton service-account create \
--name "sample-service-account" \
--description "An example for the release notes"
Save this API Key - you will not be able to get it again.
API Key: <Your-api-key>
Service Account ID: <Your-Service-Account-ID>
tecton access-control assign-role --role consumer \
--workspace <Your-workspace> \
--service-account <Your-Service-Account-ID>
Successfully updated role.
tecton access-control get-roles \
--service-account <Your-Service-Account-ID>
Workspace            Role
================================
<Your-workspace>     consumer
To get started, see the command details with tecton access-control --help and tecton service-account --help.
Query Debugging Tools
Tecton 0.6 brings new explainability and debugging capabilities to the feature development process. For any interactive query that produces a Tecton DataFrame, you can print a query tree using .explain() and step through it to inspect data and diagnose slow queries or queries that return unexpected data.
For more information, check out Debugging Queries.
feature_service.get_historical_features(training_events).explain()
Stream Ingest API (Private Preview)
Tecton’s new Stream Ingest API in 0.6 makes it easy to publish real-time data to the Feature Store from any stream or microservice via a simple HTTP API call. Tecton makes ingested data available both for online serving and for offline training data generation. The Stream Ingest API is fully compatible with Tecton’s aggregations framework, which means Tecton can even calculate aggregations on top of ingested real-time data. For example, a microservice could ingest raw transactions into Tecton using the Stream Ingest API, and an ML application could afterward retrieve the 1-minute aggregate transaction count for a given credit card from Tecton.
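As a rough illustration of what such a call could look like, here is a sketch using Python’s requests library. The endpoint URL, header format, payload fields, and push source name are assumptions for the Private Preview, not a confirmed contract.
import requests

# All values below are hypothetical placeholders.
resp = requests.post(
    "https://preview.<your-cluster>.tecton.ai/ingest",  # assumed ingest endpoint
    headers={"Authorization": "Tecton-key <Your-api-key>"},
    json={
        "workspace_name": "prod",
        "dry_run": False,
        "records": {
            "transactions_push_source": [  # hypothetical push source name
                {"record": {"user_id": "u123", "amount": 42.0, "timestamp": "2023-01-01T00:00:00Z"}}
            ]
        },
    },
)
resp.raise_for_status()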
Contact us to learn more or participate in the Private Preview.
Changes, enhancements, and resolved issues
New DBR and EMR Supported Versions
Tecton 0.6 extends support for new Databricks Runtime and EMR versions. The lists below show supported versions and the defaults for Tecton 0.5 and 0.6.
Supported Databricks Runtimes:
- 9.1.x-scala2.12 (Tecton 0.5 default)
- 10.4.x-scala2.12 (Tecton 0.6 default)
- 11.3.x-scala2.12
Supported EMR Versions:
- emr-6.5.0 (Tecton 0.5 default)
- emr-6.7.0 (Tecton 0.6 default)
- emr-6.9.0
Unit Testing Interface Improvements
Tecton has made a few minor changes to methods used for running unit tests:
- FeatureView.run() has been renamed to FeatureView.test_run(). This new name helps differentiate between the method for unit testing and the method for interactive execution in notebook environments.
- start_time and end_time are now required parameters for Batch/StreamFeatureView run() and test_run(). The former default behaviors for start/end time led to a lot of customer confusion.
- FeatureView.test_run() does not have a spark parameter for specifying the Spark session. By default, FeatureView.test_run() will use the Tecton-defined Spark session. You can override the Spark session with tecton.set_tecton_spark_session().
- Some internal changes were made to ensure the unit testing code path appropriately reflects the production code path. It’s possible some minor changes in behavior will cause tests to fail.
See the Unit Testing guide for more details on how to write unit tests with Tecton 0.6.
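For example, a minimal pytest-style sketch using test_run() might look like this. The import path is hypothetical, and the return value is assumed to support to_pandas() like other Tecton DataFrames.
from datetime import datetime

from my_feature_repo.features import my_fv  # hypothetical import path

def test_my_fv():
    # start_time and end_time are now required, per the changes above
    output = my_fv.test_run(
        start_time=datetime(2022, 5, 1),
        end_time=datetime(2022, 5, 2),
    )
    assert len(output.to_pandas()) > 0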
prevent_destroy parameter
Previously, you could set a tag with the prevent_destroy key to help mitigate the risk that erroneous changes impact production feature pipelines. This functionality is now a top-level parameter for Feature Views and Feature Services to make the option more discoverable.
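A brief sketch of the new parameter follows; prevent_destroy is the documented addition, while the other decorator arguments and object names are illustrative.
from datetime import timedelta

from tecton import FilteredSource, batch_feature_view

# `datasource` and `customer` are assumed to be defined elsewhere in the repo.
@batch_feature_view(
    sources=[FilteredSource(datasource)],
    entities=[customer],
    mode="spark_sql",
    online=True,
    batch_schedule=timedelta(days=1),
    prevent_destroy=True,  # reject plans that would destroy this Feature View
)
def my_protected_fv(source):
    ...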
tecton.get_current_workspace()
When defining Tecton Objects, it can be helpful to configure conditional logic based on the Workspace they are applied to. For example, you may want to use On-Demand instances for materialization jobs in your production workspaces to improve job reliability, and Spot instances in your staging environment to reduce costs.
The get_current_workspace() method provides a convenient way to implement this conditional logic.
from datetime import datetime, timedelta

from tecton import BatchDataSource, FilteredSource, SnowflakeConfig, batch_feature_view, get_current_workspace

# use the prod warehouse only in the prod environment
warehouse = "prod" if get_current_workspace() == "prod" else "dev"

datasource = BatchDataSource(
    name="mytable",
    batch_config=SnowflakeConfig(warehouse=warehouse, table="mytable", timestamp_field="timestamp"),
)

# save costs by materializing farther back only in the prod environment
start_time = datetime(2020, 1, 1) if get_current_workspace() == "prod" else datetime(2023, 1, 1)

# `customer` is an Entity assumed to be defined elsewhere in the repo
@batch_feature_view(
    sources=[FilteredSource(datasource)],
    entities=[customer],
    mode="spark_sql",
    online=True,
    feature_start_time=start_time,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def my_fv(source):
    ...
tecton.who_am_i() and tecton.set_credentials()
The new tecton.who_am_i() method provides a convenient way to inspect what Tecton credentials you’re using in a Notebook environment. Equivalently, you can use the tecton whoami command in a CLI environment.
The tecton.set_credentials() method for setting the session-level credentials in a Notebook has a new tecton_url argument. This argument can be helpful if you have multiple Tecton instances in your organization.
Finally, tecton.test_credentials() is a convenience method that asserts you have valid credentials, which is useful in a notebook environment.
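Putting these together in a notebook might look like the following sketch. The URL and API key values are placeholders, and the tecton_api_key keyword name is an assumption; tecton_url is the new argument described above.
import tecton

# Placeholders: substitute your own instance URL and API key.
tecton.set_credentials(
    tecton_api_key="<Your-api-key>",
    tecton_url="https://yourco.tecton.ai/api",  # new in 0.6; useful with multiple instances
)
tecton.test_credentials()  # raises if the credentials are not valid
tecton.who_am_i()          # shows the identity behind the current credentials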
Sunsetting Python 3.7 support
Starting in 0.6, the Tecton SDK and CLI no longer run in Python 3.7 environments. The Tecton SDK and CLI retain compatibility with Python 3.8 and Python 3.9.
The Tecton CLI is also compatible with Python 3.10 and Python 3.11. While the Tecton SDK is likely to work on Python 3.10 and Python 3.11 as well, it has not been tested.
Upgrading to 0.6
Follow this upgrade guide to upgrade to 0.6. The guide outlines all breaking and non-breaking changes.