# 0.3 to 0.4 Upgrade Guide

## What's new in 0.4

Tecton 0.4 includes the next-generation feature definition framework with native support for Snowflake transformations. The major goals for this release are to simplify core concepts and increase flexibility while maintaining high performance.

Tecton 0.4 includes many other updates and improvements. For a full list, see the Tecton 0.4 Release Notes.
## 0.3 and 0.4 side-by-side comparison

Tecton 0.4 introduces new classes that replace classes available in version 0.3. The following tables map 0.3 classes, parameters, and methods to their 0.4 equivalents.

### Class Renames/Changes

| 0.3 Definition | 0.4 Definition |
|---|---|
| **Data Sources** | |
| `BatchDataSource` | `BatchSource` |
| `StreamDataSource` | `StreamSource` |
| `FileDSConfig` | `FileConfig` |
| `HiveDSConfig` | `HiveConfig` |
| `KafkaDSConfig` | `KafkaConfig` |
| `KinesisDSConfig` | `KinesisConfig` |
| `RedshiftDSConfig` | `RedshiftConfig` |
| `RequestDataSource` | `RequestSource` |
| `SnowflakeDSConfig` | `SnowflakeConfig` |
| **Feature Views** | |
| `@batch_window_aggregate_feature_view` | `@batch_feature_view` |
| `@stream_window_aggregate_feature_view` | `@stream_feature_view` |
| **Misc Classes** | |
| `FeatureAggregation` | `Aggregation` |
| **New Classes** | |
| - | `AggregationMode` |
| - | `KafkaOutputStream` |
| - | `KinesisOutputStream` |
| - | `FilteredSource` |
| **Deprecated Classes in 0.3** | |
| `Input` | - |
| `BackfillConfig` | - |
| `MonitoringConfig` | - |
### Feature View/Table Parameter Changes

| 0.3 Definition | 0.4 Definition |
|---|---|
| `inputs` | `sources` |
| `name_override` | `name` |
| `aggregation_slide_period` | `aggregation_interval` |
| `timestamp_key` | `timestamp_field` |
| `batch_cluster_config` | `batch_compute` |
| `stream_cluster_config` | `stream_compute` |
| `online_config` | `online_store` |
| `offline_config` | `offline_store` |
| `output_schema` | `schema` |
| `family` | - (removed) |
| `schedule_offset` | - (removed; see the Data Source `data_delay` parameter) |
| `monitoring.alert_email` (nested) | `alert_email` |
| `monitoring.monitor_freshness` (nested) | `monitor_freshness` |
| `monitoring.expected_freshness` (nested) | `expected_freshness` |
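Because these renames are purely mechanical, a small script can flag stale 0.3 parameter names while you refactor. The helper below is a hypothetical sketch for illustration, not part of Tecton; it covers the keyword renames from the table above.

```python
# Hypothetical migration helper (not a Tecton API): maps 0.3 Feature
# View/Table parameter names to their 0.4 equivalents.
FV_PARAM_RENAMES = {
    "inputs": "sources",
    "name_override": "name",
    "aggregation_slide_period": "aggregation_interval",
    "timestamp_key": "timestamp_field",
    "batch_cluster_config": "batch_compute",
    "stream_cluster_config": "stream_compute",
    "online_config": "online_store",
    "offline_config": "offline_store",
    "output_schema": "schema",
}

def flag_stale_params(line: str) -> list:
    """Return any 0.3 keyword-argument names found in a line of repo code."""
    return [old for old in FV_PARAM_RENAMES if old + "=" in line]
```

Running this over each line of a feature definition points you at the arguments that still need the 0.4 spelling.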
### Data Source Parameter Changes

| 0.3 Definition | 0.4 Definition |
|---|---|
| `timestamp_column_name` | `timestamp_field` |
| `batch_ds_config` | `batch_config` |
| `stream_ds_config` | `stream_config` |
| `raw_batch_translator` | `post_processor` |
| `default_watermark_delay_threshold` | `watermark_delay_threshold` |
| `default_initial_stream_position` | `initial_stream_position` |
### Interactive Method Changes

In addition to the declarative classes, interactive `FeatureView` and `FeatureTable` methods with overlapping functionality have been consolidated.

| 0.3 Interactive Method | 0.4 Interactive Method |
|---|---|
| `get_historical_features` | `get_historical_features` |
| `preview` | `get_historical_features` |
| `get_feature_dataframe` | `get_historical_features` |
| `get_features` | `get_historical_features` |
| `get_online_features` | `get_online_features` |
| `get_feature_vector` | `get_online_features` |
| `run` | `run`* |

\* Arguments have changed in 0.4.
## Incremental upgrades from 0.3 to 0.4

Feature repository updates and CLI updates can be decoupled using the `compat` library for 0.3 feature definitions. Feature definitions using 0.3 objects can continue to be used once their imports are migrated to the `compat` module. 0.3 objects cannot be applied using `tecton~=0.4` without changing `from tecton` import paths to `from tecton.compat`. For example:

```python
# 0.3 object imports can only be applied using tecton 0.3
from tecton import Entity

# 0.3 object imports from `compat` can be applied using tecton 0.4
from tecton.compat import Entity
```

See the table below for a compatibility matrix.
| | CLI 0.3 | CLI 0.4 |
|---|---|---|
| Framework 0.3 | `from tecton import` (as normal) | `from tecton.compat import` required |
| Framework 0.4 | Not supported | `from tecton import` (as normal) |
### ⚠️ Important: Upgrades for materialized Feature Views

Migrating a 0.3 Feature View that has materialized data to a 0.4 Feature View will cause the 0.4 Feature View to re-materialize its feature data.

Support for no-materialization upgrades is expected to be released later this year. We recommend postponing upgrades for Feature Views that are costly to re-materialize until this functionality is available.
## Steps to upgrade to the Tecton 0.4 CLI with a 0.3 feature repo in compatibility mode

The steps below let you use the Tecton 0.4 CLI with a 0.3 feature repo by importing from `tecton.compat`. In this state, you can upgrade your CLI without upgrading your feature repo definitions.

1. **Verify the repo is 0.3 compatible.** Run `pip install tecton~=0.3.0` and then `tecton plan`. This should yield no diffs before beginning the migration.
2. **Migrate imports to the `compat` library.** Update all imports from the `tecton` module that appear in the 0.3 Definition column of the tables above to import from `tecton.compat` instead. This is important: only update 0.3 classes to import from `tecton.compat`. Do not run `tecton apply` yet!
3. **Upgrade your CLI to 0.4.** Install `tecton~=0.4.0` and run `tecton plan`. This should yield no diffs.
4. **Congrats!** You now have a 0.3 repo compatible with the 0.4 CLI. In this setup, you can continue using your 0.3 repo with `tecton` 0.4.
## Steps to upgrade a feature repo from 0.3 to 0.4

Once you've upgraded to the Tecton 0.4 CLI and moved your feature repo to 0.3 compatibility mode, you can begin updating your feature repo to use 0.4 Tecton object definitions.

The fundamentals of 0.3 and 0.4 are similar; most differences are class and parameter names. Most migrations will require only light refactoring of each Tecton object definition.

Important: with the exception of Feature Services, 0.3 objects can only depend on 0.3 objects, and 0.4 objects can only depend on 0.4 objects. Feature repos can be upgraded incrementally, but a Feature View must be upgraded in lock-step with its upstream dependencies (Transformations, Entities, Data Sources). For example:

- Supported: a 0.3 Feature Service referencing a 0.4 Feature View (0.3 FS <-> 0.4 FV)
- Not supported: a 0.4 Feature View depending on a 0.3 Data Source or Entity (0.4 FV → 0.3 DS, 0.3 Ent)
### Example 1: Converting a 0.3 BatchDataSource to a 0.4 BatchSource

```python
# 0.3
from tecton.compat import HiveDSConfig
from tecton.compat import BatchDataSource

hive_config = HiveDSConfig(
    table="transaction_log",
    database="accounting"
)

batch03 = BatchDataSource(
    name="tx_log",
    batch_ds_config=hive_config
)
```

```python
# 0.4
from tecton import HiveConfig
from tecton import BatchSource

hive_config = HiveConfig(
    table="transaction_log",
    database="accounting"
)

batch04 = BatchSource(
    name="tx_log",
    batch_config=hive_config
)
```
### Example 2: Converting a 0.3 StreamDataSource to a 0.4 StreamSource

```python
# 0.3
from tecton.compat import HiveDSConfig
from tecton.compat import KinesisDSConfig
from tecton.compat import StreamDataSource

hive_config = HiveDSConfig(
    table="transaction_log",
    database="accounting"
)

def noop_translator(df):
    return df

kinesis_config = KinesisDSConfig(
    stream_name="transaction_stream",
    raw_stream_translator=noop_translator,
    region="us-west-2"
)

stream03 = StreamDataSource(
    name="transactions_03",
    batch_ds_config=hive_config,
    stream_ds_config=kinesis_config
)
```

```python
# 0.4
from tecton import HiveConfig
from tecton import KinesisConfig
from tecton import StreamSource

hive_config = HiveConfig(
    table="transaction_log",
    database="accounting"
)

def noop_translator(df):
    return df

kinesis_config = KinesisConfig(
    stream_name="transaction_stream",
    post_processor=noop_translator,
    region="us-west-2"
)

stream04 = StreamSource(
    name="transactions_04",
    batch_config=hive_config,
    stream_config=kinesis_config
)
```
### Example 3: Converting a 0.3 Stream Window Aggregate Feature View to a 0.4 Stream Feature View

```python
# 0.3
from tecton.compat import stream_window_aggregate_feature_view
from tecton.compat import FeatureAggregation
from tecton.compat import Input
from datetime import datetime

@stream_window_aggregate_feature_view(
    inputs={'ad_impressions': Input(ad_impressions_stream)},
    entities=[user, ad],
    mode='spark_sql',
    aggregation_slide_period='1h',
    aggregations=[
        FeatureAggregation(
            column='impression',
            function='count',
            time_windows=['1h', '24h', '72h']
        )
    ],
    online=False,
    offline=False,
    batch_schedule='1d',
    feature_start_time=datetime(2022, 5, 1),
)
def user_ad_impression_counts(ad_impressions):
    return f"""
        SELECT
            user_uuid as user_id,
            ad_id,
            1 as impression,
            timestamp
        FROM
            {ad_impressions}
        """
```

```python
# 0.4
from tecton import stream_feature_view
from tecton import Aggregation
from tecton import FilteredSource
from datetime import datetime, timedelta

@stream_feature_view(
    source=FilteredSource(ad_impressions_stream),
    entities=[user, ad],
    mode='spark_sql',
    aggregation_interval=timedelta(hours=1),
    aggregations=[
        Aggregation(column='impression', function='count', time_window=timedelta(hours=1)),
        Aggregation(column='impression', function='count', time_window=timedelta(hours=24)),
        Aggregation(column='impression', function='count', time_window=timedelta(hours=72)),
    ],
    online=False,
    offline=False,
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2022, 5, 1),
)
def user_ad_impression_counts(ad_impressions):
    return f'''
        SELECT
            user_uuid as user_id,
            ad_id,
            1 as impression,
            timestamp
        FROM
            {ad_impressions}
        '''
```
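Note that 0.3 accepted duration strings such as `'1h'` or `'72h'`, while 0.4 parameters like `aggregation_interval` and `time_window` take `datetime.timedelta` values. When migrating many definitions, a small converter can mechanize the rewrite. This is a hypothetical helper for illustration, not a Tecton API, and it covers only the duration units shown in this guide:

```python
import re
from datetime import timedelta

# Hypothetical migration helper (not a Tecton API): converts 0.3-style
# duration strings such as '1h', '30min', or '7d' into datetime.timedelta
# values for 0.4 parameters.
_UNITS = {"d": "days", "h": "hours", "min": "minutes", "s": "seconds"}

def to_timedelta(duration: str) -> timedelta:
    match = re.fullmatch(r"(\d+)\s*(d|h|min|s)", duration.strip())
    if not match:
        raise ValueError(f"unrecognized duration string: {duration!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})
```

For example, `to_timedelta('72h')` yields the `timedelta(hours=72)` used in the 0.4 aggregation above.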
### Example 4: Converting a 0.3 Batch Feature View (non-aggregate) to a 0.4 Batch Feature View

```python
# 0.3
from tecton.compat import batch_feature_view
from tecton.compat import Input
from tecton.compat import tecton_sliding_window
from tecton.compat import transformation
from tecton.compat import const
from tecton.compat import BackfillConfig
from datetime import datetime

# Counts distinct ad ids for each user and window. The timestamp
# for the feature is the end of the window, which is set by using
# the tecton_sliding_window transformation.
@transformation(mode='spark_sql')
def user_distinct_ad_count_transformation(window_input_df):
    return f'''
        SELECT
            user_uuid as user_id,
            approx_count_distinct(ad_id) as distinct_ad_count,
            window_end as timestamp
        FROM
            {window_input_df}
        GROUP BY
            user_uuid, window_end
        '''

@batch_feature_view(
    inputs={'ad_impressions': Input(ad_impressions_batch, window='7d')},
    entities=[user],
    mode='pipeline',
    ttl='1d',
    batch_schedule='1d',
    online=False,
    offline=False,
    feature_start_time=datetime(2022, 5, 1),
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
)
def user_distinct_ad_count_7d(ad_impressions):
    return user_distinct_ad_count_transformation(
        # Use tecton_sliding_window to create trailing 7-day time windows.
        # The slide_interval defaults to the batch_schedule (1 day).
        tecton_sliding_window(ad_impressions,
                              timestamp_key=const('timestamp'),
                              window_size=const('7d')))
```

```python
# 0.4
from tecton import batch_feature_view
from tecton import FilteredSource
from tecton import materialization_context
from datetime import datetime, timedelta

@batch_feature_view(
    sources=[FilteredSource(ad_impressions_batch, start_offset=timedelta(days=-6))],
    entities=[user],
    mode='spark_sql',
    ttl=timedelta(days=1),
    batch_schedule=timedelta(days=1),
    incremental_backfills=True,
    online=False,
    offline=False,
    feature_start_time=datetime(2022, 5, 1)
)
def user_distinct_ad_count_7d(ad_impressions, context=materialization_context()):
    return f'''
        SELECT
            user_uuid as user_id,
            approx_count_distinct(ad_id) as distinct_ad_count,
            TO_TIMESTAMP("{context.end_time}") - INTERVAL 1 MICROSECOND as timestamp
        FROM
            {ad_impressions}
        GROUP BY
            user_uuid
        '''
```
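The 0.4 query stamps each row with the materialization window's end minus one microsecond, so the feature timestamp falls inside the window rather than at its exclusive upper bound. The same arithmetic in plain Python, assuming `context.end_time` resolves to a `datetime`:

```python
from datetime import datetime, timedelta

# Sketch of the timestamp arithmetic in the 0.4 query above: the feature
# timestamp is the end of the materialization window minus one microsecond,
# so it lands within [start_time, end_time).
def feature_timestamp(window_end: datetime) -> datetime:
    return window_end - timedelta(microseconds=1)

ts = feature_timestamp(datetime(2022, 5, 2))
```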
### Example 5: Converting a 0.3 Feature Table to a 0.4 Feature Table

```python
# 0.3
from pyspark.sql.types import (
    StructType,
    StructField,
    FloatType,
    ArrayType,
    StringType,
    TimestampType
)
from tecton.compat import Entity
from tecton.compat import FeatureTable
from tecton.compat import DeltaConfig

schema = StructType([
    StructField('user_id', StringType()),
    StructField('timestamp', TimestampType()),
    StructField('user_embedding', ArrayType(FloatType())),
])

user_embeddings = FeatureTable(
    name='user_embeddings',
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl='10day',
    description='Precomputed user embeddings pushed into Tecton.'
)
```

```python
# 0.4
from tecton.types import Field, String, Timestamp, Array, Float64
from tecton import Entity, FeatureTable, DeltaConfig
from datetime import timedelta

schema = [
    Field('user_id', String),
    Field('timestamp', Timestamp),
    Field('user_embedding', Array(Float64))
]

user_embeddings = FeatureTable(
    name='user_embeddings',
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl=timedelta(days=10),
    description='Precomputed user embeddings pushed into Tecton.'
)
```
### Example 6: Converting a 0.3 On-Demand Feature View to a 0.4 On-Demand Feature View

```python
# 0.3
from tecton.compat import RequestDataSource
from tecton.compat import Input
from tecton.compat import on_demand_feature_view
from pyspark.sql.types import StructType, StructField, FloatType, ArrayType, DoubleType
from ads.features.feature_tables.user_embeddings import user_embeddings

request_schema = StructType([StructField('query_embedding', ArrayType(FloatType()))])
request = RequestDataSource(request_schema=request_schema)
output_schema = StructType([StructField('cosine_similarity', DoubleType())])

@on_demand_feature_view(
    inputs={
        'request': Input(request),
        'user_embedding': Input(user_embeddings)
    },
    mode='python',
    output_schema=output_schema
)
def user_query_embedding_similarity(request, user_embedding):
    import numpy as np
    from numpy.linalg import norm

    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        # Handle the case where there is no precomputed user embedding.
        if a is None or b is None:
            return -1.0
        return np.dot(a, b) / (norm(a) * norm(b))

    result = {}
    result["cosine_similarity"] = cosine_similarity(
        user_embedding["user_embedding"], request["query_embedding"]
    ).astype('float64')
    return result
```

```python
# 0.4
from tecton import RequestSource
from tecton import on_demand_feature_view
from tecton.types import Field, Array, Float64
from ads.features.feature_tables.user_embeddings import user_embeddings

request_schema = [Field('query_embedding', Array(Float64))]
request = RequestSource(schema=request_schema)
output_schema = [Field('cosine_similarity', Float64)]

@on_demand_feature_view(
    sources=[request, user_embeddings],
    mode='python',
    schema=output_schema
)
def user_query_embedding_similarity(request, user_embedding):
    import numpy as np
    from numpy.linalg import norm

    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        # Handle the case where there is no precomputed user embedding.
        if a is None or b is None:
            return -1.0
        return np.dot(a, b) / (norm(a) * norm(b))

    result = {}
    result["cosine_similarity"] = cosine_similarity(
        user_embedding["user_embedding"], request["query_embedding"]
    ).astype('float64')
    return result
```
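The `-1.0` fallback for missing embeddings is the part of this example that most often needs unit-testing during a migration. A stdlib-only sketch of the same similarity logic (without NumPy vectorization) makes the behavior easy to check in isolation:

```python
from math import sqrt

# Stdlib-only sketch of the cosine-similarity logic used in the on-demand
# feature view above, including the -1.0 fallback when a precomputed user
# embedding is missing.
def cosine_similarity(a, b):
    if a is None or b is None:
        return -1.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))
```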
## FAQs

### How do Batch/Stream Window Aggregate Feature Views in 0.3 map to 0.4 Feature Views?

A 0.4 `batch_feature_view` with the `aggregations` and `aggregation_interval` parameters set behaves the same as a 0.3 `batch_window_aggregate_feature_view` (the same is true for `stream_feature_view`). See the Time-Windowed Aggregations Guide for more info.

### When should I use incremental backfills?

When Tecton's built-in aggregations aren't an option, setting `incremental_backfills=True` instructs Tecton to execute your query once per `batch_schedule` interval, with each job responsible for a single time-window aggregation. See the Incremental Backfill guide for more information.
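Conceptually, incremental backfills split the range from `feature_start_time` to the present into one job per `batch_schedule` interval. The sketch below is illustrative only (it is not how Tecton schedules jobs internally), assuming a daily schedule:

```python
from datetime import datetime, timedelta

# Illustrative sketch (not Tecton internals): split the backfill range into
# one materialization window per batch_schedule interval.
def backfill_windows(feature_start_time, end_time, batch_schedule):
    windows = []
    start = feature_start_time
    while start < end_time:
        windows.append((start, min(start + batch_schedule, end_time)))
        start += batch_schedule
    return windows

jobs = backfill_windows(datetime(2022, 5, 1), datetime(2022, 5, 4), timedelta(days=1))
# Three daily jobs: May 1-2, May 2-3, May 3-4.
```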
### When should I use FilteredSource?

Spark users (Databricks or EMR) should use `FilteredSource` whenever possible to push time filtering down to the data source. This makes incremental materialization jobs much more efficient, since costly scans across long time ranges are avoided.

### How long will Tecton 0.3 be supported?

See the Release Notes for more details.