
Test Batch Features

Import libraries and select your workspace

import tecton
import pandas
from datetime import datetime, timedelta

ws = tecton.get_workspace("prod")
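
If you are unsure which names to use, the SDK's listing helpers can enumerate what is available. A minimal sketch, assuming your environment is already authenticated against a Tecton cluster:

# Discover available workspaces, then the Feature Views registered in one.
print(tecton.list_workspaces())
print(ws.list_feature_views())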

Load a Batch Feature View

fv = ws.get_feature_view("user_transaction_counts")
fv.summary()

Run a Feature View transformation pipeline

The BatchFeatureView::run_transformation function can be used to dry-run a Feature View transformation pipeline over a given time range. This is useful for checking the output of your feature transformation logic or for debugging a materialization job.

caution

There is no guarantee that the output data matches the feature values that would be created for this time frame, for example in the following cases:

  • When using incremental backfills, feature data for a given time range may depend on multiple executions of the Feature View transformation pipeline.
  • Feature values may depend on scheduling information (e.g. batch_schedule, data_delay, feature_start_time) that doesn't match the start_time and end_time you provide.
  • Aggregations may require more input data than the window you provide with start_time and end_time.

If you want to produce feature values for a given time range, you should use get_features_in_range(start_time, end_time).
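
For example, a minimal sketch of that call for the same window used below:

# Produces actual feature values for the range, honoring scheduling and
# aggregation windows (unlike the dry-run output of run_transformation).
feature_df = fv.get_features_in_range(
    start_time=datetime(2021, 1, 1),
    end_time=datetime(2021, 1, 2),
).to_pandas()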

result_dataframe = fv.run_transformation(start_time=datetime(2021, 1, 1), end_time=datetime(2021, 1, 2)).to_pandas()
display(result_dataframe)
   user_id            signup_timestamp     credit_card_issuer
0  user_600003278485  2021-01-01 06:25:57  other
1  user_469998441571  2021-01-01 07:16:06  Visa
2  user_502567604689  2021-01-01 04:39:10  Visa
3  user_930691958107  2021-01-01 10:52:31  Visa
4  user_782510788708  2021-01-01 20:15:25  other

Run with mock sources

Mock input data sources can be passed into the BatchFeatureView::run_transformation function using the same source names from the Feature View definition.

users_data = pandas.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_2"],
        "cc_num": ["423456789012", "567890123456", "678901234567"],
        "signup_timestamp": [
            datetime(2022, 1, 1, 2),
            datetime(2022, 1, 1, 4),
            datetime(2022, 1, 1, 3),
        ],
    }
)

result_dataframe = fv.run_transformation(
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 1, 2),
    mock_inputs={"users": users_data},  # `users` is the name of this Feature View input.
).to_pandas()

display(result_dataframe)
   user_id  signup_timestamp     credit_card_issuer
0  user_1   2022-01-01 02:00:00  Visa
1  user_1   2022-01-01 04:00:00  MasterCard
2  user_2   2022-01-01 03:00:00  Discover
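
If a Feature View reads from more than one source, mock_inputs takes one entry per input name. A minimal sketch, where the second input name transactions and its schema are hypothetical (this Feature View only defines users):

# `transactions` is a hypothetical second input, shown only to illustrate
# passing multiple mock sources; match the input names in your definition.
transactions_data = pandas.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "amount": [10.0, 25.0],
        "timestamp": [datetime(2022, 1, 1, 5), datetime(2022, 1, 1, 6)],
    }
)

result_dataframe = fv.run_transformation(
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 1, 2),
    mock_inputs={"users": users_data, "transactions": transactions_data},
).to_pandas()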

Run a Batch Feature View with tiled aggregations

When a Feature View uses tiled aggregations, the query operates in three logical steps:

  1. The Feature View query is run over the provided time range. The user-defined transformations are applied over the data source.
  2. The result of #1 is aggregated into tiles the size of the aggregation_interval.
  3. The tiles from #2 are combined to form the final feature values. The number of tiles that are combined is based on the time_window of the aggregation.

To see the output of #1, use run_transformation(). For #2, use get_partial_aggregates(). For #3, use get_features_in_range() (a sketch follows the partial-aggregate output below).

agg_fv = ws.get_feature_view("user_transaction_counts")

result_dataframe = agg_fv.run_transformation(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
).to_pandas()

display(result_dataframe)
   user_id            transaction  timestamp
0  user_222506789984  1            2022-05-01 21:04:38
1  user_26990816968   1            2022-05-01 19:45:14
2  user_337750317412  1            2022-05-01 15:18:48
3  user_337750317412  1            2022-05-01 07:11:31
4  user_337750317412  1            2022-05-01 01:50:51

result_dataframe = agg_fv.get_partial_aggregates(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
).to_pandas()

display(result_dataframe)
   user_id            transaction_count_1d  _interval_start_time  _interval_end_time
0  user_222506789984  1                     2022-05-01 00:00:00   2022-05-02 00:00:00
1  user_26990816968   1                     2022-05-01 00:00:00   2022-05-02 00:00:00
2  user_337750317412  4                     2022-05-01 00:00:00   2022-05-02 00:00:00
3  user_402539845901  2                     2022-05-01 00:00:00   2022-05-02 00:00:00
4  user_461615966685  1                     2022-05-01 00:00:00   2022-05-02 00:00:00
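
Step #3, combining these tiles into final feature values, corresponds to get_features_in_range, covered in the next section. A minimal sketch over the same range:

# Step 3: combine the daily tiles into the final feature values.
result_dataframe = agg_fv.get_features_in_range(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
).to_pandas()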

Get a Range of Feature Values from the Offline Store

BatchFeatureView::get_features_in_range can read a range of feature values from the offline store between a given start_time and end_time.

from_source=True can be specified to bypass the offline store and compute features on-the-fly against the raw data source. This is useful for testing the expected output of your feature logic.
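
For example, a minimal sketch of the from_source=True variant for the range used below:

# Bypass the offline store and compute the features from the raw data source.
result_dataframe = fv.get_features_in_range(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    from_source=True,
).to_pandas()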

Use from_source=False (default) to see what data is materialized in the offline store.

result_dataframe = fv.get_features_in_range(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
).to_pandas()
display(result_dataframe)
   user_id            timestamp            transaction_count_1d_1d  transaction_count_30d_1d  transaction_count_90d_1d  _effective_timestamp
0  user_205125746682  2022-05-01 00:00:00  2                        8                         34                        2022-05-01 00:00:00
1  user_222506789984  2022-05-01 00:00:00  1                        42                        141                       2022-05-01 00:00:00
2  user_268514844966  2022-05-01 00:00:00  1                        29                        66                        2022-05-01 00:00:00
3  user_394495759023  2022-05-01 00:00:00  1                        21                        68                        2022-05-01 00:00:00
4  user_459842889956  2022-05-01 00:00:00  1                        14                        39                        2022-05-01 00:00:00

Read the Latest Features from Online Feature Store

danger

For performance reasons, this function should only be used for testing and not in a production environment. To read features online efficiently, see Reading Features for Inference.

fv.get_online_features({"user_id": "user_609904782486"}).to_dict()
Out: {
    "transaction_count_1d_1d": 1,
    "transaction_count_30d_1d": 17,
    "transaction_count_90d_1d": 56,
}

Read Historical Features from Offline Feature Store with Time-Travel

Create an events DataFrame with the events to look up. For more information on the events DataFrame, see Selecting Sample Keys and Timestamps.

events = pandas.DataFrame(
    {
        "user_id": ["user_722584453020", "user_461615966685"],
        "timestamp": [datetime(2022, 5, 1, 3, 20, 0), datetime(2022, 6, 6, 2, 30, 0)],
    }
)
display(events)
   user_id            timestamp
0  user_722584453020  2022-05-01 03:20:00
1  user_461615966685  2022-06-06 02:30:00

from_source=True can be specified to bypass the offline store and compute features on-the-fly against the raw data source. However, this will be slower than reading feature data that has been materialized to the offline store.

result_dataframe = fv.get_features_for_events(events, from_source=True).to_pandas()
display(result_dataframe)
   user_id            timestamp            user_transaction_counts__transaction_count_1d_1d  user_transaction_counts__transaction_count_30d_1d  user_transaction_counts__transaction_count_90d_1d
0  user_461615966685  2022-06-06 02:30:00  0                                                  13                                                  40
1  user_722584453020  2022-05-01 03:20:00  0                                                  28                                                  73
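
Once the feature data has been materialized, the same lookup can read from the offline store instead (the default), which is typically much faster:

# Default behavior: read materialized feature data from the offline store.
result_dataframe = fv.get_features_for_events(events).to_pandas()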
