Version: 0.9

Test Stream Features

Import libraries and select your workspace

import tecton
import pandas
from datetime import datetime

ws = tecton.get_workspace("prod")

Load a Stream Feature View

fv = ws.get_feature_view("last_transaction_amount_sql")

Start a Streaming Job to view real-time streaming features


This section only applies to Spark streaming features. These methods must be run on a Spark cluster.

The run_stream method starts a Spark Structured Streaming job and writes the results to the specified temporary table.

fv.run_stream(output_temp_table="output_temp_table")

The temporary table can then be queried to view real-time results. Run this code in a separate notebook cell.

# Query the result from the streaming output table.
display(spark.sql("SELECT * FROM output_temp_table ORDER BY timestamp DESC LIMIT 5"))
   user_id            timestamp            amt
0  user_469998441571  2022-06-07 18:31:24  54.46
1  user_460877961787  2022-06-07 18:31:21  73.02
2  user_650387977076  2022-06-07 18:31:20  46.05
3  user_699668125818  2022-06-07 18:31:17  59.24
4  user_394495759023  2022-06-07 18:31:15  11.38

Get a Range of Feature Values from Offline Feature Store

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is useful for testing the expected output of feature values.

Use from_source=False (default) to see what data is materialized in the offline store.

result_dataframe = fv.get_features_in_range(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2), from_source=True
).to_pandas()
display(result_dataframe)

   user_id  amt    _valid_from          _valid_to
0  user_1   76.45  2022-05-01 01:50:51  2022-05-02 00:00:00
1  user_2   45.8   2022-05-01 02:05:39  2022-05-02 03:51:28
2  user_2   1.43   2022-05-01 03:51:28  2022-05-02 00:00:00
3  user_3   52.31  2022-05-01 02:41:42  2022-05-02 00:00:00
4  user_4   64.15  2022-05-01 04:48:27  2022-05-02 00:00:00
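
The two timestamps on each row above can be read as the window during which that feature value was current. The following is a conceptual sketch of that idea for a "last value" feature, not Tecton's implementation: the function name `validity_intervals`, the sample users, and the sample amounts are all hypothetical, and the sketch ignores values carried over from before the start of the range.

```python
from collections import defaultdict
from datetime import datetime


def validity_intervals(events, start_time, end_time):
    """Turn (user_id, timestamp, amt) events into validity-window rows.

    Assumption of this sketch: a value is valid from its own timestamp
    until the next event for the same user, or until end_time if no later
    event exists. Events outside [start_time, end_time) are dropped.
    """
    by_user = defaultdict(list)
    for user_id, ts, amt in events:
        if start_time <= ts < end_time:
            by_user[user_id].append((ts, amt))

    rows = []
    for user_id in sorted(by_user):
        history = sorted(by_user[user_id])  # oldest event first
        for i, (ts, amt) in enumerate(history):
            has_next = i + 1 < len(history)
            valid_to = history[i + 1][0] if has_next else end_time
            rows.append((user_id, amt, ts, valid_to))
    return rows


# Hypothetical raw events; these are not taken from a real data source.
events = [
    ("user_a", datetime(2022, 5, 1, 2, 0), 10.0),
    ("user_a", datetime(2022, 5, 1, 4, 0), 20.0),
    ("user_b", datetime(2022, 5, 1, 3, 0), 5.0),
]
rows = validity_intervals(events, datetime(2022, 5, 1), datetime(2022, 5, 2))
for row in rows:
    print(row)
```

In this sketch the first value for user_a stops being valid at the moment the second one arrives, and the last value for each user stays valid until the end of the requested range.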

Read the Latest Features from Online Feature Store

fv.get_online_features({"user_id": "user_3"}).to_dict()
Out: {"amt": 180.6}

Read Historical Features from Offline Feature Store with Time-Travel

Create an events DataFrame containing the keys and timestamps to look up. For more information on the events DataFrame, see Selecting Sample Keys and Timestamps.

events = pandas.DataFrame(
    {
        "user_id": ["user_3", "user_5"],
        "timestamp": [datetime(2022, 5, 1, 19), datetime(2022, 5, 6, 10)],
    }
)

   user_id  timestamp
0  user_3   2022-05-01 19:00:00
1  user_5   2022-05-06 10:00:00

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is slower than reading feature data that has already been materialized to the offline store.

features_df = fv.get_features_for_events(events, from_source=True).to_pandas()
   user_id  timestamp            last_transaction_amount_sql__amt
0  user_3   2022-05-01 19:00:00  52.31
1  user_5   2022-05-06 10:00:00  58.68
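
The time-travel lookup above returns, for each event, the feature value as it stood at that event's timestamp. A minimal sketch of that point-in-time rule in plain Python follows. It is an illustration rather than Tecton's implementation: `point_in_time_lookup` and the sample history are hypothetical, and treating a feature stamped exactly at the event time as visible is an assumption of this sketch.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(history, event_time):
    """Return the latest value with timestamp <= event_time, else None.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly after event_time.
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for a single user.
history = [
    (datetime(2022, 5, 1, 2, 41, 42), 52.31),
    (datetime(2022, 5, 3, 9, 0, 0), 18.0),
]

at_event = point_in_time_lookup(history, datetime(2022, 5, 1, 19))
before_any = point_in_time_lookup(history, datetime(2022, 4, 30))
after_all = point_in_time_lookup(history, datetime(2022, 5, 6, 10))
print(at_event, before_any, after_all)
```

The lookup never returns a value recorded after the event time, which is the property that keeps training data free of future information.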
