
Test Data Sources

Data Sources can be tested in your notebook environment. Use the Tecton SDK to get the Tecton workspace where your Data Source is defined.

import tecton

ws = tecton.get_workspace("my_workspace")

Then get the Data Source.

data_source = ws.get_data_source("users_batch")
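
If you are unsure of the exact name, you can list the Data Sources registered in the workspace first. A minimal sketch, assuming the SDK's list_data_sources method on the workspace object:

# List the Data Sources registered in the workspace.
print(ws.list_data_sources())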

Verify that Tecton can connect to and read data from the batch source

Set the start and end times that you will use to filter records from the batch source.

from datetime import datetime, timedelta

end = datetime.now()
start = end - timedelta(days=30)

Call the get_dataframe method of data_source to get data from the batch source, filtered by start and end:

batch_data_from_tecton = (
    data_source.get_dataframe(start_time=start, end_time=end).to_pandas().head(10)
)
display(batch_data_from_tecton)
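
As a quick sanity check, you can confirm that the returned records actually fall inside the requested window. A minimal sketch, assuming a hypothetical timestamp column named signup_timestamp (substitute your Data Source's timestamp field):

# signup_timestamp is a hypothetical column name; use your source's timestamp field.
ts_col = "signup_timestamp"
assert batch_data_from_tecton[ts_col].between(start, end).all()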

Note that even if data_source points to a stream source, data_source.get_dataframe() reads raw data from the source's backing batch source.

Verify that Tecton can connect to and read data from the stream source

note

This section is only applicable to Spark stream sources: Kinesis, Kafka, and Spark Data Source Functions.

Call the start_stream_preview method on data_source to write incoming records from the stream source to a temporary table named TEMP_TABLE_TRANSLATED. Set apply_translator=True to run the stream source's post_processor function on the incoming records. Assign the returned streaming query handle to a variable so that you can stop the preview later.

note

Run the following command for only a short period of time; it continuously reads data from the stream source until stopped.

stream_query = data_source.start_stream_preview(
    table_name="TEMP_TABLE_TRANSLATED",
    apply_translator=True,
    option_overrides={"initialPosition": "earliest"},
)

Query the data in the table and display the output:

spark.sql("SELECT * FROM TEMP_TABLE_TRANSLATED LIMIT 10").show()

If no data is returned, wait a short period of time and run the query again; see the sketch below for an automated check.
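
Rather than re-running the query by hand, you can poll the table for a short while and then stop the preview. A minimal sketch, assuming start_stream_preview returned the underlying Spark streaming query handle captured above as stream_query:

import time

# Poll the temporary table until records appear (up to ~1 minute).
for _ in range(6):
    rows = spark.sql("SELECT * FROM TEMP_TABLE_TRANSLATED LIMIT 10").collect()
    if rows:
        break
    time.sleep(10)

# Stop the preview so it does not read from the stream indefinitely.
stream_query.stop()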
