Test Data Sources

Data Sources can be tested in your notebook environment. Use the Tecton SDK to get the workspace where your Data Source is defined.

import tecton

ws = tecton.get_workspace("my_workspace")

Then get the Data Source.

data_source = ws.get_data_source("users_batch")
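
If you are unsure of the exact name, you can first list the Data Sources registered in the workspace. This is a minimal sketch assuming your SDK version exposes list_data_sources on the Workspace object; check the SDK reference for your version.

# Sketch: list the registered Data Sources to confirm the name before fetching one.
print(ws.list_data_sources())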

Verify that Tecton can connect to and read data from the batch source

Set the start and end times that you will use to filter records from the batch source.

from datetime import datetime, timedelta

end = datetime.now()
start = end - timedelta(days=30)

Call the get_dataframe method of data_source to get data from the batch source, filtered by start and end:

batch_data_from_tecton = (
    data_source.get_dataframe(start_time=start, end_time=end).to_pandas().head(10)
)
display(batch_data_from_tecton)
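
As a quick sanity check, you can verify that the returned DataFrame contains the columns you expect. The column names below are hypothetical placeholders; substitute the columns your batch source should produce.

# Sketch: confirm the expected columns are present in the batch data.
# "user_id" and "signup_date" are hypothetical placeholder names.
expected_columns = {"user_id", "signup_date"}
missing = expected_columns - set(batch_data_from_tecton.columns)
assert not missing, f"Columns missing from batch source: {missing}"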

Note that even if data_source points to a stream source, data_source.get_dataframe() reads raw data from the batch source that backs it.

Verify that Tecton can connect to and read data from the stream source

note

This section is only applicable to Spark stream sources: Kinesis, Kafka, and Spark Data Source Functions.

Call the start_stream_preview method on data_source to write incoming records from the stream source to a temporary table named TEMP_TABLE_TRANSLATED. Set apply_translator=True to run the stream source's post_processor function on the incoming records.

note

Run the following command for only a short period of time; it continuously reads data from the stream source until it is stopped.

data_source.start_stream_preview(
    table_name="TEMP_TABLE_TRANSLATED",
    apply_translator=True,
    option_overrides={"initialPosition": "earliest"},
)

Query the data in the table and display the output:

spark.sql("SELECT * FROM TEMP_TABLE_TRANSLATED LIMIT 10").show()

If no data is returned, wait a short period of time and run the query again; records from the stream may take a moment to arrive.
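
Because the preview job runs until it is stopped, a minimal sketch of the full cycle (poll the table for data, then shut the job down) using Spark's standard streaming API looks like the following; the retry count and wait time are arbitrary choices.

import time

# Poll the preview table until records arrive (up to 5 attempts,
# waiting 30 seconds between polls; both values are arbitrary).
for _ in range(5):
    count = spark.sql("SELECT COUNT(*) AS n FROM TEMP_TABLE_TRANSLATED").first()["n"]
    if count > 0:
        break
    time.sleep(30)

# Stop all active streaming queries so the preview does not run indefinitely.
for query in spark.streams.active:
    query.stop()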
