Creating and Testing a Batch Data Source
Tecton supports connections to many different data sources. This example uses a Hive table for batch data, but the same principles apply for any raw data source, including streams. See Data Sources overview or the Data Sources API for more details.
You must register a data source with Tecton before you define features based on that data. To register a data source, follow these steps:
- Define a data source object.
- Apply your data source to Tecton using the Tecton CLI.
- Test the data source by querying it in a notebook.
This guide assumes you've already set up the permissions required for Tecton to read from the source.
Creating a Batch Data Source
In this example, we define a BatchSource that contains the configuration necessary for Tecton to access our Hive user table.
Create a new file in your feature repository, and paste in the following code:
from tecton import HiveConfig, BatchSource
fraud_users_batch = BatchSource(
In the example definition above, we also added metadata parameters for
organization, such as
Applying the Data Source
So far, we have only written code in our local feature repository. To use the data source in Tecton, we need to apply the new definition, which we can do with the Tecton CLI:
$ tecton apply
Using workspace "prod"
✅ Imported 15 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create BatchDataSource
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]>
Enter y to apply this definition to Tecton.
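If you run the apply step from a script and want to review the planned changes before confirming, the plan text can be scanned programmatically. The following is a minimal sketch and not part of the Tecton SDK: it assumes the plan is printed between "Plan Start" and "Plan End" markers with one change per line, as in the sample output above. The `parse_plan` helper is hypothetical, and the `~` and `-` symbols are assumptions, since only `+` appears in the sample.

```python
from __future__ import annotations

import re

# Assumption: each planned change is printed on its own line as
# "<symbol> <Action> <ObjectType>". Only "+" appears in the sample output;
# "~" and "-" are guesses at how other change types might be marked.
PLAN_LINE = re.compile(r"^\s*([+~-])\s+(\w+)\s+(\w+)\s*$")


def parse_plan(output: str) -> list[tuple[str, str]]:
    """Return (action, object_type) pairs found between the plan markers."""
    changes = []
    in_plan = False
    for line in output.splitlines():
        if "Plan Start" in line:
            in_plan = True
            continue
        if "Plan End" in line:
            break
        if in_plan:
            match = PLAN_LINE.match(line)
            if match:
                changes.append((match.group(2), match.group(3)))
    return changes


# Abbreviated copy of the sample CLI output shown above.
sample = """Using workspace "prod"
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create BatchDataSource
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
"""

print(parse_plan(sample))
```

For the sample above, this reports a single planned change: the creation of one BatchDataSource.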
Testing the Data Source in a Notebook
To verify that the data source is connected properly, retrieve it with the Tecton SDK in a notebook environment:
import tecton
users_batch = tecton.get_workspace('my_workspace').get_data_source('users_batch')
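Once you have pulled a few rows from the data source into the notebook (how you fetch them depends on your SDK version and is not shown here), a quick sanity check can confirm that the expected columns and timestamps are present. The sketch below uses only the standard library and treats rows as plain dictionaries. The `check_sample_rows` helper and the column names `user_id` and `signup_timestamp` are hypothetical, so substitute the columns of your own table.

```python
from __future__ import annotations

from datetime import datetime


def check_sample_rows(rows, required_columns, timestamp_column):
    """Return a list of human-readable problems; an empty list means the sample looks sane."""
    if not rows:
        return ["sample is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Every required column should be present in every row.
        missing = [c for c in required_columns if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        # The timestamp column should hold a real datetime, not None or a string.
        if not isinstance(row.get(timestamp_column), datetime):
            problems.append(f"row {i}: {timestamp_column!r} is not a datetime")
    return problems


# Hypothetical sample; the second row has a null timestamp.
sample_rows = [
    {"user_id": "u1", "signup_timestamp": datetime(2023, 1, 1)},
    {"user_id": "u2", "signup_timestamp": None},
]

for problem in check_sample_rows(
    sample_rows, ["user_id", "signup_timestamp"], "signup_timestamp"
):
    print(problem)
```

An empty result suggests the source is readable and shaped as expected. Any reported problem is worth resolving before you build features on the data.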
With a Data Source defined and verified, you are now ready to define Tecton Feature Views that make use of this data.