Version: 0.4

Creating and Testing a Batch Data Source

Overview

Tecton supports connections to many different data sources. This example uses a Hive table for batch data, but the same principles apply to any raw data source, including streams. See the Data Sources overview or the Data Source API reference for more details.

You must register a data source with Tecton before you define features based on that data. To register a data source, follow these steps:

  1. Define a data source object.
  2. Apply your data source to Tecton using the Tecton CLI.
  3. Test the data source by querying it in a notebook.

This guide assumes you've already set up the permissions required for Tecton to read from the source.

Creating a Batch Data Source

In this example, we define a BatchSource that contains the configuration necessary for Tecton to access our Hive user table.

Create a new file in your feature repository, and paste in the following code:

from tecton import HiveConfig, BatchSource

fraud_users_batch = BatchSource(
    name="users_batch",
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
    owner="matt@tecton.ai",
    tags={"release": "production"},
)

The definition above also sets metadata that helps keep the repository organized: name, owner, and tags.
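If your team has conventions for this metadata, a small check can catch mistakes before you apply. The following is a minimal sketch in plain Python; it is not part of the Tecton SDK. The helper `check_metadata` and the three rules it enforces (lowercase names, an email-style owner, a `release` tag) are assumptions made for illustration, not requirements imposed by Tecton.

```python
from typing import Dict, List


def check_metadata(name: str, owner: str, tags: Dict[str, str]) -> List[str]:
    """Return a list of convention violations (hypothetical team rules)."""
    problems: List[str] = []
    # Assumed convention: names are lowercase and contain no spaces.
    if not name or name != name.lower() or " " in name:
        problems.append("name should be lowercase with no spaces")
    # Assumed convention: the owner is identified by an email address.
    if "@" not in owner:
        problems.append("owner should be an email address")
    # Assumed convention: every object carries a "release" tag.
    if "release" not in tags:
        problems.append('tags should include a "release" key')
    return problems


# The values from the definition above pass all three checks.
print(check_metadata("users_batch", "matt@tecton.ai", {"release": "production"}))
```

An empty list means the metadata satisfies the assumed conventions; adapt the rules to whatever your team actually uses.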

Applying the Data Source

So far, we have only written code in our local feature repository. To use the data source in Tecton, we need to apply the new definition with the Tecton CLI:

$ tecton apply
Using workspace "prod"
✅ Imported 15 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

+ Create BatchDataSource
name: users_batch

↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]>

Enter y to apply this definition to Tecton.

Testing the Data Source in a Notebook

To verify that the data source is connected properly, use the Tecton SDK in a notebook environment:

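import tecton

# Replace 'my_workspace' with the workspace you applied the definition to
# ("prod" in the tecton apply output above).
users_batch = tecton.get_workspace('my_workspace').get_data_source('users_batch')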

print(users_batch.get_dataframe().to_pandas().head(10))
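Beyond inspecting the first rows by eye, a few basic checks can be automated. The following is a minimal sketch, not part of the Tecton SDK: `check_rows` is a hypothetical helper, and the column names `user_id`, `signup_timestamp`, and `country` are assumptions about the `fraud_users` table. It works on plain dictionaries, which you could obtain from a pandas DataFrame with `to_dict("records")`.

```python
from typing import Any, Dict, Iterable, List


def _is_null(value: Any) -> bool:
    # None, or a NaN-like value (NaN is the only common value unequal to itself).
    return value is None or value != value


def check_rows(rows: Iterable[Dict[str, Any]], required_columns: List[str]) -> List[str]:
    """Return a list of human-readable problems found in the sample rows."""
    rows = list(rows)
    if not rows:
        return ["no rows returned; check the database and table names"]
    problems: List[str] = []
    for column in required_columns:
        if any(column not in row for row in rows):
            problems.append(f"missing column: {column}")
        elif all(_is_null(row[column]) for row in rows):
            problems.append(f"column is entirely null: {column}")
    return problems


# Stand-in sample with assumed column names. In a notebook you might use
# users_batch.get_dataframe().to_pandas().head(10).to_dict("records") instead.
sample = [
    {"user_id": "u1", "signup_timestamp": None},
    {"user_id": "u2", "signup_timestamp": None},
]
print(check_rows(sample, ["user_id", "signup_timestamp", "country"]))
```

An empty result suggests the source is readable and the expected columns carry data; it does not validate the contents any further.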

With a Data Source defined and verified, you are now ready to define Tecton Feature Views that make use of this data.
