Version: 0.6

Connecting Databricks Notebooks

You can use the Tecton SDK in a Databricks notebook to explore feature values and create training datasets. The following guide covers how to configure your all-purpose cluster for use with Tecton. If you haven't already completed your deployment of Tecton with Databricks, please see the guide for Configuring Databricks.

Supported Databricks runtimes for notebooks

Tecton supports using the Tecton SDK with the following Databricks Runtime Versions:

As a best practice, use the same Databricks Runtime (DBR) version for your notebook cluster as the one configured for your Feature View materialization. DBR 10.4 LTS is the default DBR for materialization jobs.
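The sketch below compares a notebook cluster's DBR version with the materialization default. It assumes that Databricks exposes the runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable (for example `"10.4"`). The helper names are hypothetical and are not part of the Tecton SDK.

```python
import os
import re

# Default DBR for Tecton materialization jobs, per the text above.
DEFAULT_MATERIALIZATION_DBR = "10.4"


def notebook_dbr_version(env=None):
    """Return the cluster's DBR as 'major.minor', or None if it cannot be determined.

    Assumes the runtime version is exposed in DATABRICKS_RUNTIME_VERSION.
    """
    env = os.environ if env is None else env
    raw = env.get("DATABRICKS_RUNTIME_VERSION")
    if not raw:
        return None
    match = re.match(r"(\d+\.\d+)", raw)  # keep only the leading major.minor part
    return match.group(1) if match else None


def matches_materialization_dbr(expected=DEFAULT_MATERIALIZATION_DBR, env=None):
    """True if the notebook cluster runs the same DBR major.minor as `expected`."""
    return notebook_dbr_version(env) == expected
```

If your Feature Views are configured with a non-default DBR, pass that version as `expected`.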

Install the Tecton SDK

This step must be done once per notebook cluster.

On the cluster configuration page:

  1. Go to the Libraries tab
  2. Click Install New
  3. Select PyPI under Library Source
  4. Set Package to your desired Tecton SDK version, such as tecton==0.6.0 or tecton==0.6.*.
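After the library is installed, you can check from a notebook that the SDK version matches the version you pinned. The sketch below reads the installed version with `importlib.metadata`. It approximates a pin such as `0.6.*` with shell-style wildcard matching. That approximation is an assumption and does not implement pip's full version-specifier rules.

```python
from fnmatch import fnmatch
from importlib import metadata


def installed_tecton_version():
    """Return the installed tecton package version, or None if it is not installed."""
    try:
        return metadata.version("tecton")
    except metadata.PackageNotFoundError:
        return None


def version_satisfies(installed, pin):
    """Approximate check of an '==' pin such as '0.6.0' or '0.6.*'.

    Wildcard matching covers these simple pins only; it is not pip's specifier logic.
    """
    return installed is not None and fnmatch(installed, pin)
```

In a notebook, `version_satisfies(installed_tecton_version(), "0.6.*")` should return `True` on a correctly configured cluster.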

Install the Tecton UDF Jar

This step must be done once per notebook cluster.

On the cluster configuration page:

  1. Go to the Libraries tab
  2. Click Install New
  3. Select DBFS/S3 under Library Source
  4. Set File Path to s3://tecton.ai.public/pip-repository/itorgation/tecton/{tecton_version}/tecton-udfs-spark-3.jar, where tecton_version matches the SDK version you installed. For example, use 0.6.0, or use 0.6.* to get the jar for the latest patch release.
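To keep the jar and the SDK in step, you can build the File Path from the same version string you used for the PyPI library. This is a small helper based on the path template above. The function name is hypothetical.

```python
# Path template from the step above; {tecton_version} is filled in per SDK version.
UDF_JAR_TEMPLATE = (
    "s3://tecton.ai.public/pip-repository/itorgation/tecton/"
    "{tecton_version}/tecton-udfs-spark-3.jar"
)


def udf_jar_path(tecton_version):
    """Return the UDF jar path for an SDK version such as '0.6.0' or '0.6.*'."""
    return UDF_JAR_TEMPLATE.format(tecton_version=tecton_version)
```

For example, `udf_jar_path("0.6.0")` produces the path for the 0.6.0 jar. You can also pass `importlib.metadata.version("tecton")` to match the SDK that is already installed.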

Configure SDK credentials

Create a Tecton Service Account

Using the CLI

Databricks Notebooks need an API key to connect to Tecton. Create a Service Account to obtain an API key.

tecton service-account create \
--name "notebook-service-account" \
--description "The Service Account for our Databricks notebooks"

Output:

Save this API Key - you will not be able to get it again.
API Key: <Your-api-key>
Service Account ID: <Your-Service-Account-Id>
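The API key is shown only once. If you script this step, you may want to capture it directly instead of copying it by hand. The sketch below parses the two labeled lines from output in the format shown above. Stability of that format across CLI versions is an assumption, and the function name is hypothetical.

```python
import re


def parse_service_account_output(output):
    """Extract the API key and Service Account ID from the CLI output text.

    Assumes 'API Key: ...' and 'Service Account ID: ...' lines as shown above;
    any field that is not found is returned as None.
    """
    def field(label):
        pattern = rf"^{re.escape(label)}:\s*(\S+)\s*$"
        match = re.search(pattern, output, re.MULTILINE)
        return match.group(1) if match else None

    return {
        "api_key": field("API Key"),
        "service_account_id": field("Service Account ID"),
    }
```

Store the extracted key in a secret manager, such as the Databricks secret scope described below, rather than printing or logging it.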

This API key will be configured for your notebook below.

To access objects in a given Tecton workspace, the Service Account used by your notebook needs at least the Viewer role in that workspace. Grant the Consumer role instead if you also want to test online feature retrieval.

tecton access-control assign-role --role consumer \
--workspace <Your-workspace> \
--service-account <Your-Service-Account-Id>

Output:

Successfully updated role.

[Optional] You can also use CLI version 0.6.6 or newer to grant the Service Account the role across all workspaces:

tecton access-control assign-role --role consumer \
--service-account <Your-Service-Account-Id>

When new workspaces are created, you will automatically be able to access objects from that workspace in your notebooks.
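If you grant roles from a script, for example across several workspaces on a CLI older than 0.6.6, you can build the `assign-role` command shown above programmatically. This is a sketch: the helper name and the example values `"sa-42"` and `"prod"` are hypothetical, and it only uses the flags that appear in the commands above.

```python
def assign_role_command(service_account_id, role="consumer", workspace=None):
    """Build argv for `tecton access-control assign-role`, mirroring the commands above.

    With workspace=None the --workspace flag is omitted, which grants the role
    across all workspaces (CLI 0.6.6 or newer, per the text above).
    """
    argv = ["tecton", "access-control", "assign-role", "--role", role]
    if workspace is not None:
        argv += ["--workspace", workspace]
    argv += ["--service-account", service_account_id]
    return argv
```

To grant the role in each workspace, run `subprocess.run(assign_role_command(sa_id, workspace=ws), check=True)` for each workspace name `ws`.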

Using the Web UI

Alternatively, follow these steps in the Tecton Web UI to set up your notebook Service Account:

  1. Select your workspace from the drop-down list at the top.
  2. On the left navigation bar, select Permissions.
  3. Select the Service Accounts tab.
  4. Click Add service account to ...
  5. In the dialog box that appears, search for the Service Account name.
  6. When the Service Account name appears, click Select on the right.
  7. Select a role. You can select any of these roles: Owner, Editor, Consumer, or Viewer.
  8. Click Confirm.

Using Databricks Secret Scopes

Tecton SDK credentials can be configured using Databricks secrets. These are normally pre-configured as part of the Tecton deployment. If needed, for example to access Tecton from another Databricks workspace, you can create them in the format described below. First, make sure the Databricks CLI is installed and configured. Next, create a secret scope and configure the API endpoint and the API key created above with your Service Account.

Naming the Secret Scope

The secret scope name is derived from the cluster name:

  • <deployment-name>, if your deployment name begins with tecton
  • tecton-<deployment-name>, otherwise

<deployment-name> is the first part of the URL used to access the Tecton UI: https://<deployment-name>.tecton.ai
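The naming rule above can be expressed as a small helper. This is a sketch of the rule as described, not Tecton's own code. The function names and the deployment name `acme` are hypothetical.

```python
from urllib.parse import urlparse


def deployment_name_from_url(url):
    """Return <deployment-name> from a URL such as https://<deployment-name>.tecton.ai."""
    host = urlparse(url).hostname or ""
    return host.split(".", 1)[0]


def secret_scope_name(deployment_name):
    """Apply the naming rule above to get the expected secret scope name."""
    if deployment_name.startswith("tecton"):
        return deployment_name
    return f"tecton-{deployment_name}"
```

For example, a deployment at `https://acme.tecton.ai` would use a scope named `tecton-acme`.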

If Tecton does not pick up the secret scope you expect, verify that your cluster name is set:

import tecton

tecton.conf.get_or_raise("TECTON_CLUSTER_NAME")
# If not set, run: tecton.conf.set("TECTON_CLUSTER_NAME", "<deployment-name>")

Then check what secret scopes the cluster can read from:

tecton.conf._get_secret_scopes()

This should show two secret scopes: the one derived from the cluster name, and one called tecton. The tecton scope is a fallback that is used if the first scope is missing or not populated, so make sure to create the secret scope with the correct name.
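The fallback order described above can be illustrated with a lookup that tries each scope in turn. This sketch only illustrates the described behavior and is not Tecton's implementation. `get_secret` stands for any callable taking `(scope, key)`, such as `dbutils.secrets.get` in a Databricks notebook, and is assumed to raise an exception when a scope or key is missing.

```python
def read_secret_with_fallback(get_secret, scopes, key):
    """Return the first non-empty value for `key`, trying each scope in order.

    Illustrative only: mirrors the fallback order described in the text above.
    """
    for scope in scopes:
        try:
            value = get_secret(scope, key)
        except Exception:
            continue  # scope absent or key not populated: try the next scope
        if value:
            return value
    return None
```

For a hypothetical deployment named `acme`, the lookup order would be `["tecton-acme", "tecton"]`.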

Populating the secret scope

The secret scope needs to be populated with secrets:

databricks secrets create-scope --scope <scope_name>
databricks secrets put --scope <scope_name> \
--key API_SERVICE --string-value https://<deployment-name>.tecton.ai/api
databricks secrets put --scope <scope_name> \
--key TECTON_API_KEY --string-value <Your-api-key>

Depending on your Databricks setup, you may need to configure ACLs for the secret scope before it is usable. See the Databricks documentation for more information. For example:

databricks secrets put-acl --scope <scope_name> \
--principal your@email.com --permission MANAGE

Additionally, depending on data sources used, you may need to configure the following:

  • <secret-scope>/REDSHIFT_USER
  • <secret-scope>/REDSHIFT_PASSWORD
  • <secret-scope>/SNOWFLAKE_USER
  • <secret-scope>/SNOWFLAKE_PASSWORD
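After populating the scope, you can check from a notebook that every key you expect is readable. This is a sketch: `get_secret` is assumed to be a `(scope, key)` callable such as `dbutils.secrets.get` that raises when a key is missing. The helper name is hypothetical.

```python
# Keys every scope needs, per the commands above; data-source keys are optional.
REQUIRED_KEYS = ["API_SERVICE", "TECTON_API_KEY"]


def missing_secrets(get_secret, scope, extra_keys=()):
    """Return the required (plus any extra) keys that cannot be read from `scope`."""
    missing = []
    for key in [*REQUIRED_KEYS, *extra_keys]:
        try:
            value = get_secret(scope, key)
        except Exception:
            value = None  # treat an unreadable key the same as an absent one
        if not value:
            missing.append(key)
    return missing
```

For example, if you use Snowflake, call `missing_secrets(dbutils.secrets.get, "<scope_name>", extra_keys=("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"))`. An empty list means all of the keys were found.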

Using notebook-scoped credentials

Tecton SDK credentials can also be configured within the scope of a Python session using the tecton.set_credentials() method.

Using the API key created earlier, run the following in your notebook:

import tecton

# Replace the placeholders with your API key and your deployment name.
tecton.set_credentials(tecton_api_token="<Your-api-key>", tecton_url="https://<deployment-name>.tecton.ai/api")

Credentials configured using tecton.set_credentials() are scoped to the notebook session. You will need to reconfigure them whenever the notebook is restarted or its state is cleared. To read SDK credentials from the environment instead, use the method described above in Using Databricks Secret Scopes.
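If you need notebook-scoped credentials but do not want the API key pasted into the notebook source, you could read both values from a secret scope first. For example, this can help when the secrets live in a scope the SDK does not search automatically. This sketch assumes the `API_SERVICE` and `TECTON_API_KEY` keys described above and a `(scope, key)` callable such as `dbutils.secrets.get`. The helper name is hypothetical.

```python
def build_credentials(get_secret, scope):
    """Return keyword arguments for tecton.set_credentials() read from a secret scope."""
    return {
        "tecton_api_token": get_secret(scope, "TECTON_API_KEY"),
        "tecton_url": get_secret(scope, "API_SERVICE"),
    }
```

In a notebook, you would then call `tecton.set_credentials(**build_credentials(dbutils.secrets.get, "<scope_name>"))`.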

Configure permissions for cross-account access

If your Databricks workspace is in a different AWS account from your Tecton data plane, you must configure AWS access so that Databricks can read all of the S3 buckets Tecton uses. These buckets are in the data plane account, and their names begin with tecton-. For full functionality, Databricks also needs access to the underlying data sources that Tecton reads.
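As a starting point, the identity policy attached to the Databricks role could grant read access to buckets whose names begin with `tecton-`. The sketch below generates such a policy document. Several details are assumptions: the exact set of S3 actions Tecton needs, and the fact that this covers only the Databricks-side policy. Cross-account access also requires the bucket policies in the data plane account to allow that role. Access to your own data sources is not covered.

```python
import json


def tecton_bucket_read_policy(bucket_prefix="tecton-"):
    """Return an IAM policy document granting read access to buckets with the given prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket_prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level actions apply to the bucket ARN itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [bucket_arn],
            },
            {   # object-level reads apply to keys inside the buckets
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(tecton_bucket_read_policy(), indent=2))
```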

Verify the connection

Create a notebook attached to a cluster with the Tecton SDK installed (see Install the Tecton SDK above). Run the following in the notebook. If the connection succeeds, you should see a list of workspaces.

import tecton

tecton.test_credentials()