Connecting to an Existing Databricks Account
Databricks is a hosted Spark platform that can be used by Tecton for compute workloads and notebook environments.
By default, Tecton creates and manages all Databricks resources automatically. If you are already a Databricks customer, you can instead use your existing Databricks Workspace.
Deployment Setup
To set up Tecton with a Databricks account, you'll need to provide the following:
- A Databricks Workspace deployed on AWS.
- Databricks Workspace URL. The Workspace URL is the URL used to access Databricks with a format similar to
- AWS VPC ID used by the Databricks Workspace.
At the moment, connecting your Tecton deployment to an existing Databricks instance must be done with the help of Tecton support since it requires updating Tecton-managed AWS resources.
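Because these details go to Tecton support rather than into a tool, it can help to sanity-check their shape before sending them. The sketch below uses a hypothetical helper, `check_deployment_details`, which is not part of Tecton or Databricks. It only checks that the Workspace URL is an `https://` URL with a host and that the VPC ID has the AWS form `vpc-` followed by 8 or 17 hexadecimal characters. It does not confirm that either resource exists.

```python
import re
from urllib.parse import urlparse

# AWS VPC IDs are "vpc-" followed by 8 (older) or 17 (newer) lowercase hex characters.
_VPC_ID_PATTERN = re.compile(r"^vpc-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def check_deployment_details(workspace_url: str, vpc_id: str) -> list[str]:
    """Return a list of problems found; an empty list means the shape checks passed.

    Hypothetical helper: it validates formats only and makes no network calls.
    """
    problems = []
    parsed = urlparse(workspace_url)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append(f"workspace URL should be an https:// URL with a host: {workspace_url!r}")
    if not _VPC_ID_PATTERN.match(vpc_id):
        problems.append(f"VPC ID does not look like vpc-<8 or 17 hex chars>: {vpc_id!r}")
    return problems


# Example with hypothetical values:
#   check_deployment_details("https://my-workspace.example.com", "vpc-0123456789abcdef0")
```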
Interactive Cluster Setup
Follow these steps to set up an interactive Databricks cluster once your Databricks instance is connected to a Tecton deployment.
You'll need a Tecton API key. This can be obtained using the CLI by running
$ tecton create-api-key
Save this key - you will not be able get it again
1234567890abcdefabcdefabcdefabcd
1. Install the Tecton SDK as a library
Do this once per cluster. On the cluster configuration page:
- Go to the Libraries tab
- Click Install New
- Select PyPI under Library Source
- Set Package to
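If you manage many clusters, the same installation can be scripted with the Databricks Libraries REST API (`POST /api/2.0/libraries/install`). The sketch below only builds the request and does not send it. The helper name `build_install_request` is hypothetical, and the host, token (assumed to be a Databricks personal access token), cluster ID, and package name are placeholders you supply; the package name is whatever you would enter in the Package field above.

```python
import json
import urllib.request


def build_install_request(host: str, token: str, cluster_id: str, package: str) -> urllib.request.Request:
    """Build (but do not send) a Libraries API request that installs a PyPI package on a cluster."""
    payload = {
        "cluster_id": cluster_id,
        # Same choice as "PyPI" under Library Source in the UI.
        "libraries": [{"pypi": {"package": package}}],
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (placeholders shown):
#   req = build_install_request("https://<workspace-host>", "<token>", "<cluster-id>", "<package>")
#   urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before anything changes on the cluster.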
2. Configure SDK credentials using secrets
Tecton SDK credentials are configured using Databricks secrets. These secrets should already be configured as part of the Tecton deployment. If needed (for example, to access Tecton from another Databricks workspace), you can create them in the format shown below. First, make sure the Databricks CLI is installed and configured. Next, create a secret scope and store the API service endpoint and the API key you created above.
The scope name is tecton for the production Tecton cluster associated with a workspace, and tecton-<clustername> otherwise (for example, a staging cluster created in the same account). If your cluster name already starts with tecton-, the scope name is simply your cluster name.
databricks secrets create-scope --scope <scopename>
databricks secrets put --scope <scopename> \
    --key API_SERVICE --string-value https://foo.tecton.ai/api
databricks secrets put --scope <scopename> \
    --key TECTON_API_KEY --string-value <TOKEN>
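The scope naming rule and the three CLI commands can be captured in a small script, which is handy when setting up several workspaces. This is a sketch: `tecton_secret_scope` and `secret_api_calls` are hypothetical names. The second function returns the endpoint paths and JSON bodies that the Databricks Secrets REST API (`/api/2.0/secrets/scopes/create` and `/api/2.0/secrets/put`) expects for the same operations; it sends nothing.

```python
def tecton_secret_scope(cluster_name: str, is_production: bool) -> str:
    """Return the Databricks secret scope name for a Tecton cluster.

    Naming rule from this guide:
      - production Tecton cluster for the workspace -> "tecton"
      - cluster name already starting with "tecton-" -> the cluster name itself
      - any other cluster -> "tecton-<clustername>"
    """
    if is_production:
        return "tecton"
    if cluster_name.startswith("tecton-"):
        return cluster_name
    return f"tecton-{cluster_name}"


def secret_api_calls(scope: str, api_service: str, api_key: str) -> list[tuple[str, dict]]:
    """Return (endpoint path, JSON body) pairs for the Secrets REST API that
    mirror the three CLI commands: create the scope, then store both keys."""
    return [
        ("/api/2.0/secrets/scopes/create", {"scope": scope}),
        ("/api/2.0/secrets/put", {"scope": scope, "key": "API_SERVICE", "string_value": api_service}),
        ("/api/2.0/secrets/put", {"scope": scope, "key": "TECTON_API_KEY", "string_value": api_key}),
    ]


# Example: a staging cluster named "staging" gets the scope "tecton-staging".
scope = tecton_secret_scope("staging", is_production=False)
```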
Depending on your Databricks setup, you may need to configure ACLs for the tecton secret scope before it is usable. See the Databricks documentation for more information. For example:
databricks secrets put-acl --scope <scopename> \
    --principal email@example.com --permission MANAGE
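The same ACL can be set through the Databricks Secrets REST API (`POST /api/2.0/secrets/acls/put`). The hypothetical helper below only builds and validates the request body. Databricks defines three secret ACL levels, READ, WRITE, and MANAGE, so anything else is rejected before a request is made.

```python
_VALID_PERMISSIONS = {"READ", "WRITE", "MANAGE"}


def secret_acl_body(scope: str, principal: str, permission: str) -> dict:
    """Build the JSON body for POST /api/2.0/secrets/acls/put.

    Mirrors `databricks secrets put-acl`: the permission is upper-cased and
    checked against the three levels Databricks supports.
    """
    permission = permission.upper()
    if permission not in _VALID_PERMISSIONS:
        raise ValueError(f"permission must be one of {sorted(_VALID_PERMISSIONS)}, got {permission!r}")
    return {"scope": scope, "principal": principal, "permission": permission}
```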
Depending on the data sources you use, you may also need to configure the following.
3. Additional Permissions
If your Databricks workspace is in a different AWS account, you must configure AWS access so that Databricks can read all of the S3 buckets Tecton uses (these are in the data plane account and are prefixed with tecton-), as well as the underlying data sources Tecton reads. Tecton needs this access for full functionality.
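As a starting point, the read access described above can be expressed as an IAM policy document attached to the role Databricks uses. The sketch assumes that read-only access through `s3:ListBucket` and `s3:GetObject` is sufficient. That is an assumption, not a documented Tecton requirement. The helper names and bucket names are hypothetical. For cross-account access, the data plane account usually also needs a bucket policy (or an assumable role) that grants the Databricks account access; that side is not shown.

```python
import json


def tecton_buckets(bucket_names: list[str]) -> list[str]:
    """Keep only bucket names with the "tecton-" prefix, sorted for stable output."""
    return sorted(name for name in bucket_names if name.startswith("tecton-"))


def s3_read_policy(bucket_names: list[str]) -> dict:
    """Build an IAM policy document granting read-only access to the given buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket ARN itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arns},
            # Reading objects applies to the objects inside each bucket.
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": object_arns},
        ],
    }


# Example with hypothetical bucket names:
#   print(json.dumps(s3_read_policy(tecton_buckets(["tecton-example", "other-bucket"])), indent=2))
```

Add the buckets for your own data sources to the same list if Databricks also needs to read them.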
4. Verify the connection
Create a notebook connected to a cluster with the Tecton SDK installed (see Step 1). Run the following in the notebook. If successful, you should see a list of workspaces, including the
import tecton
tecton.list_workspaces()
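If you want a scheduled job or test notebook to run this check automatically, you could wrap it in a small helper that fails loudly. This is a sketch: `verify_tecton_connection` is hypothetical, and it assumes `tecton.list_workspaces()` returns an iterable of workspace names. Passing the listing function in as an argument keeps the helper testable without the Tecton SDK installed.

```python
from typing import Callable, Iterable


def verify_tecton_connection(
    list_workspaces: Callable[[], Iterable[str]],
    expected: Iterable[str] = (),
) -> list[str]:
    """Call the workspace-listing function and check that expected names are present.

    Returns the workspace names on success; raises RuntimeError otherwise.
    """
    try:
        workspaces = list(list_workspaces())
    except Exception as exc:  # e.g. missing secrets or an invalid API key
        raise RuntimeError(f"could not list Tecton workspaces: {exc}") from exc
    missing = [name for name in expected if name not in workspaces]
    if missing:
        raise RuntimeError(f"connected, but missing expected workspaces: {missing}")
    return workspaces


# Usage in a Databricks notebook (assumes the Tecton SDK is installed on the cluster):
#   import tecton
#   verify_tecton_connection(tecton.list_workspaces, expected=["<your-workspace>"])
```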