
Configure Databricks Service Principals per Workspace

Private Preview

This feature is currently in Private Preview.

This feature has the following limitation:
  • It is only available for Tecton on Databricks.
If you would like to participate in the preview, please file a support ticket.

When Tecton is configured to use Databricks, Tecton submits materialization jobs to Databricks using the Databricks API Token configured during setup. The Databricks API Token's principal is the identity that Databricks runs the jobs as. However, it may be desirable to run materialization jobs as a different Databricks principal. This document explains how to configure which Databricks service principals to use.

Each live Tecton workspace supports:

  • A default Databricks service principal
  • A list of allowed service principals
  • Per-feature-view assignment of a service principal from the workspace's allow list, overriding the default

Setup and Configuration

1. Configuring the Default Service Principal and Allowed List

First, retrieve your current workspace definition:

curl https://<YOUR-TECTON-CLUSTER>.tecton.ai/api/v1/workspaces/<YOUR-WORKSPACE> \
  -H "Authorization: Tecton-key <YOUR-API-KEY>"

Then, update the returned workspace and set the compute_identities field. The compute_identities field can contain zero or more entries. databricks_service_principal.application_id is the application_id (a UUID) of the Databricks service principal:

curl -X PUT https://<YOUR-TECTON-CLUSTER>.tecton.ai/api/v1/workspaces/<YOUR-WORKSPACE> \
  -H "Content-type: application/json" \
  -H "Authorization: Tecton-key <YOUR-API-KEY>" \
  -d '{
    "capabilities": {
      "materializable": true,
      "offline_store_subdirectory_enabled": true
    },
    "compute_identities": [
      {
        "databricks_service_principal": {
          "application_id": "<DEFAULT-DATABRICKS-SERVICE-PRINCIPAL-ID>"
        }
      },
      {
        "databricks_service_principal": {
          "application_id": "<ANOTHER-ALLOWED-DATABRICKS-SERVICE-PRINCIPAL-ID>"
        }
      }
    ]
  }'
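The PUT body above can also be assembled programmatically. Below is a minimal sketch; the build_compute_identities helper is hypothetical (not part of the Tecton SDK), and the placeholder IDs stand in for real application_id UUIDs:

```python
import json


def build_compute_identities(application_ids):
    """Build the compute_identities field for the workspace PUT body.

    The first application ID in the list becomes the workspace's
    default Databricks service principal; the rest are allowed
    alternatives.
    """
    return [
        {"databricks_service_principal": {"application_id": app_id}}
        for app_id in application_ids
    ]


body = {
    "capabilities": {
        "materializable": True,
        "offline_store_subdirectory_enabled": True,
    },
    "compute_identities": build_compute_identities(
        ["<DEFAULT-DATABRICKS-SERVICE-PRINCIPAL-ID>",
         "<ANOTHER-ALLOWED-DATABRICKS-SERVICE-PRINCIPAL-ID>"]
    ),
}

# Serialize for use as the curl -d payload.
payload = json.dumps(body, indent=2)
```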
note
  • The first Databricks service principal in the list is the workspace's default.
  • When the list is empty, the Databricks API Token's principal is used.
  • Only Admins can update a workspace's list of allowed Databricks service principals.

note

The Databricks API Token's principal must have the servicePrincipal/user role for each service principal included in the list; otherwise, attempts to submit jobs that run as the service principal will fail. See Databricks' Manage Service Principals documentation.

Additionally, the Databricks service principal must be allowed to use the instance profile configured for materialization. The instance profile used is either the default for your Tecton instance (shown in the Compute tab in the Tecton Web UI) or the one defined in the feature view's DatabricksJsonClusterConfig (if set).

2. Configuring a Feature View to Use a Specific Databricks Service Principal

When a specific Databricks Service Principal should be used (rather than the workspace's default), configure the feature view with a Databricks Json Cluster Config (DJCC) and specify the service principal in the run_as parameter.

Databricks' Jobs 2.1 API includes a run_as parameter that accepts the service principal's application ID. Create the DJCC following Databricks' Jobs 2.1 format, specifying the Databricks service principal's application_id in the run_as parameter. Also include a DATABRICKS_JOBS_API_VERSION property with the string value 2.1 in the feature view's options field so that Tecton uses Databricks' Jobs 2.1 API (otherwise, the Jobs 2.0 API is assumed).

Below is an example:

from tecton import batch_feature_view
from tecton.framework.configs import DatabricksJsonClusterConfig
import json

DJCC = {
    "run_as": {"service_principal_name": "<DATABRICKS-SERVICE-PRINCIPAL-ID>"},
    "tasks": [
        {
            "task_key": "tecton_materialization",
            "new_cluster": {
                "num_workers": 2,
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "m5.large",
                "aws_attributes": {...},
                "spark_conf": {...},
                # ...
            },
        }
    ],
}


@batch_feature_view(
    batch_compute=DatabricksJsonClusterConfig(json=json.dumps(DJCC)),
    options={"DATABRICKS_JOBS_API_VERSION": "2.1"},
    # ...
)
def my_feature_view():
    # my feature logic
    pass
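Because a run_as entry takes effect only when the Jobs 2.1 API is selected, it can help to sanity-check the DJCC and options together before applying. Below is a minimal, hypothetical helper sketch (not part of the Tecton SDK) that flags the two easy-to-miss mistakes:

```python
import json


def check_run_as_config(djcc_json, options):
    """Return a list of problems with a DJCC run_as configuration.

    djcc_json: the JSON string passed to DatabricksJsonClusterConfig.
    options:   the feature view's options dict.
    """
    problems = []
    djcc = json.loads(djcc_json)
    if "run_as" in djcc:
        # run_as is silently ignored unless the Jobs 2.1 API is used.
        if options.get("DATABRICKS_JOBS_API_VERSION") != "2.1":
            problems.append(
                "run_as is set but DATABRICKS_JOBS_API_VERSION is not '2.1'; "
                "the job would run as the Databricks API Token's principal"
            )
        if not djcc["run_as"].get("service_principal_name"):
            problems.append("run_as is missing service_principal_name")
    return problems
```

For example, calling check_run_as_config with a DJCC containing run_as but an empty options dict returns one problem, since the Jobs 2.0 API would be assumed.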

When the Databricks API Token's Principal Is Used

Materialization jobs still run as the Databricks API Token's principal in the following cases:

  1. Jobs for feature views in workspaces with no Databricks service principals set (i.e., the compute_identities list is empty).
  2. Jobs for feature views configured with a Databricks Json Cluster Config where the Databricks Jobs API version is either not specified or not set to 2.1.
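The fallback rules above can be sketched as a small, hypothetical helper (not part of the Tecton SDK). The fall-through to the workspace default for a Jobs 2.1 DJCC without run_as is an assumption for illustration:

```python
API_TOKEN_PRINCIPAL = "api-token-principal"  # placeholder for the token's identity


def resolve_job_principal(compute_identities, djcc=None, jobs_api_version=None):
    """Return the identity a materialization job would run as.

    compute_identities: ordered list of allowed application IDs
                        (first entry is the workspace default).
    djcc:               the feature view's Databricks Json Cluster
                        Config as a dict, if any.
    jobs_api_version:   the DATABRICKS_JOBS_API_VERSION option value.
    """
    if djcc is not None:
        # Case 2: a DJCC without the Jobs 2.1 API falls back to the token.
        if jobs_api_version != "2.1":
            return API_TOKEN_PRINCIPAL
        run_as = djcc.get("run_as", {}).get("service_principal_name")
        if run_as:
            return run_as
    # Case 1: an empty compute_identities list falls back to the token.
    if not compute_identities:
        return API_TOKEN_PRINCIPAL
    # Otherwise the workspace default applies.
    return compute_identities[0]
```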

Recommendations

  • Update all feature views using Databricks Json Cluster Config to use Databricks' Jobs 2.1 API format (see https://docs.databricks.com/en/reference/jobs-api-2-1-updates.html).
  • Create Databricks service principals with just the minimum set of permissions for various materialization jobs and configure them in each live workspace's allow list.
  • Once all live workspaces are configured with (and using) default Databricks service principals, and no DJCC relies on the Databricks API Token's principal, consider reducing the data access permissions of the Databricks API Token's principal.
