Databricks Unity Catalog
Prerequisites
- Tecton SDK 0.6+
- DBR 11+ (with the Premium plan or above)
Limitations
- Tecton is currently compatible with the SINGLE USER Databricks cluster access mode, but not yet with SHARED MODE.
- In order for your Tecton notebook to be able to read directly from Unity Catalog data sources (e.g. to run FeatureView.get_historical_features(from_source=True)), you must create your notebook cluster with the SINGLE USER access mode. This means each Databricks user will need a separate notebook cluster.
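
Because every user needs their own SINGLE USER cluster, it can help to generate the cluster definitions from a template. The following is a minimal sketch, not a Tecton or Databricks utility: `build_single_user_cluster_spec` is a hypothetical helper, and the runtime version, node type, and email addresses are placeholders. The `data_security_mode` and `single_user_name` fields are, to the best of our understanding, the Databricks Clusters API fields that select the access mode; verify them against your workspace before use.

```python
import json


def build_single_user_cluster_spec(
    user_name,
    cluster_name="tecton-notebook",
    spark_version="11.3.x-scala2.12",  # placeholder DBR 11+ runtime
    node_type_id="i3.xlarge",          # placeholder AWS node type
    num_workers=1,
):
    """Return a cluster payload pinned to one user (hypothetical helper)."""
    if not user_name:
        raise ValueError("a single-user cluster needs exactly one assigned user")
    return {
        # One cluster per user, named after the local part of the email.
        "cluster_name": f"{cluster_name}-{user_name.split('@')[0]}",
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumed Clusters API fields for the SINGLE USER access mode.
        "data_security_mode": "SINGLE_USER",
        "single_user_name": user_name,
    }


specs = [
    build_single_user_cluster_spec(user)
    for user in ("alice@example.com", "bob@example.com")
]
print(json.dumps(specs[0], indent=2))
```

The sketch only builds the payloads; submitting them (through the Databricks UI, CLI, or REST API) is left to whatever tooling your team already uses.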
Databricks & AWS Setup
- Assign your Databricks workspaces used by Tecton to the metastore that you plan to use.
- Add the Databricks Service Principal used by Tecton as a user of the metastore.
- For the S3 bucket you configured as the Tecton offline store, make sure all AWS IAM requirements here are also met and that the corresponding IAM role ARN is registered as a storage credential in Unity Catalog via Databricks Data Explorer.
- Create an external location for this S3 bucket with the above storage credential and grant the Databricks account used by Tecton at least the READ FILES and WRITE FILES permissions. This can be done by running the following SQL commands in a notebook or the Databricks SQL editor which is backed by a Unity-enabled cluster or SQL warehouse.

    CREATE EXTERNAL LOCATION [IF NOT EXISTS] <location_name>
    URL 's3://<bucket_path>'
    WITH ([STORAGE] CREDENTIAL <storage_credential_name>)
    [COMMENT <comment_string>];

    GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <tecton_databricks_account>;
    GRANT WRITE FILES ON EXTERNAL LOCATION <location_name> TO <tecton_databricks_account>;
Configuring Tecton Data Sources & Feature Views to work with Unity
- Please let Tecton know that you plan to use Unity Catalog, so that we can appropriately configure internal Spark clusters used by Tecton's SDK.
- No changes are needed for Feature Views that don't use a Unity data source.
- Please note that changing a Feature View's Data Source may result in re-materialization.
Tecton SDK Version 0.7+
- We recommend using UnityConfig as follows:

    from tecton import BatchSource, UnityConfig

    test_unity_batch_source = BatchSource(
        name="test_unity_config_batch_source",
        batch_config=UnityConfig(
            catalog="main",  # <catalog_name>
            schema="default",  # <schema_name>
            table="department",  # <table_name>
        ),
    )
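
Unity Catalog tables are usually referred to by a three-level name such as main.default.department, while UnityConfig takes the catalog, schema, and table as separate arguments. As an illustration only, a small helper can split the name into those arguments. `unity_table_kwargs` is a hypothetical function, not part of the Tecton SDK, and this sketch assumes none of the three parts contains a dot.

```python
def unity_table_kwargs(full_name):
    """Split 'catalog.schema.table' into keyword arguments (hypothetical helper)."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected a name of the form catalog.schema.table, got {full_name!r}"
        )
    catalog, schema, table = parts
    return {"catalog": catalog, "schema": schema, "table": table}


kwargs = unity_table_kwargs("main.default.department")
print(kwargs)
# With the Tecton SDK installed, these could be passed on as:
#   UnityConfig(**kwargs)
```

Failing fast on a malformed name surfaces mistakes when the source is defined rather than later, when a Spark job tries to read the table.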