
DatabricksJsonClusterConfig

Summary

Configuration used to specify materialization clusters on Databricks using raw JSON.

This class describes the attributes of the new clusters that are created in Databricks during materialization jobs. See the User Guide for more details.

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

  • kind: Literal['DatabricksJsonClusterConfig'] = 'DatabricksJsonClusterConfig'
  • json: Dict[str, Any] = None. A dictionary, following the JSON schema described below, used to directly configure the cluster used in materialization (see the sketch below).
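
For illustration, a minimal sketch of constructing the config in Python. The top-level import path is an assumption based on how other Tecton cluster configs are exposed; the full set of required keys for the json dictionary is described under Required Fields below.

import tecton  # import path assumed

# Minimal sketch: `json` takes a Python dict matching the Runs Submit schema;
# `kind` defaults to 'DatabricksJsonClusterConfig' and does not need to be set.
# See "Required Fields" below for the complete set of required keys.
cluster_config = tecton.DatabricksJsonClusterConfig(
    json={
        "new_cluster": {
            "num_workers": 0,
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "m5.large",
        }
    }
)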

Required Fields

Tecton uses the Runs Submit endpoint of the Databricks Jobs API 2.0 to programmatically create and manage materialization jobs that run on Databricks. When a DatabricksJsonClusterConfig object is instantiated, a JSON payload that conforms to the Runs Submit request schema must be provided. The skeleton below shows the required fields; the two spark_conf settings are required for Glue usage.

{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "spark_conf": {
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    }
  }
}
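
As a sketch (again assuming the top-level tecton import), the same payload can be written as a Python dictionary and passed to the constructor. Unlike raw JSON, Python allows inline comments on individual settings:

import tecton  # import path assumed

# The Runs Submit payload expressed as a Python dict.
required_payload = {
    "new_cluster": {
        "num_workers": 0,
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "m5.large",
        "aws_attributes": {
            "ebs_volume_type": "GENERAL_PURPOSE_SSD",
            "ebs_volume_count": 1,
            "ebs_volume_size": 100,
            "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
            "availability": "SPOT",
            "zone_id": "auto",
        },
        "spark_conf": {
            # Required for Glue usage
            "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
            "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
        },
    }
}

cluster_config = tecton.DatabricksJsonClusterConfig(json=required_payload)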

Example Databricks Cluster Configuration

The following example includes all of the required parameters, plus some additional commonly used ones.

{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "policy_id": "your_policy_id",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "custom_tags": [
      {
        "key": "test_key",
        "value": "test_value"
      }
    ],
    "spark_conf": {
      "spark.databricks.service.server.enabled": "true",
      "spark.hadoop.fs.s3a.acl.default": "BucketOwnerFullControl",
      "spark.sql.sources.partitionOverwriteMode": "dynamic",
      "spark.databricks.cluster.profile": "singleNode",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
      "spark.master": "local[*]",
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    },
    "spark_env_vars": {
      "TEST_ENV_VAR": "IGNORE_ME"
    }
  },
  "libraries": [
    {
      "jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
    }
  ]
}
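
A practical sketch: because the payload is plain JSON, a configuration like the one above can live in its own file and be loaded at definition time ("databricks_cluster.json" below is a hypothetical path):

import json

import tecton  # import path assumed

# Load the Runs Submit payload from a JSON file kept alongside the feature definitions.
with open("databricks_cluster.json") as f:
    cluster_json = json.load(f)

materialization_cluster = tecton.DatabricksJsonClusterConfig(json=cluster_json)

Keeping the payload in a separate file keeps feature definitions short and lets the same cluster settings be reused across feature views.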
