Skip to main content
Version: Beta ๐Ÿšง

DatabricksJsonClusterConfig

Summaryโ€‹

Configuration used to specify materialization clusters using json on Databricks.
ย 
This class describes the attributes of the new clusters which are created in Databricks during materialization jobs. Please find more details in User Guide

Attributesโ€‹

The attributes are the same as the __init__ method parameters. See below.

Methodsโ€‹

__init__(...)โ€‹

Parameters

  • kind: Literal['DatabricksJsonClusterConfig'] = DatabricksJsonClusterConfig
  • json: Dict[str,Any] = None A JSON string used to directly configure the cluster used in materialization.

Required Fieldsโ€‹

Tecton uses the Runs Submit endpoint of the Databricks Jobs API 2.0 to programmatically create and manage materialization jobs that run on Databricks. When a DatabricksJsonClusterConfig object is instantiated, a JSON string that conforms to the Runs Submit request schema must provided.

{
"new_cluster": {
"num_workers": 0,
"spark_version": "11.3.x-scala2.12",
"node_type_id": "m5.large",
"aws_attributes": {
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 1,
"ebs_volume_size": 100,
"instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
"availability": "SPOT",
"zone_id": "auto"
},
"spark_conf": {
#Required for Glue usage
"spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
"spark.databricks.hive.metastore.glueCatalog.enabled": "true"
},
}
}

Example Databricks Cluster Configurationโ€‹

The following example includes all of the required parameters as well as some additional common parameters.

{
"new_cluster": {
"num_workers": 0,
"spark_version": "11.3.x-scala2.12",
"node_type_id": "m5.large",
"policy_id": "your_policy_id",
"aws_attributes": {
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 1,
"ebs_volume_size": 100,
"instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
"availability": "SPOT",
"zone_id": "auto"
},
"custom_tags": [
{
"key": "test_key",
"value": "test_value"
}
],
"spark_conf": {
"spark.databricks.service.server.enabled": "true",
"spark.hadoop.fs.s3a.acl.default": "BucketOwnerFullControl",
"spark.sql.sources.partitionOverwriteMode": "dynamic",
"spark.databricks.cluster.profile": "singleNode",
"spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
"spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
"spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
"spark.master": "local[*]",
"spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
"spark.databricks.hive.metastore.glueCatalog.enabled": "true"
},
"spark_env_vars": {
"TEST_ENV_VAR": "IGNORE_ME"
}
},
"libraries": [
{
"jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
}
]
}

Was this page helpful?