DatabricksJsonClusterConfig
Summary
Configuration used to specify materialization clusters on Databricks using JSON.
This class describes the attributes of the new clusters that are created in Databricks during materialization jobs. See the User Guide for more details.
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
kind: Literal['DatabricksJsonClusterConfig'] = 'DatabricksJsonClusterConfig'
json: Dict[str, Any] = None
    A JSON specification, passed as a Python dictionary, used to directly configure the cluster used in materialization.
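Because the json parameter is annotated as Dict[str, Any], the cluster specification is written as an ordinary Python dictionary that must be JSON-serializable. The minimal sketch below assumes that reading of the type annotation and reuses placeholder values from the required-fields example. The commented-out lines assume DatabricksJsonClusterConfig is importable from the top-level tecton package; they are not executed here.

```python
import json
from typing import Any, Dict

# Cluster spec as a plain dict, matching the Dict[str, Any] type of `json`.
# Values are placeholders taken from the required-fields example.
cluster_json: Dict[str, Any] = {
    "new_cluster": {
        "num_workers": 0,
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "m5.large",
    }
}

# json.dumps raises TypeError if any value is not JSON-serializable,
# which makes it a cheap sanity check before handing the dict to Tecton.
payload = json.dumps(cluster_json)
print(payload)

# Assumed usage in a feature repository (not run here):
#   from tecton import DatabricksJsonClusterConfig
#   config = DatabricksJsonClusterConfig(json=cluster_json)
```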
Required Fields
Tecton uses the Runs Submit endpoint of the Databricks Jobs API 2.0 to programmatically create and manage materialization jobs that run on Databricks. When a DatabricksJsonClusterConfig object is instantiated, it must be given a JSON specification that conforms to the Runs Submit request schema. The following example contains the required fields.
The spark_conf entries in this example are required only when using the AWS Glue Data Catalog as the metastore.
{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "spark_conf": {
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    }
  }
}
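Because the specification has to be valid JSON, it can help to parse it and check for the expected keys before deploying. The sketch below is a hypothetical pre-flight check, not Tecton's own validation: REQUIRED_NEW_CLUSTER_KEYS is an assumption drawn from the required-fields example (leaving out the Glue-only spark_conf), and missing_required_fields is an illustrative helper name. Note that standard JSON allows neither comments nor trailing commas, so json.loads also catches those mistakes.

```python
import json
from typing import Any, Dict, List

# Hypothetical key list, taken from the required-fields example;
# this is NOT the validation Tecton itself performs.
REQUIRED_NEW_CLUSTER_KEYS = ["num_workers", "spark_version", "node_type_id", "aws_attributes"]


def missing_required_fields(spec: Dict[str, Any]) -> List[str]:
    """Return dotted paths of expected keys that are absent from a cluster spec."""
    new_cluster = spec.get("new_cluster")
    if not isinstance(new_cluster, dict):
        return ["new_cluster"]
    return [f"new_cluster.{key}" for key in REQUIRED_NEW_CLUSTER_KEYS if key not in new_cluster]


raw = """
{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large"
  }
}
"""

# json.loads raises json.JSONDecodeError on comments or trailing commas,
# so parsing doubles as a syntax check.
spec = json.loads(raw)
print(missing_required_fields(spec))  # ['new_cluster.aws_attributes']
```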
Example Databricks Cluster Configuration
The following example includes all of the required parameters as well as some additional common parameters.
{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "policy_id": "your_policy_id",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "custom_tags": [
      {
        "key": "test_key",
        "value": "test_value"
      }
    ],
    "spark_conf": {
      "spark.databricks.service.server.enabled": "true",
      "spark.hadoop.fs.s3a.acl.default": "BucketOwnerFullControl",
      "spark.sql.sources.partitionOverwriteMode": "dynamic",
      "spark.databricks.cluster.profile": "singleNode",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
      "spark.master": "local[*]",
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    },
    "spark_env_vars": {
      "TEST_ENV_VAR": "IGNORE_ME"
    }
  },
  "libraries": [
    {
      "jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
    }
  ]
}
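The full example layers optional settings (single-node Spark configuration, Glue metastore settings, extra libraries) on top of the required fields. One way to keep that layering explicit in Python is a small recursive merge of a shared base spec with per-cluster extras. In the sketch below, deep_merge is a hypothetical helper, not part of the Tecton SDK, and the values are the same placeholders used in the examples.

```python
import copy
import json
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `extra` merged recursively into `base`.

    Nested dicts are merged key by key; any other value in `extra`
    (including lists) replaces the value in `base`. Inputs are not mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Required fields (placeholder values from the examples).
base = {
    "new_cluster": {
        "num_workers": 0,
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "m5.large",
    }
}

# Optional single-node and Glue settings plus a library, taken from the full example.
extras = {
    "new_cluster": {
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
            "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
        }
    },
    "libraries": [{"jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"}],
}

cluster_json = deep_merge(base, extras)
print(json.dumps(cluster_json, indent=2))
```

Because deep_merge copies its inputs, the same base can be reused for several clusters without one cluster's extras leaking into another.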