DatabricksJsonClusterConfig
Summary​
Configuration used to specify materialization clusters using json on Databricks.Â
This class describes the attributes of the new clusters which are created in Databricks during materialization jobs. Please find more details in User Guide
Attributes​
The attributes are the same as the __init__ method parameters. See below.
Methods​
__init__(...)​
Parameters
kind: Literal['DatabricksJsonClusterConfig'] = DatabricksJsonClusterConfigjson: Dict[str,Any] = NoneA JSON string used to directly configure the cluster used in materialization.
Required Fields​
Tecton uses the
Runs Submit
endpoint of the Databricks Jobs API 2.0 to programmatically create and manage
materialization jobs that run on Databricks. When a
DatabricksJsonClusterConfig object is instantiated, a JSON string that
conforms to the Runs Submit
request schema
must provided.
{
"new_cluster": {
"num_workers": 0,
"spark_version": "11.3.x-scala2.12",
"node_type_id": "m5.large",
"aws_attributes": {
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 1,
"ebs_volume_size": 100,
"instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
"availability": "SPOT",
"zone_id": "auto"
},
"spark_conf": {
#Required for Glue usage
"spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
"spark.databricks.hive.metastore.glueCatalog.enabled": "true"
},
}
}
Example Databricks Cluster Configuration​
The following example includes all of the required parameters as well as some additional common parameters.
{
"new_cluster": {
"num_workers": 0,
"spark_version": "11.3.x-scala2.12",
"node_type_id": "m5.large",
"policy_id": "your_policy_id",
"aws_attributes": {
"ebs_volume_type": "GENERAL_PURPOSE_SSD",
"ebs_volume_count": 1,
"ebs_volume_size": 100,
"instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
"availability": "SPOT",
"zone_id": "auto"
},
"custom_tags": [
{
"key": "test_key",
"value": "test_value"
}
],
"spark_conf": {
"spark.databricks.service.server.enabled": "true",
"spark.hadoop.fs.s3a.acl.default": "BucketOwnerFullControl",
"spark.sql.sources.partitionOverwriteMode": "dynamic",
"spark.databricks.cluster.profile": "singleNode",
"spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
"spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
"spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
"spark.master": "local[*]",
"spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
"spark.databricks.hive.metastore.glueCatalog.enabled": "true"
},
"spark_env_vars": {
"TEST_ENV_VAR": "IGNORE_ME"
}
},
"libraries": [
{
"jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
}
]
}