Version: Beta 🚧

DatabricksJsonClusterConfig

Summary

Configuration used to specify materialization clusters on Databricks using JSON.

This class describes the attributes of the new clusters that are created in Databricks during materialization jobs. More details are available in the User Guide.

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

kind: Literal['DatabricksJsonClusterConfig'] = 'DatabricksJsonClusterConfig'
json: Dict[str, Any] = None
A JSON object, passed as a Python dictionary, used to directly configure the cluster used in materialization.
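Since `json` is annotated as `Dict[str, Any]`, a configuration kept as JSON text (for example, in a file next to the feature definitions) may need to be parsed into a dictionary first. A minimal sketch using only the standard library; the inline string stands in for such a file, `load_cluster_json` is a hypothetical helper, and the hand-off to `DatabricksJsonClusterConfig(json=...)` is shown only as a comment.

```python
import json
from typing import Any, Dict

# Stand-in for cluster configuration kept as JSON text (illustrative values).
CLUSTER_JSON_TEXT = """
{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large"
  }
}
"""


def load_cluster_json(text: str) -> Dict[str, Any]:
    """Parse JSON text into the dictionary form of the `json` parameter."""
    config = json.loads(text)
    # json.loads accepts any JSON value; a cluster configuration must be an object.
    if not isinstance(config, dict):
        raise TypeError("cluster configuration must be a JSON object")
    return config


cluster_json = load_cluster_json(CLUSTER_JSON_TEXT)
# In a feature repository the dict would then be passed on, e.g.
# DatabricksJsonClusterConfig(json=cluster_json)
print(cluster_json["new_cluster"]["node_type_id"])
```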

Required Fields

Tecton uses the Runs Submit endpoint of the Databricks Jobs API 2.0 to programmatically create and manage materialization jobs that run on Databricks. When a DatabricksJsonClusterConfig object is instantiated, it must be given a JSON configuration that conforms to the Runs Submit request schema.
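A quick local check can catch missing keys before the configuration is used. The sketch below is a hypothetical helper, not part of the Tecton SDK or the Databricks API, that reports which `new_cluster` fields from the required-fields template on this page are absent. The `spark_conf` Glue entries are not checked because the template marks them as required only for Glue usage, and the key lists are an assumption copied from that template.

```python
from typing import Any, Dict, List

# Keys copied from the required-fields template on this page. The template
# uses num_workers; Databricks also accepts autoscale in its place, which
# this sketch does not handle.
REQUIRED_NEW_CLUSTER_KEYS = ("num_workers", "spark_version", "node_type_id", "aws_attributes")
REQUIRED_AWS_KEYS = (
    "ebs_volume_type",
    "ebs_volume_count",
    "ebs_volume_size",
    "instance_profile_arn",
    "availability",
    "zone_id",
)


def missing_required_fields(config: Dict[str, Any]) -> List[str]:
    """Return dotted paths of template fields that are absent from `config`."""
    new_cluster = config.get("new_cluster")
    if not isinstance(new_cluster, dict):
        return ["new_cluster"]
    missing = [f"new_cluster.{key}" for key in REQUIRED_NEW_CLUSTER_KEYS if key not in new_cluster]
    aws = new_cluster.get("aws_attributes")
    # Inspect aws_attributes only when it is a dict; a missing key is reported above.
    if isinstance(aws, dict):
        missing += [f"new_cluster.aws_attributes.{key}" for key in REQUIRED_AWS_KEYS if key not in aws]
    return missing


partial = {"new_cluster": {"spark_version": "11.3.x-scala2.12", "aws_attributes": {"zone_id": "auto"}}}
print(missing_required_fields(partial))
```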

{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "spark_conf": {
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    }
  }
}

The two spark_conf entries are required only when the AWS Glue Data Catalog is used as the metastore.
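Building the template as a Python dictionary sidesteps the usual hand-editing pitfalls, since JSON allows neither comments nor trailing commas. A minimal sketch; `required_cluster_json` and its `account_id`, `instance_profile_name`, and `use_glue` parameters are hypothetical and simply fill in the placeholders from the template, and the account ID and profile name in the usage line are made up.

```python
import json
from typing import Any, Dict


def required_cluster_json(
    account_id: str, instance_profile_name: str, use_glue: bool = True
) -> Dict[str, Any]:
    """Return the required-fields template with its placeholders filled in."""
    new_cluster: Dict[str, Any] = {
        "num_workers": 0,
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "m5.large",
        "aws_attributes": {
            "ebs_volume_type": "GENERAL_PURPOSE_SSD",
            "ebs_volume_count": 1,
            "ebs_volume_size": 100,
            "instance_profile_arn": (
                f"arn:aws:iam::{account_id}:instance-profile/{instance_profile_name}"
            ),
            "availability": "SPOT",
            "zone_id": "auto",
        },
    }
    if use_glue:
        # Only needed when the AWS Glue Data Catalog is used as the metastore.
        new_cluster["spark_conf"] = {
            "spark.hadoop.hive.metastore.glue.catalogid": account_id,
            "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
        }
    return {"new_cluster": new_cluster}


config = required_cluster_json("123456789012", "tecton-databricks-profile")
# json.dumps always emits valid JSON: no comments and no trailing commas.
print(json.dumps(config, indent=2))
```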

Example Databricks Cluster Configuration

The following example includes all of the required fields as well as some additional common parameters.

{
  "new_cluster": {
    "num_workers": 0,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "m5.large",
    "policy_id": "your_policy_id",
    "aws_attributes": {
      "ebs_volume_type": "GENERAL_PURPOSE_SSD",
      "ebs_volume_count": 1,
      "ebs_volume_size": 100,
      "instance_profile_arn": "arn:aws:iam::your_account_id:instance-profile/your-role",
      "availability": "SPOT",
      "zone_id": "auto"
    },
    "custom_tags": [
      {
        "key": "test_key",
        "value": "test_value"
      }
    ],
    "spark_conf": {
      "spark.databricks.service.server.enabled": "true",
      "spark.hadoop.fs.s3a.acl.default": "BucketOwnerFullControl",
      "spark.sql.sources.partitionOverwriteMode": "dynamic",
      "spark.databricks.cluster.profile": "singleNode",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED",
      "spark.master": "local[*]",
      "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
      "spark.databricks.hive.metastore.glueCatalog.enabled": "true"
    },
    "spark_env_vars": {
      "TEST_ENV_VAR": "IGNORE_ME"
    }
  },
  "libraries": [
    {
      "jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
    }
  ]
}
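The full example largely layers extra settings (a cluster policy, single-node Spark settings, a library) on top of the required fields. One way to keep a shared base in a single place is a recursive dictionary merge. A minimal sketch; `deep_merge` is a hypothetical helper, not part of the Tecton SDK, and the values are copied from the examples on this page.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overrides` merged recursively into `base`.

    Nested dicts are merged key by key; any other value in `overrides`
    (including lists) replaces the value in `base`. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "new_cluster": {
        "num_workers": 0,
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "m5.large",
        "spark_conf": {"spark.databricks.hive.metastore.glueCatalog.enabled": "true"},
    }
}

single_node_extras = {
    "new_cluster": {
        "policy_id": "your_policy_id",
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
    },
    "libraries": [{"jar": "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"}],
}

cluster_json = deep_merge(base, single_node_extras)
print(sorted(cluster_json["new_cluster"]["spark_conf"]))
```

Lists such as `libraries` and `custom_tags` are replaced rather than concatenated, which keeps the result predictable but means list entries from the base must be repeated in the override if they should be kept.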
