Version: 1.1

EMRJsonClusterConfig

Summary

Configuration used to specify materialization clusters on EMR using JSON.

This class describes the attributes of the new clusters that are created in EMR during materialization jobs. For more details, see the User Guide.

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

kind: Literal['EMRJsonClusterConfig'] = 'EMRJsonClusterConfig'
json: Dict[str, Any] = None. A JSON string used to directly configure the cluster used in materialization.
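
As a minimal sketch of how the json parameter might be supplied: the annotation is Dict[str, Any], so this example passes a Python dictionary. Every value is a placeholder, and the import path (the top-level tecton package) is an assumption, so the import is guarded and the snippet still runs where Tecton is not installed.

```python
from typing import Any, Dict

# Placeholder values throughout; substitute your own roles, subnets and instance types.
emr_config: Dict[str, Any] = {
    "ReleaseLabel": "emr-6.7.0",
    "ServiceRole": "your-service-role",
    "JobFlowRole": "your-job-flow-role",
    "Instances": {
        "Ec2SubnetIds": ["subnet-your_net_a"],
        "InstanceFleets": [
            {
                "InstanceFleetType": "MASTER",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.large"}],
            },
            {
                "InstanceFleetType": "CORE",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.large"}],
            },
        ],
    },
}

try:
    # Assumption: the class is exported from the top-level tecton package.
    from tecton import EMRJsonClusterConfig
except ImportError:
    EMRJsonClusterConfig = None  # Tecton is not installed in this environment.

# Only construct the config object when the class is actually available.
cluster_config = EMRJsonClusterConfig(json=emr_config) if EMRJsonClusterConfig else None
```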

Required Fields

Tecton uses the RunJobFlow request action of the EMR API to start a new EMR cluster. When you instantiate an EMRJsonClusterConfig object, you must provide a JSON string that conforms to the RunJobFlow request schema.

The following are required parameters:

ReleaseLabel
Instances
ServiceRole
JobFlowRole

These parameters are explained in the EMR RunJobFlow documentation.
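
The four required parameters listed above can be checked before a configuration is submitted. The helper below is a hypothetical illustration, not part of the Tecton SDK or the EMR API. It only checks that the top-level keys are present and does not validate their contents against the RunJobFlow schema.

```python
from typing import Any, Dict, List

# The required top-level parameters named in this section.
REQUIRED_FIELDS = ("ReleaseLabel", "Instances", "ServiceRole", "JobFlowRole")


def missing_required_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from config."""
    return [field for field in REQUIRED_FIELDS if field not in config]


# A deliberately incomplete configuration, used to show the check.
incomplete = {"ReleaseLabel": "emr-6.7.0", "Instances": {}}
print(missing_required_fields(incomplete))  # ['ServiceRole', 'JobFlowRole']
```

An empty list means that all four keys are present.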

Example EMR Cluster Configuration

The following example includes all of the required parameters as well as some additional common parameters.

{
    "ReleaseLabel": "emr-6.7.0",
    "ServiceRole": "your-service-role",
    "JobFlowRole": "your-job-flow-role",
    "CustomAmiId": "your-custom-AMI-ID",
    "Instances": {
        "Ec2SubnetIds": ["subnet-your_net_a", "subnet-your_net_b"],
        "EmrManagedMasterSecurityGroup": "sg-your_group_a",
        "EmrManagedSlaveSecurityGroup": "sg-your_group_b",
        "ServiceAccessSecurityGroup": "sg-your_group_c",
        "AdditionalMasterSecurityGroups": [],
        "InstanceFleets": [
            {
                "InstanceFleetType": "CORE",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
            },
            {
                "InstanceFleetType": "MASTER",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
            }
        ]
    },
    "Configurations": [
      {
        "Classification": "spark-defaults",
        "Configurations": [],
        "Properties": {
          "spark.driver.maxResultSize": "4G",
          "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
          "spark.sql.catalogImplementation": "hive",
          "spark.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
          "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
          "spark.hive.metastore.glue.catalogid": "your_account_id",
          "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
          "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
          "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
        }
      },
      {
        "Classification": "hive-site",
        "Configurations": [],
        "Properties": {
          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
          "hive.metastore.glue.catalogid": "your_account_id"
        }
      },
      {
        "Classification": "spark-hive-site",
        "Configurations": [],
        "Properties": {
          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
          "hive.metastore.glue.catalogid": "your_account_id"
        }
      },
      {
        "Classification": "yarn-env",
        "Configurations": [
          {
            "Classification": "export",
            "Configurations": [],
            "Properties": {
              "TEST_ENV_VAR": "TEST_VALUE"
            }
          }
        ],
        "Properties": {}
      }
    ],
    "BootstrapActions": [
      {
        "Name": "install_jars_from_s3",
        "ScriptBootstrapAction": {
          "Path": "s3://tecton-cluster/path/install_jars_from_s3.sh",
          "Args": [
            "s3://tecton.ai.public/jars/spark-snowflake_2.12-2.9.1-spark_3.0.jar",
            "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
          ]
        }
      }
    ]
}
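
If a configuration like this is kept as a JSON string or file, it may help to parse it with a strict parser before use. The sketch below uses Python's standard json module on two short placeholder strings (not the full example) to show that strict JSON rejects a trailing comma, even though a Python dictionary literal would accept one.

```python
import json

# Two short placeholder documents; the second has a trailing comma after the last member.
valid_text = '{"ReleaseLabel": "emr-6.7.0", "BootstrapActions": []}'
trailing_comma_text = '{"ReleaseLabel": "emr-6.7.0", "BootstrapActions": [],}'

# Strict JSON parses into a regular Python dictionary.
config = json.loads(valid_text)

# The same content with a trailing comma is not valid JSON and fails to parse.
try:
    json.loads(trailing_comma_text)
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

print(config["ReleaseLabel"], trailing_comma_ok)
```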
