Version: Beta 🚧

EMRJsonClusterConfig

Summary

Configuration used to specify materialization clusters on EMR using JSON.

This class describes the attributes of the new clusters that are created in EMR during materialization jobs. More details are available in the User Guide.

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

  • kind: Literal['EMRJsonClusterConfig'] = 'EMRJsonClusterConfig'
  • json: Dict[str, Any] = None. The JSON configuration, passed as a dictionary, used to directly configure the cluster used in materialization.

Required Fields

Tecton uses the RunJobFlow request action of the EMR API to start a new EMR cluster. When you instantiate an EMRJsonClusterConfig object, you must provide JSON that conforms to the RunJobFlow request schema.

The following are required parameters:

  • ReleaseLabel
  • Instances
  • ServiceRole
  • JobFlowRole

These parameters are explained in the EMR RunJobFlow documentation.
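
Before handing a configuration to EMRJsonClusterConfig, it can help to check locally that the four required top-level keys are present. The following is a minimal sketch, not part of the Tecton SDK: the helper name missing_required_fields and the constant REQUIRED_FIELDS are hypothetical, and the check only looks at top-level key presence, not at whether the values satisfy the RunJobFlow schema.

```python
from typing import Any, Dict, List

# Top-level keys listed as required on this page.
REQUIRED_FIELDS = ("ReleaseLabel", "Instances", "ServiceRole", "JobFlowRole")


def missing_required_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from config.

    Hypothetical helper for illustration; it does not validate values.
    """
    return [name for name in REQUIRED_FIELDS if name not in config]


# A deliberately incomplete configuration.
partial_config = {
    "ReleaseLabel": "emr-6.7.0",
    "ServiceRole": "your-service-role",
}

missing = missing_required_fields(partial_config)
if missing:
    print("Missing required fields: " + ", ".join(missing))
```

An empty list means all four keys are present; EMR may still reject the request if the values themselves are invalid.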

Example EMR Cluster Configuration

The following example includes all of the required parameters as well as some additional common parameters.

{
  "ReleaseLabel": "emr-6.7.0",
  "ServiceRole": "your-service-role",
  "JobFlowRole": "your-job-flow-role",
  "CustomAmiId": "your-custom-AMI-ID",
  "Instances": {
    "Ec2SubnetIds": ["subnet-your_net_a", "subnet-your_net_b"],
    "EmrManagedMasterSecurityGroup": "sg-your_group_a",
    "EmrManagedSlaveSecurityGroup": "sg-your_group_b",
    "ServiceAccessSecurityGroup": "sg-your_group_c",
    "AdditionalMasterSecurityGroups": [],
    "InstanceFleets": [
      {
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      },
      {
        "InstanceFleetType": "MASTER",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      }
    ]
  },
  "Configurations": [
    {
      "Classification": "spark-defaults",
      "Configurations": [],
      "Properties": {
        "spark.driver.maxResultSize": "4G",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalogImplementation": "hive",
        "spark.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
        "spark.hive.metastore.glue.catalogid": "your_account_id",
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
      }
    },
    {
      "Classification": "hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "spark-hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "yarn-env",
      "Configurations": [
        {
          "Classification": "export",
          "Configurations": [],
          "Properties": {
            "TEST_ENV_VAR": "TEST_VALUE"
          }
        }
      ],
      "Properties": {}
    }
  ],
  "BootstrapActions": [
    {
      "Name": "install_jars_from_s3",
      "ScriptBootstrapAction": {
        "Path": "s3://tecton-cluster/path/install_jars_from_s3.sh",
        "Args": [
          "s3://tecton.ai.public/jars/spark-snowflake_2.12-2.9.1-spark_3.0.jar",
          "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
        ]
      }
    }
  ]
}
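
The "Configurations" key holds a list of entries, each identified by its "Classification" and carrying a "Properties" map. The sketch below shows one way to load a trimmed-down version of such a configuration from JSON text and look up the properties for a given classification. The helper find_properties is hypothetical and not part of Tecton or EMR, and the final comment showing the dictionary passed to EMRJsonClusterConfig is an assumption based on the json parameter described in this page.

```python
import json
from typing import Any, Dict, Optional


def find_properties(
    config: Dict[str, Any], classification: str
) -> Optional[Dict[str, str]]:
    """Return the Properties map of the first top-level entry in
    config["Configurations"] with the given Classification, or None.

    Hypothetical helper for illustration; nested entries are not searched.
    """
    for entry in config.get("Configurations", []):
        if entry.get("Classification") == classification:
            return entry.get("Properties", {})
    return None


# A shortened configuration using placeholder values from the example.
cluster_json = json.loads("""
{
  "ReleaseLabel": "emr-6.7.0",
  "ServiceRole": "your-service-role",
  "JobFlowRole": "your-job-flow-role",
  "Instances": {"Ec2SubnetIds": ["subnet-your_net_a"]},
  "Configurations": [
    {
      "Classification": "spark-defaults",
      "Configurations": [],
      "Properties": {"spark.driver.maxResultSize": "4G"}
    }
  ]
}
""")

spark_defaults = find_properties(cluster_json, "spark-defaults")
print(spark_defaults)

# Assumed usage, not executed here because it requires the Tecton SDK:
# cluster_config = EMRJsonClusterConfig(json=cluster_json)
```

Parsing the text with json.loads also surfaces syntax problems, such as trailing commas, before the configuration reaches EMR.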
