Version: 1.1

EMRJsonClusterConfig

Summary

Configuration used to specify materialization clusters on EMR using JSON.
 
This class describes the attributes of the new clusters that are created in EMR during materialization jobs. For more details, see the User Guide.

Attributes

The attributes are the same as the __init__ method parameters. See below.

Methods

__init__(...)

Parameters

  • kind: Literal['EMRJsonClusterConfig'] = 'EMRJsonClusterConfig'
  • json: Dict[str, Any] = None. A JSON object, passed as a dictionary, used to directly configure the cluster used in materialization.
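Because the json parameter is typed as Dict[str, Any], a configuration kept as JSON text (for example, in a file) would presumably need to be parsed into a dictionary before it is passed in. The following is a minimal sketch under that assumption. The load_cluster_json helper and the abbreviated sample values are hypothetical, and the constructor call is shown only as a comment.

```python
import json
from typing import Any, Dict


def load_cluster_json(text: str) -> Dict[str, Any]:
    """Parse JSON text into a dict, rejecting anything that is not a JSON object."""
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("cluster configuration must be a JSON object")
    return parsed


# Hypothetical, abbreviated configuration text; a real one needs more fields.
raw = '{"ReleaseLabel": "emr-6.7.0", "ServiceRole": "your-service-role"}'
cluster_json = load_cluster_json(raw)

# Assumed usage, not executed here:
# EMRJsonClusterConfig(json=cluster_json)
```

Parsing up front surfaces malformed JSON when the configuration is defined rather than when a materialization job tries to start a cluster.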

Required Fields

Tecton uses the RunJobFlow request action of the EMR API to start a new EMR cluster. When you instantiate an EMRJsonClusterConfig object, you must provide JSON that conforms to the RunJobFlow request schema.

The following are required parameters:

  • ReleaseLabel
  • Instances
  • ServiceRole
  • JobFlowRole

These parameters are explained in the EMR RunJobFlow API documentation.
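A quick local check for the four required fields can catch an incomplete configuration early. This is a minimal sketch, not part of the Tecton SDK. The missing_required_fields helper is hypothetical, and it only checks that the top-level keys exist, not that their values satisfy the RunJobFlow schema.

```python
from typing import Any, Dict, List

# The required top-level RunJobFlow fields listed above.
REQUIRED_FIELDS = ("ReleaseLabel", "Instances", "ServiceRole", "JobFlowRole")


def missing_required_fields(cluster_json: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from cluster_json."""
    return [name for name in REQUIRED_FIELDS if name not in cluster_json]


# Hypothetical incomplete configuration used for illustration.
incomplete = {"ReleaseLabel": "emr-6.7.0", "ServiceRole": "your-service-role"}
print(missing_required_fields(incomplete))  # prints ['Instances', 'JobFlowRole']
```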

Example EMR Cluster Configuration

The following example includes all of the required parameters as well as some additional common parameters.

{
  "ReleaseLabel": "emr-6.7.0",
  "ServiceRole": "your-service-role",
  "JobFlowRole": "your-job-flow-role",
  "CustomAmiId": "your-custom-AMI-ID",
  "Instances": {
    "Ec2SubnetIds": ["subnet-your_net_a", "subnet-your_net_b"],
    "EmrManagedMasterSecurityGroup": "sg-your_group_a",
    "EmrManagedSlaveSecurityGroup": "sg-your_group_b",
    "ServiceAccessSecurityGroup": "sg-your_group_c",
    "AdditionalMasterSecurityGroups": [],
    "InstanceFleets": [
      {
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      },
      {
        "InstanceFleetType": "MASTER",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      }
    ]
  },
  "Configurations": [
    {
      "Classification": "spark-defaults",
      "Configurations": [],
      "Properties": {
        "spark.driver.maxResultSize": "4G",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalogImplementation": "hive",
        "spark.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
        "spark.hive.metastore.glue.catalogid": "your_account_id",
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
      }
    },
    {
      "Classification": "hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "spark-hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "yarn-env",
      "Configurations": [
        {
          "Classification": "export",
          "Configurations": [],
          "Properties": {
            "TEST_ENV_VAR": "TEST_VALUE"
          }
        }
      ],
      "Properties": {}
    }
  ],
  "BootstrapActions": [
    {
      "Name": "install_jars_from_s3",
      "ScriptBootstrapAction": {
        "Path": "s3://tecton-cluster/path/install_jars_from_s3.sh",
        "Args": [
          "s3://tecton.ai.public/jars/spark-snowflake_2.12-2.9.1-spark_3.0.jar",
          "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
        ]
      }
    }
  ]
}
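When several feature views share a base configuration like the one above, it can be convenient to derive variants that differ only in a few Spark properties. The sketch below illustrates one way to do that on a plain dictionary. The with_properties helper and the sample values are hypothetical and are not part of the Tecton SDK.

```python
import copy
from typing import Any, Dict


def with_properties(
    cluster_json: Dict[str, Any], classification: str, properties: Dict[str, str]
) -> Dict[str, Any]:
    """Return a copy of cluster_json with properties merged into one classification."""
    updated = copy.deepcopy(cluster_json)  # leave the shared base untouched
    configurations = updated.setdefault("Configurations", [])
    for entry in configurations:
        if entry.get("Classification") == classification:
            entry.setdefault("Properties", {}).update(properties)
            break
    else:
        # No entry for this classification yet, so append a new one.
        configurations.append(
            {
                "Classification": classification,
                "Configurations": [],
                "Properties": dict(properties),
            }
        )
    return updated


# Hypothetical, abbreviated base configuration.
base = {
    "ReleaseLabel": "emr-6.7.0",
    "Configurations": [
        {
            "Classification": "spark-defaults",
            "Configurations": [],
            "Properties": {"spark.driver.maxResultSize": "4G"},
        }
    ],
}
large_driver = with_properties(
    base, "spark-defaults", {"spark.driver.maxResultSize": "8G"}
)
```

Returning a deep copy keeps the base dictionary unchanged, so it can be reused for other variants.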
