EMRJsonClusterConfig
Summary
Configuration used to specify materialization clusters using JSON on EMR.
This class describes the attributes of the new clusters which are created in EMR during materialization jobs. Please find more details in this User Guide.
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
kind: Literal['EMRJsonClusterConfig'] = EMRJsonClusterConfig
json: Dict[str,Any] = None
A JSON string used to directly configure the cluster used in materialization.
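As a rough sketch of how the json parameter might be supplied, the snippet below builds a small cluster specification as a Python dictionary and passes it to the constructor. The import path `from tecton import EMRJsonClusterConfig`, the variable names `cluster_spec` and `cluster_config`, and every role name, subnet ID, and instance type are assumptions for illustration only. The import is guarded so that the snippet still runs where the SDK is not installed.

```python
# Placeholder cluster spec; every value below is illustrative, not a default.
cluster_spec = {
    "ReleaseLabel": "emr-6.7.0",
    "ServiceRole": "your-service-role",
    "JobFlowRole": "your-job-flow-role",
    "Instances": {
        "Ec2SubnetIds": ["subnet-your_net_a"],
        "InstanceFleets": [
            {
                "InstanceFleetType": "MASTER",
                "TargetSpotCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m4.large"}],
            },
        ],
    },
}

try:
    # Assumed import path; adjust it to match your installed Tecton SDK.
    from tecton import EMRJsonClusterConfig
except ImportError:
    EMRJsonClusterConfig = None  # SDK not available in this environment

# Construct the config object only when the SDK could be imported.
cluster_config = (
    EMRJsonClusterConfig(json=cluster_spec) if EMRJsonClusterConfig else None
)
```

Keeping the specification in a plain dictionary makes it easy to inspect or check before it is handed to the SDK.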
Required Fields
Tecton uses the RunJobFlow request action of the EMR API to start a new EMR
cluster. When you instantiate an EMRJsonClusterConfig object, you must provide
a JSON string that conforms to the RunJobFlow request schema.
The following are required parameters:
ReleaseLabel
Instances
ServiceRole
JobFlowRole
These parameters are explained in the EMR RunJobFlow documentation.
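A check for the required fields could look roughly like the following. This is a minimal sketch, not part of the Tecton SDK: the helper name `missing_required_fields` and the constant `REQUIRED_FIELDS` are hypothetical, and the check only looks at top-level keys without validating their contents against the RunJobFlow schema.

```python
from typing import Any, Dict, List

# Required top-level fields named in this section.
REQUIRED_FIELDS = ("ReleaseLabel", "Instances", "ServiceRole", "JobFlowRole")


def missing_required_fields(spec: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from spec."""
    return [field for field in REQUIRED_FIELDS if field not in spec]


# An incomplete spec with placeholder values.
incomplete_spec = {"ReleaseLabel": "emr-6.7.0", "ServiceRole": "your-service-role"}
print(missing_required_fields(incomplete_spec))  # ['Instances', 'JobFlowRole']
```

Running a check like this before a materialization job starts can surface a missing field earlier than a rejected EMR request would.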
Example EMR Cluster Configuration
The following example includes all of the required parameters as well as some additional common parameters.
{
  "ReleaseLabel": "emr-6.7.0",
  "ServiceRole": "your-service-role",
  "JobFlowRole": "your-job-flow-role",
  "CustomAmiId": "your-custom-AMI-ID",
  "Instances": {
    "Ec2SubnetIds": ["subnet-your_net_a", "subnet-your_net_b"],
    "EmrManagedMasterSecurityGroup": "sg-your_group_a",
    "EmrManagedSlaveSecurityGroup": "sg-your_group_b",
    "ServiceAccessSecurityGroup": "sg-your_group_c",
    "AdditionalMasterSecurityGroups": [],
    "InstanceFleets": [
      {
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      },
      {
        "InstanceFleetType": "MASTER",
        "TargetSpotCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m4.large"}]
      }
    ]
  },
  "Configurations": [
    {
      "Classification": "spark-defaults",
      "Configurations": [],
      "Properties": {
        "spark.driver.maxResultSize": "4G",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalogImplementation": "hive",
        "spark.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "spark.hadoop.hive.metastore.glue.catalogid": "your_account_id",
        "spark.hive.metastore.glue.catalogid": "your_account_id",
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInRead": "CORRECTED",
        "spark.sql.legacy.parquet.int96RebaseModeInWrite": "CORRECTED"
      }
    },
    {
      "Classification": "hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "spark-hive-site",
      "Configurations": [],
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.glue.catalogid": "your_account_id"
      }
    },
    {
      "Classification": "yarn-env",
      "Configurations": [
        {
          "Classification": "export",
          "Configurations": [],
          "Properties": {
            "TEST_ENV_VAR": "TEST_VALUE"
          }
        }
      ],
      "Properties": {}
    }
  ],
  "BootstrapActions": [
    {
      "Name": "install_jars_from_s3",
      "ScriptBootstrapAction": {
        "Path": "s3://tecton-cluster/path/install_jars_from_s3.sh",
        "Args": [
          "s3://tecton.ai.public/jars/spark-snowflake_2.12-2.9.1-spark_3.0.jar",
          "s3://tecton.ai.public/jars/snowflake-jdbc-3.13.6.jar"
        ]
      }
    }
  ]
}
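If a configuration like this is stored as JSON text, for example in a file kept alongside the feature repository, it can be parsed into a dictionary before being passed as the json argument. The sketch below uses only the Python standard library; the helper name `load_cluster_spec` is hypothetical and the sample strings are placeholders. It also shows that a strict JSON parser rejects trailing commas, which are legal in a Python dict literal but not in JSON.

```python
import json


def load_cluster_spec(text: str) -> dict:
    """Parse JSON text and require that its top level is an object."""
    spec = json.loads(text)
    if not isinstance(spec, dict):
        raise ValueError("cluster configuration must be a JSON object")
    return spec


# Well-formed JSON parses into an ordinary dictionary.
valid_text = '{"ReleaseLabel": "emr-6.7.0", "Instances": {}}'
spec = load_cluster_spec(valid_text)

# A trailing comma is accepted in a Python dict literal but not by json.loads.
try:
    load_cluster_spec('{"ReleaseLabel": "emr-6.7.0",}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
```

Parsing the text up front means a malformed configuration fails locally instead of at cluster launch time.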