Version: 0.9

Configure EMR

The following steps configure the connection between a Tecton Account and the Customer Data Plane AWS and EMR resources. See the Virtual Private Tecton Architecture overview for more information on the Customer Control Plane.

"Terraform Templates for AWS Account Configuration"

If your organization uses Terraform to manage AWS resources, we recommend using this sample Terraform setup repository instead of entering these values manually. The instructions below may still be a valuable reference when adapting the template to your needs, especially the networking section. Once you've applied the configuration to your account, see the Request your Tecton Installation step.

If you have already connected your Tecton Account to your Cloud Provider, then you only need to configure the Spark and EMR roles below.

Create a Tecton S3 Bucket

Tecton will use a single S3 bucket to store all of your offline materialized feature data.

To configure the S3 bucket:

  1. Create an S3 bucket called tecton-{DEPLOYMENT_NAME} (e.g. tecton-mycompany-production).

  2. Ensure the bucket's region is the same as the region in which you'd like to deploy Tecton (e.g. us-west-2).

  3. Enable default encryption using the Amazon S3 key (SSE-S3).

  4. (Optional step if you want to enable Rift) Add a Policy to the S3 bucket tecton-{DEPLOYMENT_NAME} to allow Tecton to read/write to it.

    1. Navigate to S3 -> tecton-{DEPLOYMENT_NAME} S3 Bucket -> Permissions. There, add the following policy, replacing {TECTON_CONTROL_PLANE_ARN} with the ARN of the Tecton Control Plane account. Ask your Tecton Account Manager if you do not have this ARN.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "TectonS3",
            "Effect": "Allow",
            "Principal": {
              "AWS": "{TECTON_CONTROL_PLANE_ARN}"
            },
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::tecton-{DEPLOYMENT_NAME}/*"
          },
          {
            "Sid": "TectonS3List",
            "Effect": "Allow",
            "Principal": {
              "AWS": "{TECTON_CONTROL_PLANE_ARN}"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::tecton-{DEPLOYMENT_NAME}"
          }
        ]
      }
    2. Make sure the Object Ownership Setting for the tecton-{DEPLOYMENT_NAME} S3 Bucket is set to ACLs Disabled (Bucket Owner Enforced). This allows the AWS account with the S3 bucket to automatically own and have full control over objects written by Rift.
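
If you prefer to generate the bucket policy rather than edit the JSON by hand, the following minimal Python sketch builds the same two statements shown above from a deployment name and the Tecton Control Plane ARN. The function name tecton_bucket_policy is hypothetical and not part of any Tecton tooling; the values in the usage section are placeholders.

```python
import json


def tecton_bucket_policy(deployment_name: str, control_plane_arn: str) -> dict:
    """Build the bucket policy shown above for tecton-{DEPLOYMENT_NAME}."""
    bucket_arn = f"arn:aws:s3:::tecton-{deployment_name}"
    principal = {"AWS": control_plane_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TectonS3",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                # Object-level actions apply to the keys inside the bucket.
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "TectonS3List",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself, not to its objects.
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values: substitute your own deployment name and the ARN
    # provided by your Tecton Account Manager.
    policy = tecton_bucket_policy("mycompany-production", "{TECTON_CONTROL_PLANE_ARN}")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket's Permissions tab. Note that the object-level statement targets the /* resource while the ListBucket statement targets the bucket ARN without a suffix.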

Configure IAM roles

In this section we'll configure the roles and policies required for Tecton to manage S3, DynamoDB, and Spark resources. After completing this section, you should have:

  1. A Spark role (tecton-{DEPLOYMENT_NAME}-spark-role) with the following policies

    • tecton-{DEPLOYMENT_NAME}-spark-policy
    • tecton-spark-scoped-secrets-policy
    • AmazonSSMManagedInstanceCore policy
  2. An EMR Manager role (tecton-{DEPLOYMENT_NAME}-emr-manager-role) with the following policies

    • tecton-{DEPLOYMENT_NAME}-spark-policy
    • tecton-emr-manager-policy
  3. A cross-account role (tecton-{DEPLOYMENT_NAME}-cross-account-role) with the following policies

    • tecton-cross-account-spark-policy
    • tecton-{DEPLOYMENT_NAME}-cross-account-policy
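
The target state listed above can also be written down as data, which is handy as a checklist once the roles exist. The sketch below is only an illustration: it assumes you use the suggested names from this page, and the function name expected_role_policies is hypothetical.

```python
def expected_role_policies(deployment_name: str) -> dict:
    """Map each suggested role name to the policy names it should carry."""
    prefix = f"tecton-{deployment_name}"
    spark_policy = f"{prefix}-spark-policy"
    return {
        f"{prefix}-spark-role": [
            spark_policy,
            "tecton-spark-scoped-secrets-policy",
            "AmazonSSMManagedInstanceCore",  # AWS managed policy
        ],
        f"{prefix}-emr-manager-role": [
            spark_policy,
            "tecton-emr-manager-policy",
        ],
        f"{prefix}-cross-account-role": [
            "tecton-cross-account-spark-policy",
            f"{prefix}-cross-account-policy",
        ],
    }


if __name__ == "__main__":
    # "mycompany-production" is a placeholder deployment name.
    for role, policies in expected_role_policies("mycompany-production").items():
        print(role)
        for policy in policies:
            print(f"  - {policy}")
```

The Spark policy appears under both the Spark role and the EMR Manager role, matching the list above.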

Configure the EMR Manager and Spark Roles

  1. In the AWS Console of the account you want to deploy Tecton into, go to the IAM service.

  2. Click the Policies tab in the sidebar.

  3. Create the Tecton Spark Policy

    1. Click Create Policy.

    2. Paste in the following JSON policy, replacing ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT_ID} with the account ID of your Tecton Data Plane account, and ${DEPLOYMENT_NAME} with your Tecton deployment name

      https://github.com/tecton-ai/tecton-terraform-setup/blob/master/templates/spark_policy.json
    3. Click Next: Tags

    4. Click Next: Review

    5. Give the policy an easy to remember name starting with tecton-, like tecton-{DEPLOYMENT_NAME}-spark-policy

    6. Click Create Policy

  4. Create the Tecton EMR Manager policy

    1. Click Create Policy.

    2. Paste in the following JSON policy, replacing ${SPARK_ROLE} with the name you plan to use for the role (such as tecton-{DEPLOYMENT_NAME}-spark-role), and ${DEPLOYMENT_NAME} with your Tecton deployment name

      https://github.com/tecton-ai/tecton-terraform-setup/blob/master/templates/emr_master_policy.json
    3. Click Next: Tags

    4. Click Next: Review

    5. Give the policy an easy to remember name starting with tecton-, like tecton-emr-manager-policy

    6. Click Create Policy

  5. Create the Tecton Spark Scoped Secrets policy

    1. Click Create Policy.

    2. Paste in the following JSON policy, replacing ${ACCOUNT_ID} with the account ID of your AWS account, and ${DEPLOYMENT_NAME} with your Tecton deployment name

      https://github.com/tecton-ai/tecton-terraform-setup/blob/master/templates/emr_spark_policy.json
    3. Click Next: Tags

    4. Click Next: Review

    5. Give the policy an easy to remember name starting with tecton-, like tecton-spark-scoped-secrets-policy

    6. Click Create Policy

  6. Click the Roles tab in the sidebar.

  7. Create the Spark Role

    1. Click Create role.

    2. Select EC2 under Common Use Cases

    3. Click the Next: Permissions button

    4. Attach the Tecton Spark Policy by searching for the policy you created earlier, such as tecton-{DEPLOYMENT_NAME}-spark-policy, and click the check box next to that policy to attach the policy to the new role.

    5. Attach the Tecton Spark Scoped Secrets Policy by searching for the policy you created earlier, such as tecton-spark-scoped-secrets-policy, and click the check box next to that policy to attach the policy to the new role.

    6. Attach the AmazonSSMManagedInstanceCore policy by searching for the AmazonSSMManagedInstanceCore policy, and click the check box next to the policy to attach the policy to the new role.

    7. Click the Next: Tags button.

    8. Click the Next: Review button.

    9. In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-spark-role.

    10. Click Create role. You will see a list of roles displayed.

    11. Ensure that the role has an Instance Profile associated with it, and that the Instance Profile has the same name as the role. If you created this role through the console, the Instance Profile should have been created automatically.

    12. Ensure that the role has "AWS Service: ec2" in its "Trusted Entities". If you created this role through the console, this should have been added automatically.

  8. Create the EMR Manager role

    1. Click Create role.

    2. Select EMR under Use Cases

    3. At the bottom of the page, select the default EMR role.

    4. Click the Next: Permissions button

    5. Search for the Tecton Spark policy you created earlier, such as tecton-{DEPLOYMENT_NAME}-spark-policy, and click the check box next to that policy to attach the policy to the new role.

    6. Search for the Tecton EMR Manager policy you created earlier, such as tecton-emr-manager-policy, and click the check box next to that policy to attach the policy to the new role.

    7. Click the Next: Tags button.

    8. Click the Next: Review button.

    9. In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-emr-manager-role.

    10. Click Create role. You will see a list of roles displayed.

    11. Ensure that the role has "AWS Service: elasticmapreduce" in its "Trusted Entities". If you created this role through the console, this should have been added automatically.
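
The "Trusted Entities" checks at the end of each role's steps correspond to the role's trust policy. The sketch below shows the standard shape of a service trust policy, using ec2.amazonaws.com for the Spark role and elasticmapreduce.amazonaws.com for the EMR Manager role, plus a small check you could run against a trust policy document you have exported. The helper names (service_trust_policy, trusts_service) are hypothetical, and nothing here is required if the console created the roles for you.

```python
import json


def service_trust_policy(service_principal: str) -> dict:
    """Trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Spark role: assumed by EC2 instances through the instance profile.
SPARK_ROLE_TRUST = service_trust_policy("ec2.amazonaws.com")
# EMR Manager role: assumed by the EMR service.
EMR_MANAGER_ROLE_TRUST = service_trust_policy("elasticmapreduce.amazonaws.com")


def trusts_service(trust_policy: dict, service_principal: str) -> bool:
    """Return True if any Allow statement names the given service principal."""
    for statement in trust_policy.get("Statement", []):
        principal = statement.get("Principal")
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            # IAM allows either a single string or a list of strings here.
            services = [services]
        if service_principal in services:
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(SPARK_ROLE_TRUST, indent=2))
    print(trusts_service(EMR_MANAGER_ROLE_TRUST, "elasticmapreduce.amazonaws.com"))
```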

Configure the cross-account role for the Tecton Control Plane

  1. In the AWS Console of the account you want to deploy Tecton into, go to the IAM service.

  2. Click the Policies tab in the sidebar.

  3. Create the cross-account Spark policy

    1. Click Create Policy.

    2. Paste in the following JSON policy, replacing ${SPARK_ROLE} with the same role name you used previously (such as tecton-{DEPLOYMENT_NAME}-spark-role), ${EMR_MANAGER_ROLE} with the name you plan to use for the role (such as tecton-{DEPLOYMENT_NAME}-emr-manager-role), ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT_ID} with the account ID of your Tecton Data Plane account, and ${DEPLOYMENT_NAME} with your Tecton deployment name

      https://github.com/tecton-ai/tecton-terraform-setup/blob/master/templates/emr_ca_policy.json
    3. Click Next: Tags

    4. Click Next: Review

    5. Give the policy an easy to remember name, like tecton-cross-account-spark-policy

    6. Click Create Policy

  4. Create the cross-account policy

    1. Click Create Policy.

    2. Paste in the following JSON policy, replacing ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT} with the account ID of your AWS account, ${DEPLOYMENT_NAME} with your Tecton deployment name, and ${SPARK_ROLE} with the name of your Spark role, such as tecton-{DEPLOYMENT_NAME}-spark-role.

      https://github.com/tecton-ai/tecton-terraform-setup/blob/master/templates/ca_policy.json
    3. Click Next: Tags

    4. Click Next: Review

    5. Give the policy an easy to remember name starting with tecton-, like tecton-{DEPLOYMENT_NAME}-cross-account-policy

    6. Click Create Policy

  5. Create the cross-account role

    1. Click the Roles tab in the sidebar.

    2. Click Create role.

    3. Under Select type of trusted entity, click the Another AWS account tile.

    4. Specify the Tecton Account ID. Contact your account executive to obtain the correct account ID.

    5. Enable the option "Require external ID."

    6. Enter a random External ID of your choice (for example, a UUID works well). Make sure to note down the external ID that you choose -- you'll need to provide this to Tecton to complete the installation.

    7. Click the Next: Permissions button

    8. Search for the policy you created (e.g. tecton-{DEPLOYMENT_NAME}-cross-account-policy), and click the check box next to that policy to attach the policy to the new role.

    9. Search for the cross-account Spark policy you created (e.g. tecton-cross-account-spark-policy), and click the check box next to that policy to attach the policy to the new role.

    10. Click the Next: Tags button.

    11. Click the Next: Review button.

    12. In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-cross-account-role.

    13. Click Create role. You will see a list of roles displayed.
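
To illustrate the external ID step, the sketch below generates a UUID to use as the External ID and builds the kind of trust policy that results from selecting another AWS account with "Require external ID" enabled. This is a sketch under assumptions: the function names are hypothetical, and the account ID in the usage section is a placeholder, not the real Tecton Account ID (obtain that from your account executive).

```python
import json
import uuid


def new_external_id() -> str:
    """Generate a random External ID; any hard-to-guess string would do."""
    return str(uuid.uuid4())


def cross_account_trust_policy(tecton_account_id: str, external_id: str) -> dict:
    """Trust policy allowing another account to assume the role with an External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{tecton_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present this exact External ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    external_id = new_external_id()
    # "123456789012" is a placeholder account ID.
    policy = cross_account_trust_policy("123456789012", external_id)
    print("External ID to share with Tecton:", external_id)
    print(json.dumps(policy, indent=2))
```

Record the generated External ID somewhere safe, since it has to be sent to Tecton along with the role ARN.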

Configure networking

Tecton will need a VPC and subnets to use when creating EMR clusters -- these can be existing resources or you can create them for Tecton. Either way, make sure to tag the resources with the tecton-accessible:DEPLOYMENT_NAME tag.

Configure the VPC and subnet

  1. Add the following tag to the VPC:

    key: tecton-accessible:DEPLOYMENT_NAME
    value: true
  2. You'll need a private subnet in each of the availability zones you intend for Tecton to use (at least 2 AZs).
    • Ensure the route table for each of the subnets has a 0.0.0.0/0 route that provides outbound internet access. You can accomplish this using NAT Gateways.
  3. Add the following tag to each subnet:

    key: tecton-accessible:DEPLOYMENT_NAME
    value: true
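
As a rough pre-flight check for the subnet requirements above, the sketch below verifies that a set of subnets spans at least two availability zones and that each carries the tecton-accessible tag. It assumes DEPLOYMENT_NAME in the tag key is replaced with your deployment name, and it works on a hypothetical in-memory description of your subnets (the subnet_id, availability_zone, and tags fields are invented for this example, not an AWS response format).

```python
def tecton_accessible_tag(deployment_name: str) -> dict:
    """The tag Tecton looks for on the VPC, subnets, and security groups."""
    return {"Key": f"tecton-accessible:{deployment_name}", "Value": "true"}


def subnet_problems(subnets: list, deployment_name: str) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    tag = tecton_accessible_tag(deployment_name)
    problems = []

    # Tecton needs private subnets in at least two availability zones.
    zones = {subnet["availability_zone"] for subnet in subnets}
    if len(zones) < 2:
        problems.append(
            f"subnets cover {len(zones)} availability zone(s); at least 2 are needed"
        )

    # Every subnet must carry the tecton-accessible tag with value "true".
    for subnet in subnets:
        if subnet.get("tags", {}).get(tag["Key"]) != tag["Value"]:
            problems.append(
                f"{subnet['subnet_id']} is missing tag {tag['Key']}={tag['Value']}"
            )
    return problems


if __name__ == "__main__":
    # Placeholder subnets for a deployment named "mycompany-production".
    subnets = [
        {
            "subnet_id": "subnet-a",
            "availability_zone": "us-west-2a",
            "tags": {"tecton-accessible:mycompany-production": "true"},
        },
        {"subnet_id": "subnet-b", "availability_zone": "us-west-2b", "tags": {}},
    ]
    for problem in subnet_problems(subnets, "mycompany-production"):
        print(problem)
```

The check does not inspect route tables, so outbound internet access still has to be confirmed separately.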

Configure security groups

You'll need to set up two security groups that allow the EMR clusters that Tecton creates to:

  • Communicate internally
  • Connect to other AWS resources
  • Externally pull configuration
  • Install Python packages
  • Push metrics for monitoring and alerts

To do so, complete the following steps:

  1. Navigate to the "Security Groups" service in the AWS console

  2. Click "Create security group"

  3. Name the first security group tecton-emr-security-group, and give it a description (e.g. "A security group that EMR clusters created by Tecton will use to communicate internally")

  4. Ensure the VPC you selected in the previous step is selected here.

  5. Add the following tags to the security group:

    key: tecton-accessible:DEPLOYMENT_NAME
    value: true

    key: tecton-security-group-emr-usage
    value: manager,core&task
  6. Click "Create Security Group"

  7. Click "Create security group" again. Name the second security group tecton-service-emr-security-group, and give it a description (e.g. "A security group that EMR clusters created by Tecton will use to communicate with EMR services")

  8. Ensure the VPC you selected in the previous step is selected here.

  9. Add the following tags to the security group:

    key: tecton-accessible:DEPLOYMENT_NAME
    value: true

    key: tecton-security-group-emr-usage
    value: service-access
  10. Click "Create Security Group"

  11. Add the following inbound rules to tecton-emr-security-group

    1. Allow "All TCP" from tecton-emr-security-group
    2. Allow "Custom TCP" on port 8443 from tecton-service-emr-security-group
  12. Add the following outbound rules to tecton-emr-security-group

    1. Allow "All Traffic" to destination 0.0.0.0/0
  13. Add the following inbound rules to tecton-service-emr-security-group

    1. Allow "Custom TCP" on port 9443 from tecton-emr-security-group
  14. Add the following outbound rules to tecton-service-emr-security-group

    1. Allow "Custom TCP" on port 8443 to tecton-emr-security-group
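
The four rule sets above are easy to mix up because the two groups reference each other on different ports. The sketch below restates them as a small table and offers a literal lookup against it. It is only a way to read the rules back, not an evaluation of how AWS applies security groups, and the Rule type and allows function are hypothetical names.

```python
from typing import NamedTuple, Optional

EMR_SG = "tecton-emr-security-group"
SERVICE_SG = "tecton-service-emr-security-group"


class Rule(NamedTuple):
    group: str            # security group the rule is attached to
    direction: str        # "inbound" or "outbound"
    protocol: str         # "tcp" or "all"
    port: Optional[int]   # None means every port
    peer: str             # source/destination: a group name or a CIDR block


# The rules from steps 11 to 14, in order.
RULES = [
    Rule(EMR_SG, "inbound", "tcp", None, EMR_SG),
    Rule(EMR_SG, "inbound", "tcp", 8443, SERVICE_SG),
    Rule(EMR_SG, "outbound", "all", None, "0.0.0.0/0"),
    Rule(SERVICE_SG, "inbound", "tcp", 9443, EMR_SG),
    Rule(SERVICE_SG, "outbound", "tcp", 8443, EMR_SG),
]


def allows(group: str, direction: str, port: int, peer: str) -> bool:
    """Literal lookup: is there a listed rule for this group, direction, port, peer?"""
    for rule in RULES:
        if (
            rule.group == group
            and rule.direction == direction
            and rule.peer == peer
            and rule.port in (None, port)
        ):
            return True
    return False


if __name__ == "__main__":
    # The service group reaches the cluster group on 8443 ...
    print(allows(EMR_SG, "inbound", 8443, SERVICE_SG))
    # ... and the cluster group reaches the service group on 9443, not 8443.
    print(allows(SERVICE_SG, "inbound", 9443, EMR_SG))
    print(allows(SERVICE_SG, "inbound", 8443, EMR_SG))
```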

Request your Tecton Installation

Once you've completed the above setup, you're ready to request your installation! Send the following information to the Tecton team:

  • Your deployment name (e.g. mycompany-production)
  • The region in which you'd like Tecton deployed (e.g. us-west-2)
  • The ARN and External ID of the Tecton cross-account role (tecton-{DEPLOYMENT_NAME}-cross-account-role)
  • The ARN of the Spark role (tecton-{DEPLOYMENT_NAME}-spark-role) and the matching Instance Profile
  • The ARN of the EMR Manager role (tecton-{DEPLOYMENT_NAME}-emr-manager-role)

After you send this information to Tecton, the team will deploy Tecton into your account.
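
To gather the values listed above in one place, the sketch below assembles them from your deployment name, region, AWS account ID, and External ID. It assumes the roles were created with the names suggested on this page, without an IAM path, and that the instance profile shares the Spark role's name as described earlier. The dictionary keys are invented for this example; Tecton does not prescribe a format for the request.

```python
import json


def iam_arn(account_id: str, resource_type: str, name: str) -> str:
    """Build an IAM ARN such as arn:aws:iam::123456789012:role/my-role."""
    return f"arn:aws:iam::{account_id}:{resource_type}/{name}"


def installation_request(
    deployment_name: str, region: str, account_id: str, external_id: str
) -> dict:
    """Collect the values to send to the Tecton team (hypothetical key names)."""
    prefix = f"tecton-{deployment_name}"
    spark_role = f"{prefix}-spark-role"
    return {
        "deployment_name": deployment_name,
        "region": region,
        "cross_account_role_arn": iam_arn(
            account_id, "role", f"{prefix}-cross-account-role"
        ),
        "cross_account_external_id": external_id,
        "spark_role_arn": iam_arn(account_id, "role", spark_role),
        # The instance profile has the same name as the Spark role.
        "spark_instance_profile_arn": iam_arn(
            account_id, "instance-profile", spark_role
        ),
        "emr_manager_role_arn": iam_arn(
            account_id, "role", f"{prefix}-emr-manager-role"
        ),
    }


if __name__ == "__main__":
    # All four arguments are placeholders.
    request = installation_request(
        "mycompany-production", "us-west-2", "123456789012", "your-external-id"
    )
    print(json.dumps(request, indent=2))
```

If your roles use different names or an IAM path, copy the ARNs from the IAM console instead of deriving them.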

Configure access for data sources

Tecton's Spark role may need access to your batch data sources. Follow Connecting Data Sources for data-source-specific configuration.
