Version: 0.9

Configure Data Source Access per Workspace

Private Preview

This feature is currently in Private Preview.

This feature has the following limitations:
  • Available for Tecton on Databricks and EMR.
If you would like to participate in the preview, please file a support ticket.

This article shows how to configure Data Source access for a Workspace by limiting the AWS IAM identities that can be assumed by Spark clusters during feature materialization. These controls may be useful if you have a multi-Workspace strategy, where some data sources are sensitive and should only be available for use in a subset of Workspaces.

There are three steps to achieve this configuration:

  1. Set up the necessary instance profile(s) in AWS:
    • Each role must be added to its respective instance profile.
    • The instance profile for each role must have the same name as the role.
    • Refer to AWS documentation for more details on how to create instance profiles.
  2. Specify the allow-list of IAM identities (instance profiles or roles) that are available for use in a Workspace.
  3. Configure the specific instance profile or role used during materialization for a specific Feature View.
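
The naming rule in step 1 (each instance profile shares its role's name) can be checked before you request an allow-list. The following is a minimal, stdlib-only sketch rather than part of any Tecton or AWS tooling: the helper name `expected_instance_profile_arn` is hypothetical, the role ARN is a placeholder, and the sketch assumes roles without an IAM path.

```python
def expected_instance_profile_arn(role_arn: str) -> str:
    """Return the instance profile ARN that shares the role's name.

    Assumes a role ARN of the form arn:aws:iam::<account>:role/<name>
    with no IAM path component (an assumption of this sketch).
    """
    prefix, sep, name = role_arn.partition(":role/")
    if not sep or not name or "/" in name:
        raise ValueError(f"unsupported role ARN: {role_arn!r}")
    return f"{prefix}:instance-profile/{name}"


# Placeholder ARN; substitute your own account ID and role name.
role_arn = "arn:aws:iam::000000000000:role/your-tecton-spark-role"
profile_arn = expected_instance_profile_arn(role_arn)
print(profile_arn)
```

Comparing the derived ARN with the instance profile you actually created is a quick way to catch a name mismatch early.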

Specify the instance profile or role allow-list

To update the instance profile or role allow-list for a live Workspace, open a ticket with Tecton Support and provide:

  1. The AWS IAM identities that should be included in the allow-list
    • For Databricks, you'll provide instance profile ARNs similar to arn:aws:iam::000000000000:instance-profile/your-tecton-spark-role
    • For EMR, you'll provide IAM roles similar to your-tecton-spark-role
  2. The Workspaces for which the allow-list should be configured

Tecton Support will need confirmation by a current Tecton Admin from your account before updating the allow-list.

When specifying your allow-lists, note that:

  • Once any allow-list has been configured for your account, all live Workspaces will require allow-lists. If no allow-list is configured for a live Workspace, then tecton plan and tecton apply will fail for that Workspace.
  • If the Workspace has existing Feature Views using instance profiles or roles that are not on the allow-list, then new materialization jobs will fail at the next job attempt. Because stream jobs are long running, it may be some time before the current job is cancelled and the next attempt starts.
  • The instance profile or role specified during Tecton deployment must have access to all data sources in order to perform validation. You do not need to use this instance profile or role for materialization.
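
The first note above means that a partial rollout can break other Workspaces. As a rough planning aid, the sketch below lists live Workspaces that would still need an allow-list once at least one has been configured. It is an illustration of the rule only, not Tecton's implementation; the function name and the Workspace and role names are hypothetical.

```python
def workspaces_missing_allow_list(live_workspaces, allow_lists):
    """Return live Workspaces with no allow-list, sorted by name.

    `allow_lists` maps Workspace name -> list of allowed identities.
    If no allow-list exists anywhere, the requirement is not triggered.
    """
    if not allow_lists:
        return []
    return sorted(ws for ws in live_workspaces if ws not in allow_lists)


# Hypothetical Workspace and role names.
live_workspaces = ["prod", "staging", "dev"]
allow_lists = {"prod": ["your-tecton-spark-role"]}
missing = workspaces_missing_allow_list(live_workspaces, allow_lists)
print(missing)
```

Any Workspace in the result should be included in the same support request, so that tecton plan and tecton apply keep working there.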

For example, consider the following scenario:

  • You have Hive tables A, B, & C, and roles A', B', & C', which have permission to access their corresponding tables. Additionally, you have live Workspaces X, Y, & Z.
  • Hive table A is safe for use by any team. Table B can be used by Workspaces X & Y, but not Z. Table C can only be used by Workspace Y.

Then you should configure the allow-lists to be:

  • Workspace X: roles A', B'
  • Workspace Y: roles A', B', C'
  • Workspace Z: role A'
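
The allow-lists above follow mechanically from the table access rules: a Workspace gets a role exactly when it is permitted to use that role's table. A minimal sketch of that derivation, using only the names from the scenario (the function name `build_allow_lists` is hypothetical):

```python
# Which Workspaces may use each Hive table, per the scenario.
table_workspaces = {
    "A": {"X", "Y", "Z"},  # safe for any team
    "B": {"X", "Y"},       # not Z
    "C": {"Y"},            # Y only
}
# The role that has permission to access each table.
table_role = {"A": "A'", "B": "B'", "C": "C'"}


def build_allow_lists(table_workspaces, table_role):
    """Invert table -> Workspaces into Workspace -> sorted list of roles."""
    allow_lists = {}
    for table, workspaces in table_workspaces.items():
        for ws in workspaces:
            allow_lists.setdefault(ws, set()).add(table_role[table])
    return {ws: sorted(roles) for ws, roles in sorted(allow_lists.items())}


allow_lists = build_allow_lists(table_workspaces, table_role)
for ws, roles in allow_lists.items():
    print(f"Workspace {ws}: {', '.join(roles)}")
```

Starting from table permissions rather than from Workspaces makes it easier to review the result, because each sensitive table appears exactly once in the input.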

Configuring the instance profile or role for a Feature View

To configure the instance profile or role used for batch or stream materialization, you must use the DatabricksJsonClusterConfig or EMRJsonClusterConfig interface. If the instance profile or role used by a Feature View is not on the allow-list defined above, then tecton plan and tecton apply will display an error message.

  • For DatabricksJsonClusterConfig, assign your instance profile ARN to the instance_profile_arn parameter within the new_cluster.aws_attributes object. See the example here.
  • For EMRJsonClusterConfig, assign your role name to the JobFlowRole parameter. See the example here.
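
As a rough illustration of where these two values sit, the sketch below builds the relevant JSON fragments with the Python standard library. It shows only the fields named above; a real cluster configuration needs additional settings (instance types, Spark version, and so on) that are not covered here. The role parameter is spelled `JobFlowRole` as in the EMR RunJobFlow API, and placing it at the top level of the EMR JSON is an assumption of this sketch. How the resulting string is passed to DatabricksJsonClusterConfig or EMRJsonClusterConfig is not shown, because this article does not give their signatures.

```python
import json

# Databricks: the instance profile ARN goes under new_cluster.aws_attributes.
databricks_cluster = {
    "new_cluster": {
        "aws_attributes": {
            # Placeholder ARN; must be on the Workspace allow-list.
            "instance_profile_arn": (
                "arn:aws:iam::000000000000:instance-profile/your-tecton-spark-role"
            ),
        },
    },
}

# EMR: the role name goes in JobFlowRole.
# Top-level placement is an assumption of this sketch.
emr_cluster = {"JobFlowRole": "your-tecton-spark-role"}

databricks_json = json.dumps(databricks_cluster, indent=2)
emr_json = json.dumps(emr_cluster, indent=2)
print(databricks_json)
print(emr_json)
```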

If you do not specify DatabricksJsonClusterConfig or EMRJsonClusterConfig, then the default instance profile or role defined during Tecton deployment will be used for materializing the Feature View. If the allow-list for a Workspace does not include the default instance profile or role, then tecton plan and tecton apply will fail whenever any Feature View in that Workspace omits these cluster configuration options.
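
The fallback rule can be summarized in a few lines. The sketch below models it for planning purposes only: it is not how Tecton performs the check, and the function names, Feature View names, and role names are all hypothetical.

```python
from typing import Dict, Optional, Set


def effective_identity(configured: Optional[str], default: str) -> str:
    """A Feature View without an explicit identity uses the deployment default."""
    return configured if configured is not None else default


def disallowed_feature_views(
    feature_views: Dict[str, Optional[str]],
    allow_list: Set[str],
    default_identity: str,
) -> Dict[str, str]:
    """Map each Feature View that would be rejected to its effective identity."""
    failures = {}
    for name, configured in feature_views.items():
        identity = effective_identity(configured, default_identity)
        if identity not in allow_list:
            failures[name] = identity
    return failures


# Hypothetical Workspace state: None means no cluster config was specified.
allow_list = {"role-a", "role-b"}
feature_views = {
    "fv_explicit_ok": "role-a",
    "fv_explicit_bad": "role-c",
    "fv_uses_default": None,
}
failures = disallowed_feature_views(
    feature_views, allow_list, default_identity="tecton-default-role"
)
print(failures)
```

In this example the Feature View with no explicit configuration is rejected because the default role is not on the allow-list, which is the situation the paragraph above warns about.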
