Kinesis Streams

Overview

This example explains how you can connect Tecton to a Kinesis streaming data source.

Kinesis is a managed AWS service whose streams are not associated with a specific virtual private cloud (VPC), so there is no need to set up networking access (Security Groups, Subnets, VPC Peering, and so on). Instead, ensure that Tecton has AWS IAM permissions to read from the Kinesis data source. By default, Tecton can access all Kinesis streams in the AWS account in which Tecton is deployed.

Note

See Cross-Account Kinesis Access, below, for instructions on how to connect Tecton to a Kinesis stream in an AWS account that differs from the AWS account in which Tecton is deployed.

Once Tecton has access to your Kinesis streams, you need the following information:

  • region: The AWS region in which the Kinesis stream lives (for example: us-west-1, us-east-2)
  • stream_name: The unique name of the Kinesis stream
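
Before putting these two values into a configuration, it can help to sanity-check them. The following is a minimal sketch and not part of Tecton's SDK: the helper name validate_kinesis_params is hypothetical, the stream-name rule follows the Kinesis API constraint (1 to 128 characters drawn from letters, digits, underscores, periods, and hyphens), and the region check is only a shape heuristic that does not confirm the region exists.

```python
import re

# Kinesis API constraint on stream names: 1-128 chars from [a-zA-Z0-9_.-].
_STREAM_NAME_RE = re.compile(r"[a-zA-Z0-9_.-]{1,128}")
# Heuristic shape of an AWS region code such as us-west-2 or us-gov-west-1.
# This is an assumption for illustration; it does not check that the region exists.
_REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-[0-9]")


def validate_kinesis_params(region: str, stream_name: str) -> None:
    """Raise ValueError if either value is obviously malformed (hypothetical helper)."""
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"region does not look like an AWS region code: {region!r}")
    if not _STREAM_NAME_RE.fullmatch(stream_name):
        raise ValueError(f"invalid Kinesis stream name: {stream_name!r}")


# Both values are well-formed, so this call returns without raising.
validate_kinesis_params("us-west-2", "ad-impressions")
```

A check like this only catches typos early; whether Tecton can actually read the stream still depends on the IAM permissions described above.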

Sample Kinesis Data Source Configuration

The following example uses a Kinesis data source. First create the Kinesis data source configuration, then create a Virtual Data Source (VDS) that uses it.

ad_impressions_kinesis = KinesisDSConfig(
    stream_name='ad-impressions',
    region='us-west-2',
    timestamp_key='timestamp',
    default_watermark_delay_threshold="1minutes",
    default_initial_stream_position="trim_horizon",
)

ad_impressions_stream = VirtualDataSource(name="ad_impressions_stream",
    batch_ds_config=...,
    stream_ds_config=ad_impressions_kinesis
)

Cross-Account Kinesis Access

You might need access to a Kinesis stream that's in a different AWS account than Tecton's data plane. (See Deployment Types for more information about cross-account access.) To enable cross-account access:

  1. Create a cross-account role in the AWS account of your Kinesis stream that allows Tecton-orchestrated Spark workers to read from the stream.
  2. Configure your KinesisDSConfig object to use the cross-account role by setting the roleArn key in its options argument to the AWS ARN of the cross-account IAM role.

Creating a Cross-Account Role

  1. In your Kinesis AWS Account, go to the IAM service and click the Roles tab.
  2. Click Create role. In the Select type of trusted entity panel, click Another AWS Account. Paste in the Account ID of Tecton's data plane AWS account, <deployment-acct-id>. You can get this ID by emailing support@tecton.ai.
  3. Click Next: permissions and give this role permission to access Kinesis. You can provide your own JSON or use the AmazonKinesisFullAccess policy.
  4. Click Next: Review and give the role a name, for example KinesisCrossAccountRole.
  5. Click Create role. The list of roles displays.
  6. In the Roles list, click KinesisCrossAccountRole and verify that its trust relationship contains a JSON policy like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<deployment-acct-id>:root"
        ],
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  7. Copy the role ARN. For example: arn:aws:iam::<kinesis-owner-acct-id>:role/KinesisCrossAccountRole.
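
The trust policy shown in the steps above can also be generated programmatically, for example when the role is created from a script rather than the console. This is a minimal sketch using only the Python standard library. build_trust_policy is a hypothetical helper name, and the account ID in the example call is a placeholder, not the real ID of Tecton's data plane account.

```python
import json
import re


def build_trust_policy(deployment_account_id: str) -> dict:
    """Return a trust policy with the same shape as the JSON above (hypothetical helper)."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"[0-9]{12}", deployment_account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {deployment_account_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{deployment_account_id}:root"],
                    "Service": "ec2.amazonaws.com",
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


# 123456789012 is a placeholder; substitute the account ID provided by Tecton support.
trust_policy_json = json.dumps(build_trust_policy("123456789012"), indent=2)
print(trust_policy_json)
```

The printed JSON can be compared against the role's trust relationship in the IAM console, or supplied to whatever tooling you use to create IAM roles.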

Configuring the KinesisDSConfig Object

Set the roleArn key in the options argument of your KinesisDSConfig as shown below:

ad_impressions_kinesis = KinesisDSConfig(
    stream_name='ad-impressions',
    region='us-west-2',
    timestamp_key='timestamp',
    default_watermark_delay_threshold="1minutes",
    default_initial_stream_position="trim_horizon",
    options={'roleArn': 'arn:aws:iam::<kinesis-owner-acct-id>:role/KinesisCrossAccountRole'}
)

ad_impressions_stream = VirtualDataSource(name="ad_impressions_stream",
    batch_ds_config=...,
    stream_ds_config=ad_impressions_kinesis
)
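
Because the roleArn value is a plain string inside options, a typo or an unreplaced placeholder is easy to miss until the stream fails to start. The sketch below checks the ARN's shape before building the dictionary. kinesis_role_options is a hypothetical helper, not a Tecton API, and the regular expression is an assumption that covers common IAM role ARNs, including the aws-cn and aws-us-gov partitions.

```python
import re

# Assumed shape: arn:<partition>:iam::<12-digit account>:role/<optional path/><role name>
_ROLE_ARN_RE = re.compile(
    r"arn:aws(-cn|-us-gov)?:iam::[0-9]{12}:role/[\w+=,.@/-]+"
)


def kinesis_role_options(role_arn: str) -> dict:
    """Build the options dictionary for cross-account access (hypothetical helper)."""
    if not _ROLE_ARN_RE.fullmatch(role_arn):
        raise ValueError(f"not a valid IAM role ARN: {role_arn!r}")
    return {"roleArn": role_arn}


# 123456789012 is a placeholder for the account that owns the Kinesis stream.
options = kinesis_role_options(
    "arn:aws:iam::123456789012:role/KinesisCrossAccountRole"
)
print(options)
```

The returned dictionary is intended to be passed as the options argument when constructing the KinesisDSConfig.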

Validate Data Access

To validate that Tecton can access the Kinesis stream, call the VirtualDataSource's start_stream_preview function from an interactive notebook.