tecton.KinesisDSConfig

class tecton.KinesisDSConfig(stream_name, region, raw_stream_translator, timestamp_key, default_initial_stream_position, default_watermark_delay_threshold, deduplication_columns=None, options=None)

Configuration used to reference a Kinesis stream.

The KinesisDSConfig class is used to create a reference to an AWS Kinesis stream.

This class used as an input to a VirtualDataSource’s parameter batch_config. This class is not a Tecton Primitive: it is a grouping of parameters. Declaring this class alone will not register a data source. Instead, declare a VirtualDataSource that takes this configuration class as an input.

Methods

__init__

Instantiates a new KinesisDSConfig.

__init__(stream_name, region, raw_stream_translator, timestamp_key, default_initial_stream_position, default_watermark_delay_threshold, deduplication_columns=None, options=None)

Instantiates a new KinesisDSConfig.

Parameters
  • stream_name (str) – Name of the Kinesis stream.

  • region (str) – AWS region of the stream, e.g: “aws-west”.

  • raw_stream_translator – Python user defined function f(DF) -> DF that takes in raw Pyspark data source DataFrame and translates it to the DF to be consumed by the Feature Package. See an example of raw_stream_translator in the User Guide.

  • timestamp_key (str) – Name of the column containing timestamp for watermarking.

  • default_initial_stream_position (str) – Initial position in stream, e.g: “latest” or “trim_horizon”. More information available in AWS Kinesis Documentation.

  • default_watermark_delay_threshold (str) – Watermark time interval, e.g: “30 minutes”

  • deduplication_columns (Optional[List[str]]) – (Optional) Columns the stream data that uniquely identify data records. Used for de-duplicating.

  • options – (Optional) A map of additional Spark readStream options

Returns

A KinesisDSConfig class instance.