tecton.interactive.StreamDataSource

class tecton.interactive.StreamDataSource

StreamDataSource is an abstraction over streaming data sources.

StreamFeatureViews and StreamWindowAggregateFeatureViews ingest data from StreamDataSources.

A StreamDataSource contains a stream data source config, as well as a batch data source config for backfills.
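
For example, in a notebook you can fetch a registered StreamDataSource from a workspace and inspect it. This is a minimal sketch; the workspace name ("prod") and data source name ("transactions_stream") are hypothetical placeholders, and the exact accessor may vary by SDK version:

import tecton

# Look up the workspace and a registered stream data source.
# Both names below are hypothetical placeholders.
ws = tecton.get_workspace("prod")
ds = ws.get_data_source("transactions_stream")

ds.summary()  # display a human-readable summary of the data source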

Methods

get_dataframe

Returns this data source’s data as a Tecton DataFrame.

start_stream_preview

Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.

summary

Displays a human readable summary of this data source.

get_dataframe(start_time=None, end_time=None, *, apply_translator=True)

Returns this data source’s data as a Tecton DataFrame.

Parameters
  • start_time (Union[DateTime, datetime, None]) – The start time of the interval from which to retrieve source data. Defaults to UTC if no timezone is specified. Can only be set if apply_translator is True.

  • end_time (Union[DateTime, datetime, None]) – The end time of the interval up to which to retrieve source data. Defaults to UTC if no timezone is specified. Can only be set if apply_translator is True.

  • apply_translator (bool) – If True, the transformation specified by raw_batch_translator is applied to the data source's DataFrame. apply_translator is not applicable to batch sources configured with spark_batch_config, because spark_batch_config does not take a post_processor.

Returns

A Tecton DataFrame containing the data source’s raw or translated source data.

Raises

TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
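
For example, to pull a bounded slice of translated source data into a notebook (a minimal sketch; the ds variable is assumed to be a StreamDataSource retrieved from a workspace, and the time range is arbitrary):

from datetime import datetime

# Retrieve translated source data for January 2023.
# Times without a timezone are interpreted as UTC.
df = ds.get_dataframe(
    start_time=datetime(2023, 1, 1),
    end_time=datetime(2023, 2, 1),
)

spark_df = df.to_spark()  # convert the Tecton DataFrame; df.to_pandas() also works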

start_stream_preview(table_name, *, apply_translator=True, option_overrides=None)

Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a StreamDataSource is correctly receiving streaming events. Note that the table will grow unboundedly, so this method is only suitable for debugging in notebooks.

Parameters
  • table_name (str) – The name of the temporary table that this method will write to.

  • apply_translator (bool) – Whether to apply this data source’s raw_stream_translator. When True, the translated data is written to the table; when False, the raw, untranslated data is written. apply_translator is not applicable to stream sources configured with spark_stream_config, because spark_stream_config does not take a post_processor.

  • option_overrides (Optional[Dict[str, str]]) – A dictionary of Spark readStream options that override any readStream options set by the data source. Can be used to configure behavior for the preview only, e.g. setting {"startingOffsets": "latest"} to preview only the most recent events in a Kafka stream.
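
For example, to preview only the most recent Kafka events in a notebook (a minimal sketch; the ds variable and the table name are assumptions, and spark is the notebook's SparkSession):

# Start a streaming job that writes incoming records to a temporary table,
# overriding the stream's starting offsets so only new events are previewed.
ds.start_stream_preview(
    "stream_preview_tmp",
    apply_translator=True,
    option_overrides={"startingOffsets": "latest"},
)

# After the job has been running for a bit, query the accumulated records.
spark.sql("SELECT * FROM stream_preview_tmp LIMIT 10").show()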

summary()

Displays a human readable summary of this data source.

Attributes

columns

Returns the columns of the data source's stream, if present.

created_at

Returns the creation date of this Tecton Object.

defined_in

Returns the filename in which this Tecton Object was declared.

description

The description of this Tecton Object, set by the user.

family

Deprecated.

id

Returns a unique ID for the data source.

is_streaming

Whether or not this data source is a StreamDataSource.

name

The name of this Tecton Object.

owner

The owner of this Tecton Object (typically the email of the primary maintainer).

tags

Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).

workspace

Returns the workspace this Tecton Object was created in.
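
For example, these attributes can be read directly off a retrieved data source (a minimal sketch; the ds variable is assumed to be a StreamDataSource retrieved from a workspace):

print(ds.name)          # e.g. "transactions_stream" (hypothetical name)
print(ds.is_streaming)  # True for a StreamDataSource
print(ds.columns)       # columns of the stream, if present
print(ds.workspace)     # workspace this object was created in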