tecton.interactive.StreamDataSource

class tecton.interactive.StreamDataSource

StreamDataSource is an abstraction over streaming data sources.

StreamFeatureViews and StreamWindowAggregateFeatureViews ingest data from StreamDataSources.

A StreamDataSource contains a stream data source config, as well as a batch data source config for backfills.
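
For illustration, a minimal sketch of fetching a registered StreamDataSource in a notebook (the workspace and data source names here are hypothetical):

    import tecton

    # Hypothetical workspace and data source names; substitute your own.
    ws = tecton.get_workspace("prod")
    ds = ws.get_data_source("transactions_stream")

    # Print a human-readable summary of the data source's configuration.
    ds.summary()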

Methods

dataframe

Deprecated.

get_dataframe

Returns this data source’s data as a Tecton DataFrame.

preview

Deprecated.

start_stream_preview

Starts a streaming job to write incoming records from this data source's stream to a temporary table with a given name.

summary

Displays a human readable summary of this data source.

dataframe()

Deprecated. Returns this data source’s data as a Spark DataFrame.

Returns

A Spark DataFrame containing the data source’s data.

get_dataframe(start_time=None, end_time=None, *, apply_translator=True)

Returns this data source’s data as a Tecton DataFrame.

Parameters
  • start_time (Union[DateTime, datetime, None]) – (Optional) The start of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True.

  • end_time (Union[DateTime, datetime, None]) – (Optional) The end of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True.

  • apply_translator (bool) – If True, the transformation specified by raw_batch_translator will be applied to the data source's DataFrame.

Returns

A Tecton DataFrame containing the data source’s raw or translated source data.

Raises

TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
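
For example, a short sketch of pulling a bounded window of translated source data (assuming a StreamDataSource object ds fetched as in the sketch above; the time window is arbitrary):

    from datetime import datetime, timezone

    # Naive datetimes default to UTC; explicit timezones are shown here.
    start = datetime(2022, 5, 1, tzinfo=timezone.utc)
    end = datetime(2022, 5, 2, tzinfo=timezone.utc)

    # Returns a Tecton DataFrame of the translated batch source data.
    df = ds.get_dataframe(start_time=start, end_time=end, apply_translator=True)

    # Convert to a Spark or pandas DataFrame as needed.
    spark_df = df.to_spark()
    pandas_df = df.to_pandas()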

preview(limit=10)

Deprecated. Shows a preview of the data source’s data from its batch data source.

Parameters

limit (int) – (default=10) The number of rows to preview.

Returns

A pandas DataFrame containing a preview of data.

start_stream_preview(table_name, *, apply_translator=True, option_overrides=None)

Starts a streaming job to write incoming records from this data source's stream to a temporary table with a given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a StreamDataSource is correctly receiving streaming events. Note that the table will grow without bound, so this method is only suitable for debugging in notebooks.

Parameters
  • table_name (str) – The name of the temporary table that this method will write to.

  • apply_translator (bool) – Whether to apply this data source’s raw_stream_translator. When True, the translated data will be written to the table. When False, the raw, untranslated data will be written.

  • option_overrides (Optional[Dict[str, str]]) – A dictionary of Spark readStream options that will override any readStream options set by the data source. Can be used to configure behavior only for the preview, e.g. setting startingOffsets to latest to preview only the most recent events in a Kafka stream.
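
As a sketch, previewing only the most recent events from a Kafka-backed stream (the table name is arbitrary):

    # Write incoming, translated records to a temporary table.
    ds.start_stream_preview(
        "transactions_stream_preview",
        apply_translator=True,
        option_overrides={"startingOffsets": "latest"},
    )

    # Once records arrive, inspect them with Spark SQL.
    spark.sql("SELECT * FROM transactions_stream_preview LIMIT 10").show()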

summary()

Displays a human readable summary of this data source.

Attributes

columns

Returns the column names of the stream data source's schema, if present.

created_at

Returns the creation date of this Tecton Object.

defined_in

Returns the filename in which this Tecton Object was declared.

description

The description of this Tecton Object, set by the user.

family

The family of this Tecton Object, used to group Objects.

id

Returns a unique ID for the data source.

is_streaming

Whether this data source is a StreamDataSource.

name

The name of this Tecton Object.

owner

The owner of this Tecton Object (typically the email of the primary maintainer).

tags

Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).

workspace

Returns the workspace this Tecton Object was created in.
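
For instance, these attributes can be read directly off a fetched data source object (again assuming ds from the sketch above):

    print(ds.name)          # e.g. "transactions_stream"
    print(ds.is_streaming)  # True for a StreamDataSource
    print(ds.columns)       # stream schema column names, if present
    print(ds.workspace)     # workspace this object was created in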