tecton.interactive.BatchDataSource

class tecton.interactive.BatchDataSource

BatchDataSource abstracts batch data sources.

BatchFeatureViews and BatchWindowAggregateFeatureViews ingest data from BatchDataSources.

Methods

dataframe

Returns this VirtualDataSource’s source data as a Spark DataFrame.

get_dataframe

Returns this VirtualDataSource’s data as a Tecton DataFrame.

preview

Shows a preview of the VirtualDataSource’s data from its batch data source.

start_stream_preview

Starts a streaming job to write incoming records from this VirtualDataSource’s stream to a temporary table with a given name.

summary

Displays a human readable summary of this VirtualDataSource.

dataframe()

Returns this VirtualDataSource’s source data as a Spark DataFrame.

Returns

A Spark DataFrame containing the VirtualDataSource’s source data.

get_dataframe(start_time=None, end_time=None)

Returns this VirtualDataSource’s data as a Tecton DataFrame.

Parameters
  • start_time (Union[DateTime, datetime, None]) – (Optional) The start of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed.

  • end_time (Union[DateTime, datetime, None]) – (Optional) The end of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed.

Returns

A Tecton DataFrame containing the VirtualDataSource’s source data.
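Because naive datetimes are interpreted as UTC, it can be less error-prone to pass timezone-aware values explicitly. The following is a minimal sketch, not an official recipe: `last_n_days` and `fetch_recent` are hypothetical helper names, and `data_source` is assumed to be a BatchDataSource obtained elsewhere. Only the documented `get_dataframe(start_time, end_time)` call is used.

```python
from datetime import datetime, timedelta, timezone


def last_n_days(n, now=None):
    """Return a (start_time, end_time) pair of timezone-aware UTC datetimes."""
    end_time = now if now is not None else datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=n)
    return start_time, end_time


def fetch_recent(data_source, n_days=7, now=None):
    """Retrieve source data for the last `n_days` days.

    `data_source` is assumed to expose the documented
    get_dataframe(start_time=..., end_time=...) method.
    """
    start_time, end_time = last_n_days(n_days, now)
    return data_source.get_dataframe(start_time=start_time, end_time=end_time)
```

Passing `now` explicitly makes the interval reproducible, which may help when comparing results across notebook runs.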

preview(limit=10)

Shows a preview of the VirtualDataSource’s data from its batch data source.

Parameters

limit (int) – (default=10) The number of rows to preview.

Returns

A pandas DataFrame containing a preview of data.
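Since `preview` returns a pandas DataFrame, the result can be inspected with ordinary pandas attributes. A small sketch, assuming `data_source` is a BatchDataSource obtained elsewhere; `preview_column_names` is a hypothetical helper name, not part of the SDK.

```python
def preview_column_names(data_source, limit=5):
    """Return the column names seen in a small preview of the source data."""
    # preview(limit=...) is the documented call; per this page it
    # returns a pandas DataFrame, which exposes a `columns` attribute.
    sample = data_source.preview(limit=limit)
    return list(sample.columns)
```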

start_stream_preview(table_name)

Starts a streaming job to write incoming records from this VirtualDataSource’s stream to a temporary table with a given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a VirtualDataSource is correctly receiving streaming events. Note that the table grows without bound, so this method is only suitable for debugging in notebooks.

Parameters

table_name (str) – The name of the temporary table that this method will write to.
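A typical debugging flow might look like the sketch below. The helper name `start_debug_stream` is hypothetical, and the `is_streaming` guard and the identifier check are assumptions added for safety; this page does not state how `start_stream_preview` behaves on a source without a stream.

```python
def start_debug_stream(data_source, table_name):
    """Start a stream preview and return a SQL query for the temporary table."""
    # Assumption: refuse to start when no stream source is attached.
    if not data_source.is_streaming:
        raise ValueError("data source has no stream source to preview")
    # Assumption: restrict table names to plain identifiers, because the
    # name is interpolated into the SQL string below.
    if not table_name.isidentifier():
        raise ValueError(f"invalid table name: {table_name!r}")
    data_source.start_stream_preview(table_name)
    return f"SELECT * FROM {table_name} LIMIT 10"
```

In a notebook with an active Spark session, the returned string can then be passed to `spark.sql(query).show()` once records have arrived.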

summary()

Displays a human readable summary of this VirtualDataSource.

Attributes

columns

Returns the streaming data source’s columns, if a stream source is present.

created_at

Returns the creation date of this Tecton Object.

defined_in

Returns the filename where this Tecton Object was declared.

description

The description of this Tecton Object, set by user.

family

The family of this Tecton Object, used to group Objects.

id

Returns a unique ID for the virtual data source.

is_streaming

Whether or not the VirtualDataSource contains a stream source.

name

The name of this Tecton Object.

owner

The owner of this Tecton Object (typically the email of the primary maintainer).

tags

Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).

workspace

Returns the workspace this Tecton Object was created in.
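The metadata attributes above can be gathered into a plain dictionary, for example to log which data sources a notebook touched. This is an illustrative sketch: `METADATA_FIELDS` and `describe` are hypothetical names, and `data_source` is assumed to be a BatchDataSource obtained elsewhere.

```python
# Attribute names taken from the Attributes list on this page.
METADATA_FIELDS = (
    "name",
    "description",
    "owner",
    "family",
    "workspace",
    "created_at",
    "defined_in",
    "is_streaming",
)


def describe(data_source):
    """Collect the documented metadata attributes into a dict.

    Attributes missing from the object are reported as None rather
    than raising, so the helper also works on partial stand-ins.
    """
    return {field: getattr(data_source, field, None) for field in METADATA_FIELDS}
```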