Version: 1.0

TectonDataFrame

Summary

A thin wrapper around Pandas, Spark, and Snowflake dataframes.

Attributes

Name     Data Type      Description
columns  Sequence[str]  The columns of the dataframe
schema   Schema         The schema of the dataframe

Methods

Name                    Description
__init__(...)           Method generated by attrs for class TectonDataFrame.
explain(...)            Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.
get_sql_node(...)       Returns the first node in the TectonDataFrame's query tree from which SQL can be generated.
start_dataset_job(...)  Start a job to materialize a dataset from this TectonDataFrame.
subtree(...)            Creates a TectonDataFrame from a subtree of the existing query tree, identified by a node id shown in the .explain() output.
to_pandas(...)          Converts the TectonDataFrame to a pandas DataFrame.
to_snowpark(...)        Returns data as a Snowpark DataFrame.
to_spark()              Returns data as a Spark DataFrame.

__init__(...)

Method generated by attrs for class TectonDataFrame.

Parameters

Returns

None

explain(...)

Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.

Parameters

  • node_id (bool) - If True, the unique id associated with each node will be rendered. Default: True

  • name (bool) - If True, the class names of the nodes will be rendered. Default: True

  • description (bool) - If True, the actions of the nodes will be rendered. Default: True

  • columns (bool) - If True, the columns of each node will be rendered as an appendix after the tree itself. Default: False

get_sql_node(...)

Returns the first node in the TectonDataFrame's query tree from which SQL can be generated.

Parameters

  • tree (NodeRef) - Subtree for which to generate SQL

start_dataset_job(...)

Start a job to materialize a dataset from this TectonDataFrame.

Parameters

  • dataset_name (str) - Name for the Dataset object that will be created. The dataset can later be retrieved by this name, so it must be unique within the workspace.

  • cluster_config (Union[_DefaultClusterConfig, DatabricksClusterConfig, EMRClusterConfig, DatabricksJsonClusterConfig, DataprocJsonClusterConfig, EMRJsonClusterConfig, RiftBatchConfig, NoneType]) - Configuration for the Spark/Rift cluster. Default: None

  • tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by the job cluster. Default: None

  • environment (Optional[str]) - The custom environment in which jobs will be run. Default: None

  • extra_config (Optional[Dict[str,Any]]) - Additional parameters (the list may vary depending on the Tecton runtime) that may be used to tune remote execution heuristics (e.g., what number to use when chunking the events dataframe). Default: None

  • compute_mode (Union[tecton_core.compute_mode.ComputeMode, str, NoneType]) - Overrides the compute mode used in the get_features call. Default: None

Returns

DatasetJob: DatasetJob object
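Because dataset_name must be unique within the workspace, one option is to derive it from a prefix plus a timestamp. The following is a minimal sketch, not Tecton's own code: unique_dataset_name, start_named_job, and _RecordingFrame are hypothetical names introduced here, and the stand-in class only records the call so the example runs without a Tecton cluster. A timestamp suffix makes collisions unlikely but does not guarantee uniqueness.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def unique_dataset_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp so repeated runs are unlikely to reuse a name."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"


def start_named_job(df: Any, prefix: str, now: Optional[datetime] = None,
                    extra_config: Optional[Dict[str, Any]] = None) -> Any:
    """Call start_dataset_job() with a timestamped dataset name."""
    name = unique_dataset_name(prefix, now)
    # Only keyword arguments from the parameter list above are passed.
    return df.start_dataset_job(dataset_name=name, extra_config=extra_config)


class _RecordingFrame:
    """Hypothetical stand-in for a TectonDataFrame; it only records the call."""

    def __init__(self) -> None:
        self.kwargs: Dict[str, Any] = {}

    def start_dataset_job(self, dataset_name, cluster_config=None,
                          tecton_materialization_runtime=None, environment=None,
                          extra_config=None, compute_mode=None):
        self.kwargs = {"dataset_name": dataset_name, "extra_config": extra_config}
        return f"job-for-{dataset_name}"  # placeholder, not a real DatasetJob


frame = _RecordingFrame()
fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
job = start_named_job(frame, "fraud_training", now=fixed)
```

With a real TectonDataFrame, start_named_job(df, "fraud_training") would pass the generated name straight through to start_dataset_job and return whatever that method returns.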

subtree(...)

Creates a TectonDataFrame from a subtree of the existing query tree, identified by a node id shown in the .explain() output.

Parameters

  • node_id (int) - Identifier of the node, as shown in the .explain() output

Returns

TectonDataFrame
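A typical way to combine explain() and subtree() is to print the tree with node ids, pick an id, and then build a new frame from that node. The sketch below assumes only the documented signatures of those two methods; inspect_node and _FakeFrame are hypothetical names, and the stand-in class records calls instead of printing a real query tree.

```python
from typing import List, Protocol, Tuple


class ExplainableFrame(Protocol):
    """Structural type covering only the two documented methods used here."""

    def explain(self, node_id: bool = True, name: bool = True,
                description: bool = True, columns: bool = False) -> None: ...

    def subtree(self, node_id: int) -> "ExplainableFrame": ...


def inspect_node(df: ExplainableFrame, node_id: int) -> ExplainableFrame:
    """Print the tree with the column appendix, then return the chosen subtree."""
    df.explain(node_id=True, name=True, description=True, columns=True)
    return df.subtree(node_id)


class _FakeFrame:
    """Hypothetical stand-in so the sketch runs without Tecton; records calls."""

    def __init__(self, label: str = "root") -> None:
        self.label = label
        self.calls: List[Tuple] = []

    def explain(self, node_id=True, name=True, description=True, columns=False):
        self.calls.append(("explain", node_id, name, description, columns))

    def subtree(self, node_id):
        self.calls.append(("subtree", node_id))
        return _FakeFrame(label=f"node-{node_id}")


root = _FakeFrame()
sub = inspect_node(root, 3)
```

In interactive use the two calls would normally be made separately, since the node id is chosen after reading the printed tree.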

to_pandas(...)

Converts the TectonDataFrame to a pandas DataFrame.

Parameters

  • pretty_sql (bool) - Not applicable when using Spark. For Snowflake and Athena, to_pandas() generates a SQL string, executes it, and returns the resulting data in a pandas DataFrame. If True, the SQL is reformatted and executed as a more readable, multiline string. If False, the SQL is executed as a one-line string. Use pretty_sql=False for better performance. Default: False

Returns

DataFrame: A Pandas DataFrame.

to_snowpark(...)

Returns data as a Snowpark DataFrame.

Parameters

  • pretty_sql (bool) - to_snowpark() generates a SQL string, executes it, and returns the resulting data in a Snowpark DataFrame. If True, the SQL is reformatted and executed as a more readable, multiline string. If False, the SQL is executed as a one-line string. Use pretty_sql=False for better performance. Default: False

Returns

Any: A Snowpark DataFrame.

to_spark(...)

Returns data as a Spark DataFrame.

Returns

DataFrame: A Spark DataFrame.
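The three conversion methods differ only in their target and in whether they accept pretty_sql. As a rough illustration, a small dispatcher can select one by name. This is a sketch under stated assumptions: convert and _StubFrame are hypothetical names, and the stub returns tags rather than real dataframes so the example runs without pandas, Spark, or Snowpark installed.

```python
from typing import Any


def convert(df: Any, target: str, pretty_sql: bool = False) -> Any:
    """Dispatch to one of the three documented conversion methods."""
    if target == "pandas":
        return df.to_pandas(pretty_sql=pretty_sql)
    if target == "snowpark":
        return df.to_snowpark(pretty_sql=pretty_sql)
    if target == "spark":
        # to_spark() is documented without parameters, so pretty_sql is not passed.
        return df.to_spark()
    raise ValueError(f"unknown target: {target!r}")


class _StubFrame:
    """Hypothetical stand-in; returns tags instead of real dataframes."""

    def to_pandas(self, pretty_sql=False):
        return ("pandas", pretty_sql)

    def to_snowpark(self, pretty_sql=False):
        return ("snowpark", pretty_sql)

    def to_spark(self):
        return ("spark",)


stub = _StubFrame()
results = [
    convert(stub, "pandas"),
    convert(stub, "snowpark", pretty_sql=True),
    convert(stub, "spark"),
]
```

Leaving pretty_sql at False by default follows the performance note in the parameter descriptions; passing True is mainly useful when the generated SQL needs to be read.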
