Version: Beta 🚧

TectonDataFrame

Summary

A thin wrapper around Pandas and Spark DataFrames.

Attributes

| Name | Data Type | Description |
|---|---|---|
| `columns` | `Sequence[str]` | The columns of the DataFrame |
| `schema` | `Schema` | The schema of the DataFrame |

Methods

| Name | Description |
|---|---|
| `__init__(...)` | Method generated by attrs for class TectonDataFrame. |
| `explain(...)` | Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree. |
| `get_sql_node(...)` | Returns the first node in the TectonDataFrame's query tree from which SQL can be generated. |
| `start_dataset_job(...)` | Starts a job to materialize a dataset from this TectonDataFrame. |
| `subtree(...)` | Creates a TectonDataFrame from a subtree of the prior query tree, labeled by a node id in `.explain()`. |
| `to_arrow()` | Gets the results as an Arrow Table. |
| `to_pandas(...)` | Converts the TectonDataFrame to a Pandas DataFrame. |
| `to_spark()` | Returns the data as a Spark DataFrame. |

__init__(...)

Method generated by attrs for class TectonDataFrame.

Parameters

Returns

None

explain(...)

Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.

Parameters

  • node_id: bool = True If True, the unique id associated with each node will be rendered.
  • name: bool = True If True, the class names of the nodes will be rendered.
  • description: bool = True If True, the actions of the nodes will be rendered.
  • columns: bool = False If True, the columns of each node will be rendered as an appendix after the tree itself.
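
For example, a minimal sketch (assuming `df` is a TectonDataFrame backed by a query tree, such as the result of a feature-retrieval call in a live Tecton workspace):

```python
# Sketch only: assumes `df` is a TectonDataFrame backed by a query tree,
# obtained elsewhere from a feature-retrieval call.
df.explain()                   # node ids, class names, and descriptions
df.explain(columns=True)       # also append each node's columns after the tree
df.explain(name=False,
           description=False)  # render only the node ids
```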

get_sql_node(...)

Returns the first node in the TectonDataFrame's query tree from which SQL can be generated.

Parameters

  • tree: NodeRef The subtree for which to generate SQL.

start_dataset_job(...)

Start a job to materialize a dataset from this TectonDataFrame.

Parameters

  • dataset_name: str The Dataset object will be created with this name. The dataset can later be retrieved by this name, so it must be unique within the workspace.
  • cluster_config: Union[_DefaultClusterConfig, DatabricksClusterConfig, EMRClusterConfig, DatabricksJsonClusterConfig, EMRJsonClusterConfig, RiftBatchConfig, NoneType] = None Configuration for the Spark/Rift cluster.
  • tecton_materialization_runtime: Optional[str] = None Version of the tecton package used by the job cluster.
  • environment: Optional[str] = None The custom environment in which jobs will be run.
  • extra_config: Optional[Dict[str, Any]] = None Additional parameters (the list may vary depending on the Tecton runtime) that may be used to tune remote execution heuristics (e.g., what number to use when chunking the events dataframe).
  • compute_mode: Union[tecton_core.compute_mode.ComputeMode, str, NoneType] = None Overrides the compute mode used in the get_features call.
  • job_retry_times: Optional[int] = None Maximum number of retries for the job. If not specified, the default Remote Dataset Job retry count is used.

Returns

DatasetJob: A DatasetJob object.
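
A hedged sketch of a typical call, assuming `df` came from a prior feature-retrieval call; the dataset name below is illustrative:

```python
# Sketch only: `df` is assumed to be a TectonDataFrame from a prior
# feature-retrieval call; the dataset name is a hypothetical example.
job = df.start_dataset_job(dataset_name="training_events_v1")
# `job` is a DatasetJob; the materialized dataset can later be retrieved
# by the (workspace-unique) name passed above.
```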

subtree(...)

Creates a TectonDataFrame from a subtree of the prior query tree, labeled by a node id shown in .explain().

Parameters

  • node_id: int The identifier of a node from .explain().

Returns

TectonDataFrame
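
A sketch of the intended workflow (the node id below is hypothetical; take a real one from the `.explain()` output):

```python
# Sketch only: `df` is a TectonDataFrame backed by a query tree.
df.explain()                  # prints node ids (node_id=True by default)
sub = df.subtree(node_id=3)   # hypothetical id copied from the output above
sub.to_pandas()               # the subtree is itself a TectonDataFrame
```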

to_pandas(...)

Converts the TectonDataFrame to a Pandas DataFrame.

Parameters

  • pretty_sql: bool = False Not applicable when using Spark. For Snowflake and Athena, to_pandas() will generate a SQL string, execute it, and return the resulting data in a Pandas DataFrame. If True, the SQL will be reformatted and executed as a more readable, multiline string. If False, the SQL will be executed as a one-line string. Use pretty_sql=False for better performance.

Returns

DataFrame: A Pandas DataFrame.

to_spark()

Returns data as a Spark DataFrame.

Returns

DataFrame: A Spark DataFrame.
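
Taken together, the conversion methods can be sketched as follows (assuming `df` is an existing TectonDataFrame; `to_spark()` presumably requires a Spark-backed environment):

```python
# Sketch only: `df` is an existing TectonDataFrame obtained elsewhere.
pdf = df.to_pandas()   # Pandas DataFrame
tbl = df.to_arrow()    # Arrow Table
sdf = df.to_spark()    # Spark DataFrame (assumes Spark compute is available)
```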
