TectonDataFrame
Summary
A thin wrapper around Pandas and Spark DataFrames.
Attributes
Name | Data Type | Description |
---|---|---|
columns | Sequence[str] | The columns of the dataframe |
schema | Schema | The schema of the dataframe |
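
For illustration, the snippet below inspects these attributes on a TectonDataFrame returned by an offline retrieval call. The workspace, feature view, and events dataframe are hypothetical and not part of this page; only the columns and schema attributes are documented here.

```python
import pandas as pd
import tecton

# Hypothetical workspace and feature view; one common way to obtain a
# TectonDataFrame is an offline retrieval call such as
# get_features_for_events on a feature view or feature service.
ws = tecton.get_workspace("fraud_detection")
fv = ws.get_feature_view("transaction_features")

events = pd.DataFrame(
    {
        "user_id": ["u1", "u2"],
        "timestamp": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    }
)

tdf = fv.get_features_for_events(events)

# Attributes documented above.
print(tdf.columns)  # Sequence[str]: the columns of the dataframe
print(tdf.schema)   # Schema: the schema of the dataframe
```
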
Methods
Name | Description |
---|---|
__init__(...) | Method generated by attrs for class TectonDataFrame. |
explain(...) | Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree. |
get_sql_node(...) | Returns the first node from which SQL can be generated in the TectonDataFrame's query tree. |
start_dataset_job(...) | Start a job to materialize a dataset from this TectonDataFrame. |
subtree(...) | Creates a TectonDataFrame from a subtree of the prior query tree, identified by a node id shown in .explain(). |
to_arrow() | Get the results as an Arrow Table. |
to_pandas(...) | Convert the TectonDataFrame to a Pandas DataFrame. |
to_spark() | Returns data as a Spark DataFrame. |
__init__(...)
Method generated by attrs for class TectonDataFrame.
Parameters
- spark_df (Optional[pyspark.sql.dataframe.DataFrame]) - Default: None
- pandas_df (Optional[pandas.core.frame.DataFrame]) - Default: None
- need_rewrite_tree (bool) - Default: False
- schema (Optional[tecton_core.schema.Schema]) - Default: None
- request_params (Union[tecton_core.query.retrieval_params.GetFeaturesInRangeParams, tecton_core.query.retrieval_params.GetFeaturesForEventsParams, NoneType]) - Default: None
- children_dfs (Optional[List[TectonDataFrame]]) - Default: None
Returns
None
explain(...)
Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.
Parameters
- node_id (bool) - If True, the unique id associated with each node will be rendered. Default: True
- name (bool) - If True, the class names of the nodes will be rendered. Default: True
- description (bool) - If True, the actions of the nodes will be rendered. Default: True
- columns (bool) - If True, the columns of each node will be rendered as an appendix after the tree itself. Default: False
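
A minimal sketch of the rendering flags, assuming tdf is a TectonDataFrame backed by a query tree (e.g., the result of an offline retrieval call):

```python
# Default rendering: node ids, class names, and node descriptions.
tdf.explain()

# Hide node ids, and append each node's columns after the tree itself.
tdf.explain(node_id=False, columns=True)
```
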
get_sql_node(...)
Returns the first node from which SQL can be generated in the TectonDataFrame's query tree.
Parameters
- tree (NodeRef) - Subtree for which to generate SQL
start_dataset_job(...)
Start a job to materialize a dataset from this TectonDataFrame.
Parameters
- dataset_name (str) - The Dataset object will be created with this name. The dataset can later be retrieved by this name, so it must be unique within the workspace.
- cluster_config (Union[_DefaultClusterConfig, DatabricksClusterConfig, EMRClusterConfig, DatabricksJsonClusterConfig, DataprocJsonClusterConfig, EMRJsonClusterConfig, RiftBatchConfig, NoneType]) - Configuration for the Spark/Rift cluster. Default: None
- tecton_materialization_runtime (Optional[str]) - Version of the tecton package used by the job cluster. Default: None
- environment (Optional[str]) - The custom environment in which jobs will be run. Default: None
- extra_config (Optional[Dict[str, Any]]) - Additional parameters (the list may vary depending on the tecton runtime) that may be used to tune remote execution heuristics (e.g., the chunk size to use when splitting the events dataframe). Default: None
- compute_mode (Union[tecton_core.compute_mode.ComputeMode, str, NoneType]) - Overrides the compute mode used in the get_features call. Default: None
Returns
DatasetJob: A DatasetJob object.
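
A sketch of kicking off a dataset materialization job; the dataset name is illustrative, and only the parameters documented above are used. How to poll or wait on the returned DatasetJob is part of that class's own API and is not shown here.

```python
# Assuming tdf is a TectonDataFrame produced by an offline retrieval call.
# The dataset name is illustrative and must be unique within the workspace.
job = tdf.start_dataset_job(dataset_name="fraud_training_set_2024_06")

# `job` is a DatasetJob handle for the materialization run.
print(job)
```
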
subtree(...)
Creates a TectonDataFrame from a subtree of the prior query tree, identified by a node id shown in .explain().
Parameters
- node_id (int) - Identifier of the node from .explain()
Returns
TectonDataFrame
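
For example, a subtree can be pulled out after identifying its node id in the .explain() output (the id 2 below is illustrative):

```python
# Render the query tree with node ids (enabled by default).
tdf.explain()

# Build a new TectonDataFrame from the subtree rooted at node 2,
# where 2 is an id taken from the .explain() output above.
sub = tdf.subtree(2)
sub_pdf = sub.to_pandas()
```
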
to_pandas(...)
Convert the TectonDataFrame to a Pandas DataFrame.
Parameters
- pretty_sql (bool) - Not applicable when using Spark. For Snowflake and Athena, to_pandas() will generate a SQL string, execute it, and then return the resulting data in a Pandas DataFrame. If True, the SQL will be reformatted and executed as a more readable, multiline string. If False, the SQL will be executed as a one-line string. Use pretty_sql=False for better performance. Default: False
Returns
DataFrame: A Pandas DataFrame.
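
For example (the pretty_sql flag only matters for Snowflake- or Athena-backed dataframes):

```python
# Convert to a Pandas DataFrame. On Snowflake or Athena, pretty_sql=False
# executes the generated SQL as a single-line string, which the note above
# recommends for better performance.
pdf = tdf.to_pandas(pretty_sql=False)
print(pdf.head())
```
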
to_spark()
Returns data as a Spark DataFrame.
Returns
DataFrame: A Spark DataFrame.
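
For example, assuming a Spark-backed TectonDataFrame and an active Spark session:

```python
# Materialize the query as a Spark DataFrame and inspect a few rows.
sdf = tdf.to_spark()
sdf.printSchema()
sdf.show(5)
```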