# TectonDataFrame

## Summary

A thin wrapper around Pandas and Spark DataFrames.

## Attributes
| Name | Data Type | Description |
|---|---|---|
| columns | Sequence[str] | The columns of the dataframe |
| schema | Schema | The schema of the dataframe |
## Methods
| Name | Description |
|---|---|
| __init__(...) | Method generated by attrs for class TectonDataFrame. |
| explain(...) | Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree. |
| get_sql_node(...) | Returns the first node from which SQL can be generated in the TectonDataFrame's query tree. |
| start_dataset_job(...) | Starts a job to materialize a dataset from this TectonDataFrame. |
| subtree(...) | Creates a TectonDataFrame from a subtree of a prior query tree, labeled by a node id in .explain(). |
| to_arrow() | Returns the results as an Arrow Table. |
| to_pandas(...) | Converts the TectonDataFrame to a Pandas DataFrame. |
| to_spark() | Returns the data as a Spark DataFrame. |
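Taken together, the conversion methods let one result object feed whichever engine you are working in. A minimal sketch, assuming a workspace and feature view that already exist; the names `prod` and `user_features`, the `events` columns, and the use of `get_features_for_events` to produce a TectonDataFrame are all assumptions for illustration:

```python
import pandas as pd
import tecton

ws = tecton.get_workspace("prod")           # hypothetical workspace name
fv = ws.get_feature_view("user_features")   # hypothetical feature view name

# Events dataframe with join keys and timestamps; columns are illustrative.
events = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

tdf = fv.get_features_for_events(events)  # assumed to return a TectonDataFrame

print(tdf.columns)      # Sequence[str] of result columns
pdf = tdf.to_pandas()   # pandas DataFrame
tbl = tdf.to_arrow()    # pyarrow Table
sdf = tdf.to_spark()    # Spark DataFrame (requires a Spark-backed environment)
```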
### __init__(...)
Method generated by attrs for class TectonDataFrame.

**Parameters**

- `spark_df: Optional[pyspark.sql.dataframe.DataFrame] = None`
- `pandas_df: Optional[pandas.core.frame.DataFrame] = None`
- `arrow_table: Optional[pyarrow.lib.Table] = None`
- `need_rewrite_tree: bool = False`
- `schema: Optional[tecton_core.schema.Schema] = None`
- `request_params: Union[tecton_core.query.retrieval_params.GetFeaturesInRangeParams, tecton_core.query.retrieval_params.GetFeaturesForEventsParams, NoneType] = None`
- `children_dfs: Optional[List[TectonDataFrame]] = None`
**Returns**

None

### explain(...)

Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.

**Parameters**

- `node_id: bool = True`: If True, the unique id associated with each node will be rendered.
- `name: bool = True`: If True, the class names of the nodes will be rendered.
- `description: bool = True`: If True, the actions of the nodes will be rendered.
- `columns: bool = False`: If True, the columns of each node will be rendered as an appendix after the tree itself.
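Since `explain()` only renders the tree, it is safe to call while iterating on a retrieval query. A sketch, assuming `tdf` is a query-tree-backed TectonDataFrame (e.g. the result of an offline retrieval call):

```python
# Default rendering: node ids, class names, and descriptions.
tdf.explain()

# Also list each node's columns as an appendix after the tree.
tdf.explain(columns=True)

# A terser view: class names only.
tdf.explain(node_id=False, description=False)
```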
### get_sql_node(...)

Returns the first node from which SQL can be generated in the TectonDataFrame's query tree.

**Parameters**

- `tree: NodeRef`: The subtree for which to generate SQL.
### start_dataset_job(...)

Starts a job to materialize a dataset from this TectonDataFrame.

**Parameters**

- `dataset_name: str`: The Dataset object will be created with this name. The dataset can later be retrieved by this name, so it must be unique within the workspace.
- `cluster_config: Union[_DefaultClusterConfig, DatabricksClusterConfig, EMRClusterConfig, DatabricksJsonClusterConfig, EMRJsonClusterConfig, RiftBatchConfig, NoneType] = None`: Configuration for the Spark/Rift cluster.
- `tecton_materialization_runtime: Optional[str] = None`: Version of the `tecton` package used by the job cluster.
- `environment: Optional[str] = None`: The custom environment in which jobs will be run.
- `extra_config: Optional[Dict[str, Any]] = None`: Additional parameters (the list may vary depending on the Tecton runtime) that may be used to tune remote execution heuristics (e.g., what chunk size to use when chunking the events dataframe).
- `compute_mode: Union[tecton_core.compute_mode.ComputeMode, str, NoneType] = None`: Overrides the compute mode used in the `get_features` call.
- `job_retry_times: Optional[int] = None`: Maximum number of retries for the job. If not specified, the default Remote Dataset Job retry count is used.
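As a sketch of how these parameters fit together, assuming `tdf` is a TectonDataFrame from an offline retrieval call; the dataset name is illustrative, and constructing `EMRClusterConfig` with no arguments is an assumption, not a verified signature:

```python
from tecton import EMRClusterConfig

job = tdf.start_dataset_job(
    dataset_name="user_features_2024_01",  # must be unique within the workspace
    cluster_config=EMRClusterConfig(),     # or DatabricksClusterConfig, RiftBatchConfig, ...
    job_retry_times=2,                     # override the default retry count
)
# `job` is a DatasetJob handle; see the DatasetJob reference for how to
# monitor it and retrieve the materialized dataset by `dataset_name`.
```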
**Returns**

`DatasetJob`: a DatasetJob object.

### subtree(...)

Creates a TectonDataFrame from a subtree of a prior query tree, labeled by a node id in `.explain()`.

**Parameters**

- `node_id: int`: Identifier of the node from `.explain()`.
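`subtree()` pairs with `explain()`: the ids printed in the rendered tree are the ones passed here. A sketch, assuming `tdf` is a query-tree-backed TectonDataFrame and that node id 3 appears in its rendered tree (the id is illustrative):

```python
tdf.explain()          # find the id of the stage you want to inspect

sub = tdf.subtree(3)   # a new TectonDataFrame rooted at that node
sub.to_pandas()        # materialize just that intermediate result
```

This is useful for debugging a retrieval query stage by stage rather than only inspecting the final output.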
**Returns**

`TectonDataFrame`

### to_pandas(...)

Converts the TectonDataFrame to a Pandas DataFrame.

**Parameters**

- `pretty_sql: bool = False`: Not applicable when using Spark. For Snowflake and Athena, `to_pandas()` will generate a SQL string, execute it, and then return the resulting data in a pandas DataFrame. If True, the SQL will be reformatted and executed as a more readable, multiline string. If False, the SQL will be executed as a one-line string. Use `pretty_sql=False` for better performance.
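A sketch of the trade-off, assuming `tdf` was produced by a Snowflake- or Athena-backed retrieval call:

```python
# Default: SQL is executed as a one-line string; prefer this for performance.
pdf = tdf.to_pandas()

# Debugging: reformat the generated SQL into a readable, multiline string
# before executing it.
pdf = tdf.to_pandas(pretty_sql=True)

print(pdf.head())  # an ordinary pandas DataFrame either way
```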