Version: 0.9

Dataset

Summary

Dataset class.

Persisted data consisting of entity & request keys, timestamps, and calculated features. Datasets are associated with either a FeatureService or FeatureView.

There are two types of Datasets: Saved and Logged.

Saved Datasets are created explicitly by setting the save parameter when calling get_features_for_events().

Logged Datasets are created automatically when a FeatureService is declared with a LoggingConfig. Data is appended to the Dataset continuously as online features are requested from that FeatureService.

To get an existing Dataset, call workspace.get_dataset().
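As a rough illustration of the retrieval flow, the sketch below wraps workspace.get_dataset() and to_pandas() in a small helper. Only those two calls come from this page. The helper name load_dataset, the dataset name, and the _FakeWorkspace / _FakeDataset stand-ins are hypothetical and exist only so the example runs without a Tecton connection.

```python
from typing import Any


def load_dataset(workspace: Any, dataset_name: str) -> Any:
    """Fetch an existing Dataset by name and convert it to a pandas DataFrame."""
    dataset = workspace.get_dataset(dataset_name)  # call documented on this page
    return dataset.to_pandas()                     # call documented on this page


# Hypothetical stand-ins so the sketch runs without a Tecton connection.
class _FakeDataset:
    def to_pandas(self):
        # Placeholder for the DataFrame a real Dataset would return.
        return [{"user_id": "u1", "feature": 1.0}]


class _FakeWorkspace:
    def __init__(self):
        self.requested = None

    def get_dataset(self, name):
        self.requested = name
        return _FakeDataset()


workspace = _FakeWorkspace()
rows = load_dataset(workspace, "my_training_data")
```

With a real installation, the workspace object would presumably come from tecton.get_workspace(...) rather than the stand-in used here.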

Attributes

| Name | Data Type | Description |
|---|---|---|
| columns | List[str] | The columns of the dataframe |
| is_archived | bool | |
| name | | Dataset name |

Methods

| Name | Description |
|---|---|
| explain(...) | Prints the query tree. |
| get_spine_dataframe() | Get a TectonDataFrame containing the spine. |
| get_time_range(...) | |
| subtree(...) | Creates a TectonDataFrame from a subtree of the prior query tree, identified by a node id shown in the output of .explain(). |
| summary() | Print out a summary of this class’s attributes. |
| to_pandas() | Converts the Dataset to a Pandas DataFrame and returns it. |
| to_snowpark() | Returns data as a Snowpark DataFrame. |
| to_spark() | Converts the Dataset to a Spark DataFrame and returns it. |

__init__(...)

Parameters

  • proto
  • spark_df
  • pandas_df

explain(...)

Prints the query tree. Should only be used when this TectonDataFrame is backed by a query tree.

Parameters

  • node_id (bool) – If True, the unique id associated with each node will be rendered. (Default: True)

  • name (bool) – If True, the class names of the nodes will be rendered. (Default: True)

  • description (bool) – If True, the actions of the nodes will be rendered. (Default: True)

  • columns (bool) – If True, the columns of each node will be rendered as an appendix after the tree itself. (Default: False)
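The four flags above can be combined in a single call. The sketch below is a minimal illustration under stated assumptions: the keyword names and defaults are taken from the parameter list, while show_query_tree and the _RecordingFrame stand-in are hypothetical and only record what was passed, so the example runs without a real query tree.

```python
from typing import Any


def show_query_tree(frame: Any, with_columns: bool = False) -> None:
    """Print the query tree with ids, class names, and node actions.

    The per-node column appendix is opt-in because it can be long.
    """
    frame.explain(
        node_id=True,
        name=True,
        description=True,
        columns=with_columns,
    )


class _RecordingFrame:
    """Hypothetical stand-in that records explain() arguments."""

    def __init__(self):
        self.calls = []

    def explain(self, node_id=True, name=True, description=True, columns=False):
        self.calls.append(
            {"node_id": node_id, "name": name,
             "description": description, "columns": columns}
        )


frame = _RecordingFrame()
show_query_tree(frame)                     # defaults: no column appendix
show_query_tree(frame, with_columns=True)  # also list each node's columns
```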

get_spine_dataframe()

Get a TectonDataFrame containing the spine.

get_time_range(...)

Parameters

  • timestamp_key

subtree(...)

Creates a TectonDataFrame from a subtree of the prior query tree, identified by a node id shown in the output of .explain().

Parameters

  • node_id
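One plausible workflow is to print the tree with node ids first, then pass the id of interest to subtree(). The sketch below assumes node_id is the id printed next to a node by explain(); this page does not state its type. The helper extract_subtree and the _FakeFrame stand-in are hypothetical and only mimic the two documented methods.

```python
from typing import Any


def extract_subtree(frame: Any, node_id: Any) -> Any:
    """Print the query tree with node ids, then build a frame from one node."""
    frame.explain(node_id=True)    # ids are rendered next to each node
    return frame.subtree(node_id)  # new frame rooted at the chosen node


class _FakeFrame:
    """Hypothetical stand-in for a query-tree-backed TectonDataFrame."""

    def __init__(self, label="root"):
        self.label = label
        self.explained = False

    def explain(self, node_id=True, name=True, description=True, columns=False):
        self.explained = True
        print(f"<1> {self.label}")

    def subtree(self, node_id):
        return _FakeFrame(label=f"subtree@{node_id}")


root = _FakeFrame()
sub = extract_subtree(root, 3)
```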

summary()

Print out a summary of this class’s attributes.

to_pandas()

Converts the Dataset to a Pandas DataFrame and returns it.

to_snowpark()

Returns data as a Snowpark DataFrame.

to_spark()

Converts the Dataset to a Spark DataFrame and returns it.
