Version: 1.0

BatchSource

Summary​

A Tecton BatchSource, used to read batch data into Tecton for use in a BatchFeatureView.

Attributes​

| Name | Data Type | Description |
|---|---|---|
| created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
| data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
| data_source_type | data_source_type_pb2.DataSourceType.ValueType | |
| defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
| description | Optional[str] | Returns the description of the Tecton object. |
| id | str | Returns the unique id of the Tecton object. |
| info | | |
| name | str | Returns the name of the Tecton object. |
| owner | Optional[str] | Returns the owner of the Tecton object. |
| prevent_destroy | bool | Returns whether this Tecton object has prevent_destroy set. |
| tags | Dict[str, str] | Returns the tags of the Tecton object. |
| workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |

Methods​

| Name | Description |
|---|---|
| __init__(...) | Creates a new BatchSource. |
| get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
| select_range(...) | Returns this DataSource object wrapped as a FilteredSource. FilteredSources automatically pre-filter sources in Feature View definitions and can reduce compute costs. |
| summary() | Displays a human-readable summary. |
| unfiltered() | Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, but can increase compute costs as a result. |
| validate() | Deprecated; will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary. |

__init__(...)​

Creates a new BatchSource.

Parameters

  • name (str) - A unique name of the DataSource.

  • description (Optional[str]) - A human-readable description. Default: None

  • tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata). Default: None

  • owner (Optional[str]) - Owner name (typically the email of the primary maintainer). Default: None

  • prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: false

  • batch_config (BatchConfigType) - BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.

  • options (Optional[Dict[str, str]]) - Additional options to configure the Source. Used for advanced use cases and beta features. Default: None

Example​

# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)
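
The optional metadata parameters listed under Parameters can be set in the same declaration. The sketch below is illustrative only: the description, tag, and owner values are hypothetical, and the two helper functions (`credit_scores_source_kwargs`, `build_credit_scores_source`) are names introduced here, not part of the Tecton SDK. The Tecton import is deferred so that the metadata can be inspected without Tecton installed.

```python
from typing import Any, Dict


def credit_scores_source_kwargs() -> Dict[str, Any]:
    """Collect the documented optional arguments (all values are hypothetical)."""
    return {
        "name": "credit_scores_batch",
        "description": "Daily credit score snapshots",  # hypothetical
        "tags": {"team": "fraud"},                      # hypothetical
        "owner": "owner@example.com",                   # hypothetical
        "prevent_destroy": False,
    }


def build_credit_scores_source():
    """Construct the BatchSource; requires the tecton package."""
    # Deferred import: only needed when the source is actually constructed.
    from tecton import BatchSource, HiveConfig

    return BatchSource(
        batch_config=HiveConfig(
            database="demo_fraud",
            table="credit_scores",
            timestamp_field="timestamp",
        ),
        **credit_scores_source_kwargs(),
    )
```

Keeping the metadata in a plain dictionary is only one possible arrangement; passing the keyword arguments directly to BatchSource works equally well.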

get_dataframe(...)​

Returns the data in this Data Source as a Tecton DataFrame.

Parameters

  • start_time (Optional[datetime.datetime]) - The start of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True. Default: None

  • end_time (Optional[datetime.datetime]) - The end of the time interval for which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True. Default: None

  • apply_translator (bool) - If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. Default: true

  • compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None

Returns

data_frame.TectonDataFrame: A Tecton DataFrame containing the data source's raw or translated source data.

Raises

  • TectonValidationError: If apply_translator is False, but start_time or end_time filters are passed in.
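
As a rough sketch of how the time filters might be prepared, the snippet below builds an explicit UTC window and mirrors the documented constraint on apply_translator before calling get_dataframe. The helpers `utc_window` and `check_time_filters` are hypothetical names introduced for illustration; they are not part of the Tecton SDK, and the local check raises a plain ValueError rather than TectonValidationError. The actual get_dataframe call is shown as a comment because it needs a Tecton environment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def utc_window(end: datetime, days: int) -> Tuple[datetime, datetime]:
    """Return (start_time, end_time) covering `days` days ending at `end`.

    A naive `end` is tagged as UTC, matching the documented default.
    """
    if end.tzinfo is None:
        end = end.replace(tzinfo=timezone.utc)
    return end - timedelta(days=days), end


def check_time_filters(
    start_time: Optional[datetime],
    end_time: Optional[datetime],
    apply_translator: bool = True,
) -> None:
    """Local mirror of the documented rule: time filters need apply_translator=True."""
    if not apply_translator and (start_time is not None or end_time is not None):
        raise ValueError("start_time/end_time require apply_translator=True")


start_time, end_time = utc_window(datetime(2024, 1, 8), days=7)
check_time_filters(start_time, end_time, apply_translator=True)

# With Tecton installed and credit_scores_batch declared as in the Example section:
# df = credit_scores_batch.get_dataframe(start_time=start_time, end_time=end_time)
```

Passing timezone-aware datetimes avoids relying on the implicit UTC default, which can make the intended window easier to verify when reading the code.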

summary()​

Displays a human-readable summary.

(Deprecated) validate()​

Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.

Returns

None
