BatchSource
Summary
A Tecton BatchSource, used to read batch data into Tecton for use in a BatchFeatureView.
Attributes
| Name | Data Type | Description |
|---|---|---|
| created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
| data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
| data_source_type | data_source_type_pb2.DataSourceType.ValueType | |
| defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
| description | Optional[str] | Returns the description of the Tecton object. |
| id | str | Returns the unique id of the Tecton object. |
| info | | |
| name | str | Returns the name of the Tecton object. |
| owner | Optional[str] | Returns the owner of the Tecton object. |
| prevent_destroy | bool | Returns whether this object has prevent_destroy set. |
| tags | Dict[str, str] | Returns the tags of the Tecton object. |
| workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |
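The data_delay description amounts to simple arithmetic: each materialization job starts one data_delay after its scheduled time. The sketch below is illustrative only. `job_start_times` is a hypothetical helper, the schedule is assumed to tick at fixed `batch_schedule` intervals from a chosen first tick, and it does not reproduce Tecton's actual scheduler.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def job_start_times(
    first_tick: datetime,
    batch_schedule: timedelta,
    data_delay: timedelta,
    count: int,
) -> List[datetime]:
    """Return start times for `count` consecutive scheduled jobs.

    Hypothetical model: ticks occur every `batch_schedule` starting at
    `first_tick`, and each job waits `data_delay` after its tick.
    """
    return [first_tick + i * batch_schedule + data_delay for i in range(count)]


starts = job_start_times(
    first_tick=datetime(2024, 1, 1, tzinfo=timezone.utc),
    batch_schedule=timedelta(days=1),
    data_delay=timedelta(hours=2),
    count=3,
)
for start in starts:
    print(start.isoformat())
# 2024-01-01T02:00:00+00:00
# 2024-01-02T02:00:00+00:00
# 2024-01-03T02:00:00+00:00
```

With a daily schedule and a two-hour delay, every job shifts by the same two hours; the spacing between jobs is still one batch_schedule.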
Methods
| Name | Description |
|---|---|
| __init__(...) | Creates a new BatchSource. |
| get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
| select_range(...) | Returns this DataSource object wrapped as a FilteredSource. FilteredSources automatically pre-filter sources in Feature View definitions and can reduce compute costs. |
| summary() | Displays a human-readable summary. |
| unfiltered() | Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, but can increase compute costs as a result. |
| validate() | [Deprecated in SDK 1.0] This method will be removed in a future version. As of Tecton version 1.0, objects are validated upon creation, so validation is unnecessary. |
__init__(...)
Creates a new BatchSource.
Parameters
- name (str): A unique name of the Data Source.
- description (Optional[str], default None): A human-readable description.
- tags (Optional[Dict[str, str]], default None): Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata).
- owner (Optional[str], default None): Owner name (typically the email of the primary maintainer).
- prevent_destroy (bool, default False): If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False, either in the same tecton apply or in a separate one. prevent_destroy can be used to prevent accidental changes, such as inadvertently deleting a Feature Service used in production or re-creating a Feature View in a way that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a re-create of the tagged object; for example, if prevent_destroy is set on a Feature Service, it also prevents deletions or re-creates of the Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces.
- batch_config (BatchConfigType): BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.
- options (Optional[Dict[str, str]], default None): Additional options to configure the Source. Used for advanced use cases and beta features.
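The prevent_destroy rules above can be modeled as a small dependency check. This is a hypothetical sketch, not Tecton's plan logic: the object names, the `uses` map, and both helpers are made up for illustration, and it assumes that a destructive change to an object forces a re-create of everything downstream of it.

```python
from typing import Dict, List, Set


def downstream_of(name: str, uses: Dict[str, List[str]]) -> Set[str]:
    """Return every object that transitively reads from `name`.

    `uses` maps each object to the objects it reads from directly.
    """
    found: Set[str] = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for obj, deps in uses.items():
            if current in deps and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def destructive_change_blocked(
    name: str,
    uses: Dict[str, List[str]],
    protected: Set[str],
    live_workspace: bool,
) -> bool:
    """True if deleting or re-creating `name` would touch a prevent_destroy object."""
    if not live_workspace:
        # prevent_destroy is only enforced in live (non-dev) workspaces.
        return False
    # Assumption: a destructive change also re-creates everything downstream.
    affected = {name} | downstream_of(name, uses)
    return bool(affected & protected)


uses = {
    "fraud_service": ["txn_features"],       # Feature Service reads a Feature View
    "txn_features": ["transactions_batch"],  # Feature View reads a BatchSource
}
protected = {"fraud_service"}  # prevent_destroy=True on the Feature Service

print(destructive_change_blocked("txn_features", uses, protected, live_workspace=True))   # True
print(destructive_change_blocked("txn_features", uses, protected, live_workspace=False))  # False
```

Walking downstream from the changed object is what lets a flag on the Feature Service protect the Feature Views it uses.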
Example
```python
# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)
```
get_dataframe(...)
Returns the data in this Data Source as a Tecton DataFrame.
Parameters
- start_time (Optional[datetime.datetime], default None): The start of the interval from which to retrieve source data. If no timezone is specified, UTC is used. Can only be set if apply_translator is True.
- end_time (Optional[datetime.datetime], default None): The end of the interval up to which to retrieve source data. If no timezone is specified, UTC is used. Can only be set if apply_translator is True.
- apply_translator (bool, default True): If True, the transformation specified by post_processor is applied to the data source's dataframe. apply_translator is not applicable to batch sources configured with spark_batch_config, because such sources do not have a post_processor.
- compute_mode (Optional[Union[ComputeMode, str]], default None): The compute mode to use to produce the DataFrame.
Returns
TectonDataFrame: A Tecton DataFrame containing the data source's raw or translated source data.
Raises
- TectonValidationError: If apply_translator is False but start_time or end_time filters are passed in.
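The argument rules for get_dataframe (naive datetimes read as UTC; time filters rejected when apply_translator is False) can be sketched in plain Python. `check_dataframe_args` is a hypothetical helper, and `TectonValidationError` is redefined locally as a stand-in so the snippet runs without the Tecton SDK; this is not the SDK's implementation.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


class TectonValidationError(ValueError):
    """Local stand-in for the SDK exception so this sketch is self-contained."""


def _as_utc(ts: Optional[datetime]) -> Optional[datetime]:
    # A datetime without a timezone is treated as UTC; aware values are kept as-is.
    if ts is not None and ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts


def check_dataframe_args(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Validate get_dataframe-style arguments and normalize naive datetimes (hypothetical)."""
    if not apply_translator and (start_time is not None or end_time is not None):
        raise TectonValidationError(
            "start_time and end_time can only be set when apply_translator is True"
        )
    return _as_utc(start_time), _as_utc(end_time)


start, end = check_dataframe_args(start_time=datetime(2024, 1, 1))
print(start, end)  # 2024-01-01 00:00:00+00:00 None

try:
    check_dataframe_args(end_time=datetime(2024, 2, 1), apply_translator=False)
except TectonValidationError as err:
    print("rejected:", err)
```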
select_range(...)
Returns this DataSource object wrapped as a FilteredSource. FilteredSources automatically pre-filter sources in Feature View definitions and can reduce compute costs.
Calling select_range() with no arguments returns an unfiltered source and is equivalent to source.unfiltered().
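Conceptually, a FilteredSource narrows rows by timestamp before the Feature View transformation sees them, and a call with no bounds keeps everything, like unfiltered(). The sketch below models that on in-memory rows. `filter_rows` is hypothetical, and the half-open [start_time, end_time) interval is an assumption made for illustration rather than documented Tecton behavior.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_rows(
    rows: List[Row],
    timestamp_field: str,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
) -> List[Row]:
    """Keep rows with start_time <= timestamp < end_time (assumed half-open).

    A bound of None leaves that side open, so passing neither bound
    returns every row.
    """
    kept = []
    for row in rows:
        ts = row[timestamp_field]
        if start_time is not None and ts < start_time:
            continue
        if end_time is not None and ts >= end_time:
            continue
        kept.append(row)
    return kept


rows = [
    {"user_id": "a", "timestamp": datetime(2019, 12, 31)},
    {"user_id": "b", "timestamp": datetime(2020, 1, 1)},
    {"user_id": "c", "timestamp": datetime(2021, 6, 15)},
]
print([r["user_id"] for r in filter_rows(rows, "timestamp", start_time=datetime(2020, 1, 1))])  # ['b', 'c']
print(len(filter_rows(rows, "timestamp")))  # 3
```

Filtering early means downstream transformations scan only the rows inside the window, which is where the compute savings come from.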
Parameters
- start_time (Union[datetime.datetime, FilterDateTime, TectonTimeConstant]): The start time of the filter. Can be a datetime, or a TectonTimeConstant optionally offset by a timedelta.
- end_time (Union[datetime.datetime, FilterDateTime, TectonTimeConstant]): The end time of the filter. Can be a datetime, or a TectonTimeConstant optionally offset by a timedelta.
Returns
FilteredSource: A FilteredSource object that can be passed into a Feature View.
Examples
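```python
from datetime import timedelta

from tecton import TectonTimeConstant, batch_feature_view

# The following example demonstrates how to use select_range to filter a source to only include data from
# 7 days before MATERIALIZATION_END_TIME.
@batch_feature_view(
    # ... other batch_feature_view parameters ...
    sources=[
        transactions_source.select_range(
            start_time=TectonTimeConstant.MATERIALIZATION_END_TIME - timedelta(days=7),
            end_time=TectonTimeConstant.MATERIALIZATION_END_TIME,
        )
    ],
    # ... other batch_feature_view parameters ...
)
```

```python
import datetime

from tecton import TectonTimeConstant, batch_feature_view

# The following example includes all source data from 2020-01-01 onward.
@batch_feature_view(
    # ... other batch_feature_view parameters ...
    sources=[
        transactions_source.select_range(
            start_time=datetime.datetime(2020, 1, 1),
            end_time=TectonTimeConstant.UNBOUNDED_FUTURE,
        )
    ],
    # ... other batch_feature_view parameters ...
)
```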
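In both examples, at least one bound is expressed relative to the materialization window (or left unbounded) rather than as a fixed date. The following sketch shows how such relative bounds could resolve to concrete timestamps for one window. `TimeAnchor` and `Offset` are hypothetical stand-ins for TectonTimeConstant and an offset constant; they are not the SDK's classes, and the resolution rule is an illustrative assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Union


class TimeAnchor(Enum):
    """Hypothetical stand-in mirroring the TectonTimeConstant names used above."""

    MATERIALIZATION_START_TIME = "materialization_start"
    MATERIALIZATION_END_TIME = "materialization_end"
    UNBOUNDED_PAST = "unbounded_past"
    UNBOUNDED_FUTURE = "unbounded_future"


@dataclass(frozen=True)
class Offset:
    """Hypothetical anchor shifted by a timedelta (e.g. end time minus 7 days)."""

    anchor: TimeAnchor
    delta: timedelta = timedelta(0)


Bound = Union[datetime, TimeAnchor, Offset]


def resolve(bound: Bound, window_start: datetime, window_end: datetime) -> Optional[datetime]:
    """Return a concrete datetime for `bound`, or None when that side is unbounded."""
    if isinstance(bound, datetime):
        return bound  # fixed dates are used as-is
    if isinstance(bound, TimeAnchor):
        bound = Offset(bound)  # a bare anchor is an offset of zero
    if bound.anchor in (TimeAnchor.UNBOUNDED_PAST, TimeAnchor.UNBOUNDED_FUTURE):
        return None
    if bound.anchor is TimeAnchor.MATERIALIZATION_START_TIME:
        base = window_start
    else:
        base = window_end
    return base + bound.delta


window_start = datetime(2024, 3, 1, tzinfo=timezone.utc)
window_end = datetime(2024, 3, 2, tzinfo=timezone.utc)

# Start bound of the first example: MATERIALIZATION_END_TIME - 7 days.
print(resolve(Offset(TimeAnchor.MATERIALIZATION_END_TIME, -timedelta(days=7)), window_start, window_end))
# 2024-02-24 00:00:00+00:00
print(resolve(TimeAnchor.UNBOUNDED_FUTURE, window_start, window_end))  # None
```

Because the bounds are resolved per window, the same Feature View definition reads a different slice of the source on each materialization run.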
summary()
Displays a human-readable summary.
unfiltered()
Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, but can increase compute costs as a result.
Returns
FilteredSource
validate()
Deprecation Warning
Deprecated in SDK 1.0. As of Tecton version 1.0, objects are validated upon object creation, so validate() is unnecessary.
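The shift described above, from an explicit validate() call to validation at construction time, is a common Python pattern. The sketch below illustrates it with a hypothetical `SourceSpec` dataclass whose checks are invented for the example; it is not Tecton's code.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SourceSpec:
    """Hypothetical object that validates itself when it is created."""

    name: str
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Runs automatically right after __init__, so invalid objects never exist.
        if not self.name:
            raise ValueError("name must be a non-empty string")
        for key, value in self.tags.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise ValueError("tags must map str keys to str values")

    def validate(self) -> None:
        """Deprecated no-op kept only so older call sites keep working."""
        warnings.warn(
            "validate() is deprecated; objects are validated on creation",
            DeprecationWarning,
            stacklevel=2,
        )


spec = SourceSpec(name="credit_scores_batch", tags={"team": "fraud"})
spec.validate()  # emits a DeprecationWarning but performs no checks

try:
    SourceSpec(name="")
except ValueError as err:
    print("rejected at construction:", err)
```

Keeping validate() as a warning-only no-op lets existing code run unchanged while signaling that the call can be removed.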