Version: 1.1

BatchSource

Summary

A Tecton BatchSource, used to read batch data into Tecton for use in a BatchFeatureView.

Attributes

| Name | Data Type | Description |
| --- | --- | --- |
| created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
| data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
| data_source_type | data_source_type_pb2.DataSourceType.ValueType | |
| defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
| description | Optional[str] | Returns the description of the Tecton object. |
| id | str | Returns the unique id of the Tecton object. |
| info | | |
| name | str | Returns the name of the Tecton object. |
| owner | Optional[str] | Returns the owner of the Tecton object. |
| prevent_destroy | bool | Returns whether this Tecton object has prevent_destroy set. |
| tags | Dict[str, str] | Returns the tags of the Tecton object. |
| workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |
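
The arithmetic behind data_delay can be sketched in plain Python. This is an illustration only and does not call Tecton: the helper name earliest_job_start and the example values (a scheduled time of midnight UTC and a one-hour delay) are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def earliest_job_start(scheduled_time: datetime, data_delay: timedelta) -> datetime:
    """Hypothetical helper: the scheduled time pushed back by data_delay."""
    return scheduled_time + data_delay


# Assumed values: a job scheduled for midnight UTC with a one-hour data_delay.
scheduled_time = datetime(2024, 1, 2, tzinfo=timezone.utc)
job_start = earliest_job_start(scheduled_time, timedelta(hours=1))
print(job_start.isoformat())  # 2024-01-02T01:00:00+00:00
```

Under these assumptions, the job would wait until 01:00 UTC, leaving an hour for late-arriving data to land.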

Methods

| Name | Description |
| --- | --- |
| __init__(...) | Creates a new BatchSource. |
| get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
| select_range(...) | Returns this DataSource object wrapped as a FilteredSource. FilteredSources automatically pre-filter sources in Feature View definitions and can reduce compute costs. |
| summary() | Displays a human-readable summary. |
| unfiltered() | Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, which can increase compute costs. |
| validate() | [Deprecated in SDK 1.0] This method will be removed in a future version. As of Tecton version 1.0, objects are validated upon creation, so calling validate() is unnecessary. |

__init__(...)

Creates a new BatchSource.

Parameters

  • name: str A unique name of the DataSource.
  • description: Optional[str] = None A human-readable description.
  • tags: Optional[Dict[str, str]] = None Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata).
  • owner: Optional[str] = None Owner name (typically the email of the primary maintainer).
  • prevent_destroy: bool = False If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces.
  • batch_config: BatchConfigType BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.
  • options: Optional[Dict[str, str]] = None Additional options to configure the Source. Used for advanced use cases and beta features.

Example

# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)

get_dataframe(...)

Returns the data in this Data Source as a Tecton DataFrame.

Parameters

  • start_time: Optional[datetime.datetime] = None The start of the interval from which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True.
  • end_time: Optional[datetime.datetime] = None The end of the interval from which to retrieve source data. If no timezone is specified, UTC is assumed. Can only be set if apply_translator is True.
  • apply_translator: bool = True If True, the transformation specified by post_processor is applied to the dataframe for the data source. apply_translator does not apply to batch sources configured with spark_batch_config, because they do not have a post_processor.
  • compute_mode: Optional[Union[ComputeMode, str]] = None Compute mode to use to produce the data frame.

Returns

TectonDataFrame: A Tecton DataFrame containing the data source's raw or translated source data.

Raises

  • TectonValidationError: If apply_translator is False, but start_time or end_time filters are passed in.
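
The argument rules described for get_dataframe can be illustrated with a small stand-alone sketch. It does not call Tecton and is not its implementation: the function normalize_time_range is hypothetical, ValueError stands in for TectonValidationError, and timezone-aware inputs are simply passed through unchanged.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def normalize_time_range(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Hypothetical stand-in for the start_time/end_time rules described above."""
    if not apply_translator and (start_time is not None or end_time is not None):
        # The documented method raises TectonValidationError; ValueError is a stand-in.
        raise ValueError("start_time and end_time require apply_translator=True")

    def default_to_utc(value: Optional[datetime]) -> Optional[datetime]:
        # Naive datetimes are treated as UTC; aware datetimes are left untouched.
        if value is None or value.tzinfo is not None:
            return value
        return value.replace(tzinfo=timezone.utc)

    return default_to_utc(start_time), default_to_utc(end_time)


start, end = normalize_time_range(start_time=datetime(2024, 1, 1))
print(start, end)  # 2024-01-01 00:00:00+00:00 None
```

Passing a time bound together with apply_translator=False raises in this sketch, mirroring the Raises entry above.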

select_range(...)

Returns this DataSource object wrapped as a FilteredSource. FilteredSources automatically pre-filter sources in Feature View definitions and can reduce compute costs.

Calling select_range() with no arguments returns an unfiltered source and is equivalent to source.unfiltered().

Parameters

  • start_time: Union[datetime.datetime, FilterDateTime, TectonTimeConstant] The start time of the filter. Can be a datetime or TectonTimeConstant optionally offset by a timedelta.
  • end_time: Union[datetime.datetime, FilterDateTime, TectonTimeConstant] The end time of the filter. Can be a datetime or TectonTimeConstant optionally offset by a timedelta.

Returns

FilteredSource: A FilteredSource object that can be passed into a Feature View.

Examples

# The following example uses select_range to filter a source to only include data from
# the 7 days before MATERIALIZATION_END_TIME.
@batch_feature_view(
    ...
    sources=[
        transactions_source.select_range(
            start_time=TectonTimeConstant.MATERIALIZATION_END_TIME - timedelta(days=7),
            end_time=TectonTimeConstant.MATERIALIZATION_END_TIME
        )
    ]
    ...
)

# The following example filters the source to only include data from 2020-01-01 onward.
@batch_feature_view(
    ...
    sources=[
        transactions_source.select_range(
            start_time=datetime.datetime(2020, 1, 1),
            end_time=TectonTimeConstant.UNBOUNDED_FUTURE
        )
    ]
    ...
)
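
To see what a window such as MATERIALIZATION_END_TIME - timedelta(days=7) resolves to for one concrete run, the offset arithmetic can be worked through with ordinary datetimes. This sketch does not use Tecton: resolve_lookback_window is a hypothetical helper, and the materialization end time is an assumed value. It makes no claim about whether the bounds are inclusive or exclusive.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def resolve_lookback_window(
    materialization_end: datetime, lookback: timedelta
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: concrete (start, end) bounds for a lookback window."""
    return materialization_end - lookback, materialization_end


# Assumed value: a materialization period ending on 2024-01-08 at midnight UTC.
start, end = resolve_lookback_window(
    datetime(2024, 1, 8, tzinfo=timezone.utc), timedelta(days=7)
)
print(start.date(), end.date())  # 2024-01-01 2024-01-08
```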

summary()

Displays a human-readable summary.

unfiltered()

Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, which can increase compute costs.

Returns

FilteredSource

validate()

Deprecation Warning
Deprecated in SDK 1.0; this method will be removed in a future version. As of Tecton version 1.0, objects are validated upon creation, so calling validate() is unnecessary.

Returns

None
