Version: Beta 🚧

BatchSource

Summary

A BatchSource reads batch data into Tecton for use in a BatchFeatureView.

Attributes

Name	Data Type	Description
`created_at`	`Optional[datetime.datetime]`	Returns the time that this Tecton object was created or last updated. `None` for locally defined objects.
`data_delay`	`Optional[datetime.timedelta]`	Returns the duration that materialization jobs wait after the `batch_schedule` before starting, typically to ensure that all data has landed.
`data_source_type`	`data_source_type_pb2.DataSourceType.ValueType`
`defined_in`	`Optional[str]`	The repo filename where this object was declared. `None` for locally defined objects.
`description`	`Optional[str]`	Returns the description of the Tecton object.
`id`	`str`	Returns the unique id of the Tecton object.
`info`
`name`	`str`	Returns the name of the Tecton object.
`owner`	`Optional[str]`	Returns the owner of the Tecton object.
`prevent_destroy`	`bool`	Returns whether this Tecton object has `prevent_destroy` set.
`tags`	`Dict[str, str]`	Returns the tags of the Tecton object.
`workspace`	`Optional[str]`	Returns the workspace that this Tecton object belongs to. `None` for locally defined objects.
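
To make the `data_delay` attribute concrete, the sketch below models the timing it describes using only the standard library. The function name `earliest_job_start` and the sample values are hypothetical. This illustrates the described behavior and is not Tecton's scheduler.

```python
from datetime import datetime, timedelta, timezone


def earliest_job_start(schedule_tick: datetime, data_delay: timedelta) -> datetime:
    """Return when a job scheduled for `schedule_tick` would start after waiting `data_delay`."""
    return schedule_tick + data_delay


# A daily schedule tick at midnight UTC with a one-hour data delay (made-up values).
tick = datetime(2024, 1, 2, tzinfo=timezone.utc)
start = earliest_job_start(tick, timedelta(hours=1))
print(start.isoformat())  # 2024-01-02T01:00:00+00:00
```

A longer delay gives upstream data more time to land, at the cost of fresher features arriving later.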

Methods

Name	Description
`__init__(...)`	Creates a new BatchSource.
`get_dataframe(...)`	Returns the data in this Data Source as a Tecton DataFrame.
`select_range(...)`	Returns this DataSource object wrapped as a FilteredSource. FilteredSources will automatically pre-filter sources in Feature View definitions and can reduce compute costs.
`summary()`	Displays a human-readable summary.
`unfiltered()`	Returns an unfiltered DataSource. An unfiltered source makes the entire source available to a Feature View, which can increase compute costs.
`validate()`	[Deprecated in SDK 1.0] Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.

__init__(...)

Creates a new BatchSource.

Parameters

name: str A unique name of the DataSource.
description: Optional[str] = None A human-readable description.
tags: Optional[Dict[str, str]] = None Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata).
owner: Optional[str] = None Owner name (typically the email of the primary maintainer).
prevent_destroy: bool = False If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces.
batch_config: BatchConfigType BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.
options: Optional[Dict[str, str]] = None Additional options to configure the Source. Used for advanced use cases and beta features.

Example

# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)
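
The optional metadata parameters listed above can be added to the same declaration. The following is a minimal sketch: the helper names `optional_metadata` and `declare_credit_scores_batch` are hypothetical, as are the description, tag, and owner values. The Tecton import sits inside the function only so that the metadata helper can be inspected without the SDK installed.

```python
from typing import Any, Dict


def optional_metadata() -> Dict[str, Any]:
    """Optional BatchSource keyword arguments; every value here is a made-up example."""
    return {
        "description": "Daily credit score snapshots.",
        "tags": {"team": "fraud", "tier": "production"},  # Dict[str, str]
        "owner": "owner@example.com",
        "prevent_destroy": True,  # only enforced in live (non-dev) workspaces
    }


def declare_credit_scores_batch():
    """Build the BatchSource from the example above with the optional metadata added."""
    # Imported lazily: calling this function requires the Tecton SDK.
    from tecton import BatchSource, HiveConfig

    return BatchSource(
        name="credit_scores_batch",
        batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
        **optional_metadata(),
    )
```

In a feature repository the object would normally be declared at module level, as in the example above, rather than inside a function.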

get_dataframe(...)

Returns the data in this Data Source as a Tecton DataFrame.

Parameters

start_time: Optional[datetime.datetime] = None The interval start time from when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True.
end_time: Optional[datetime.datetime] = None The interval end time until when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True.
apply_translator: bool = True If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor.
compute_mode: Optional[Union[ComputeMode, str]] = None Compute mode to use to produce the data frame.

Returns

TectonDataFrame: A Tecton DataFrame containing the data source's raw or translated source data.

Raises

TectonValidationError: If apply_translator is False, but start_time or end_time filters are passed in.
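
The argument rules above (naive datetimes are treated as UTC, and time filters require apply_translator to be True) can be modeled with the standard library alone. This is a sketch of the documented behavior, not Tecton's implementation. `check_time_filter`, `as_utc`, and `ValidationErrorSketch` are hypothetical names, with the last standing in for TectonValidationError.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


class ValidationErrorSketch(ValueError):
    """Local stand-in for TectonValidationError, so the sketch has no Tecton dependency."""


def as_utc(ts: Optional[datetime]) -> Optional[datetime]:
    """Treat a naive datetime as UTC; leave aware datetimes and None unchanged."""
    if ts is not None and ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts


def check_time_filter(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Reject time filters when apply_translator is False, then normalize both bounds to UTC."""
    if not apply_translator and (start_time is not None or end_time is not None):
        raise ValidationErrorSketch("start_time and end_time require apply_translator=True")
    return as_utc(start_time), as_utc(end_time)


start, end = check_time_filter(datetime(2024, 1, 1), datetime(2024, 1, 8))
print(start.tzinfo, end.tzinfo)  # UTC UTC
```

Passing timezone-aware datetimes to `get_dataframe(start_time=..., end_time=...)` avoids relying on the UTC default.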

select_range(...)

Returns this DataSource object wrapped as a FilteredSource. FilteredSources will automatically pre-filter sources in Feature View definitions and can reduce compute costs.

Calling select_range() with no arguments returns an unfiltered source and is equivalent to source.unfiltered().

Parameters

start_time: Union[datetime.datetime, FilterDateTime, TectonTimeConstant] The start time of the filter. Can be a datetime or TectonTimeConstant optionally offset by a timedelta.
end_time: Union[datetime.datetime, FilterDateTime, TectonTimeConstant] The end time of the filter. Can be a datetime or TectonTimeConstant optionally offset by a timedelta.

Returns

FilteredSource: A FilteredSource object that can be passed into a Feature View.

Examples

import datetime
from datetime import timedelta

from tecton import TectonTimeConstant, batch_feature_view

# The following example uses select_range to restrict a source to the 7 days of data
# ending at MATERIALIZATION_END_TIME.
@batch_feature_view(
    ...
    sources=[
        transactions_source.select_range(
            start_time=TectonTimeConstant.MATERIALIZATION_END_TIME - timedelta(days=7),
            end_time=TectonTimeConstant.MATERIALIZATION_END_TIME
        )
    ]
    ...
)

# The following example keeps only source data from 2020-01-01 onward.
@batch_feature_view(
    ...
    sources=[
        transactions_source.select_range(
            start_time=datetime.datetime(2020, 1, 1),
            end_time=TectonTimeConstant.UNBOUNDED_FUTURE
        )
    ]
    ...
)
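
To show what the trailing 7-day window in the first example resolves to for one job, here is a standard-library sketch. `trailing_window` and `rows_in_window` are hypothetical helpers, the timestamps are made up, and treating the window as half-open (start included, end excluded) is an assumption of this sketch rather than something stated above.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def trailing_window(materialization_end: datetime, lookback: timedelta) -> Tuple[datetime, datetime]:
    """Resolve (MATERIALIZATION_END_TIME - lookback, MATERIALIZATION_END_TIME) for one job."""
    return materialization_end - lookback, materialization_end


def rows_in_window(timestamps: List[datetime], start: datetime, end: datetime) -> List[datetime]:
    """Keep timestamps in [start, end); the half-open interval is an assumption."""
    return [ts for ts in timestamps if start <= ts < end]


job_end = datetime(2024, 1, 8, tzinfo=timezone.utc)
start, end = trailing_window(job_end, timedelta(days=7))

events = [
    datetime(2023, 12, 31, tzinfo=timezone.utc),    # before the window
    datetime(2024, 1, 1, tzinfo=timezone.utc),      # at the window start
    datetime(2024, 1, 7, 23, tzinfo=timezone.utc),  # inside the window
    datetime(2024, 1, 8, tzinfo=timezone.utc),      # at the window end
]
kept = rows_in_window(events, start, end)
print(len(kept))  # 2
```

Because the window is defined relative to each job's end time, every materialization job reads only the rows it needs rather than the whole source.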

summary()

Displays a human-readable summary.

unfiltered()

Returns an unfiltered DataSource. An unfiltered source makes the entire source available to a Feature View, which can increase compute costs.

Returns

FilteredSource

validate()

Deprecation Warning

Deprecated in SDK 1.0. This method will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so calling validate() is unnecessary.

Returns

None
