BatchSource
Summary
A BatchSource reads batch data into Tecton for use in a BatchFeatureView.
Attributes
Name | Data Type | Description |
---|---|---|
created_at | Optional[datetime.datetime] | Returns the time that this Tecton object was created or last updated. None for locally defined objects. |
data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
data_source_type | data_source_type_pb2.DataSourceType.ValueType | |
defined_in | Optional[str] | The repo filename where this object was declared. None for locally defined objects. |
description | Optional[str] | Returns the description of the Tecton object. |
id | str | Returns the unique id of the Tecton object. |
info | | |
name | str | Returns the name of the Tecton object. |
owner | Optional[str] | Returns the owner of the Tecton object. |
prevent_destroy | bool | Returns whether this Tecton object has prevent_destroy set. |
tags | Dict[str, str] | Returns the tags of the Tecton object. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. None for locally defined objects. |
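The data_delay attribute describes how long a materialization job waits after its scheduled time before starting. As a minimal sketch of that arithmetic only (the helper name `earliest_job_start` is hypothetical and is not part of the Tecton SDK, and treating a missing delay as zero is an assumption), the effective start time could be computed like this:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def earliest_job_start(scheduled_at: datetime, data_delay: Optional[timedelta]) -> datetime:
    """Return the scheduled time shifted by data_delay.

    Assumption: a data_delay of None is treated as no delay.
    """
    return scheduled_at + (data_delay or timedelta(0))


# A job scheduled for midnight UTC with a one-hour data_delay.
scheduled = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(earliest_job_start(scheduled, timedelta(hours=1)))
```

This illustrates the attribute's meaning; it does not describe how Tecton schedules jobs internally.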
Methods
Name | Description |
---|---|
__init__(...) | Creates a new BatchSource. |
get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
select_range(...) | Returns this DataSource object wrapped as a FilteredSource. FilteredSources will automatically pre-filter sources in Feature View definitions and can reduce compute costs. |
summary() | Displays a human-readable summary. |
unfiltered() | Returns an unfiltered DataSource. This scope makes the entire source available to a Feature View, which can increase compute costs. |
validate() | Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary. |
__init__(...)
Creates a new BatchSource.
Parameters
name (str) - A unique name of the DataSource.
description (Optional[str]) - A human-readable description. Default: None
tags (Optional[Dict[str, str]]) - Tags associated with this Tecton Data Source (key-value pairs of arbitrary metadata). Default: None
owner (Optional[str]) - Owner name (typically the email of the primary maintainer). Default: None
prevent_destroy (bool) - If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object, e.g. if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. Default: False
batch_config (BatchConfigType) - BatchConfig object containing the configuration of the Batch Data Source to be included in this Data Source.
options (Optional[Dict[str, str]]) - Additional options to configure the Source. Used for advanced use cases and beta features. Default: None
Example
# Declare a BatchSource with a HiveConfig instance as its batch_config parameter.
# Refer to the "Configs Classes and Helpers" section for other batch_config types.
from tecton import HiveConfig, BatchSource

credit_scores_batch = BatchSource(
    name="credit_scores_batch",
    batch_config=HiveConfig(database="demo_fraud", table="credit_scores", timestamp_field="timestamp"),
)
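The optional metadata parameters (description, tags, owner, prevent_destroy) can be passed alongside name and batch_config. The following is a sketch under assumptions: the metadata values, the `SOURCE_METADATA` dict, and the `build_credit_scores_source` helper are all hypothetical, and the tecton import is deferred into the function so the metadata can be inspected without the package installed.

```python
# Hypothetical metadata values; the keys match the __init__ parameters documented above.
SOURCE_METADATA = {
    "description": "Daily credit score snapshots",
    "owner": "data-team@example.com",
    "tags": {"release": "production"},
    # Blocks deletes and re-creates of this object in live workspaces.
    "prevent_destroy": True,
}


def build_credit_scores_source():
    """Construct the BatchSource; calling this requires the tecton package."""
    from tecton import BatchSource, HiveConfig  # deferred import (illustrative choice)

    return BatchSource(
        name="credit_scores_batch",
        batch_config=HiveConfig(
            database="demo_fraud",
            table="credit_scores",
            timestamp_field="timestamp",
        ),
        **SOURCE_METADATA,
    )
```

With prevent_destroy set to True, removing this source later would first require setting it back to False, as described in the parameter list.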
get_dataframe(...)
Returns the data in this Data Source as a Tecton DataFrame.
Parameters
start_time (Optional[datetime.datetime]) - The start of the interval for which to retrieve source data. If no timezone is specified, UTC is used. Can only be defined if apply_translator is True. Default: None
end_time (Optional[datetime.datetime]) - The end of the interval for which to retrieve source data. If no timezone is specified, UTC is used. Can only be defined if apply_translator is True. Default: None
apply_translator (bool) - If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. Default: True
compute_mode (Optional[Union[ComputeMode, str]]) - Compute mode to use to produce the data frame. Default: None
Returns
data_frame.TectonDataFrame: A Tecton DataFrame containing the data source's raw or translated source data.
Raises
TectonValidationError: If apply_translator is False, but start_time or end_time filters are passed in.
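The time-window rules for get_dataframe (naive datetimes are read as UTC, and start_time or end_time are only allowed when apply_translator is True) can be made explicit on the caller's side. This is a minimal sketch, not Tecton's implementation: `as_utc` and `read_window` are hypothetical helpers, `source` is assumed to be any object exposing get_dataframe with the parameters documented above, and the local check raises ValueError where Tecton itself would raise TectonValidationError.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def as_utc(ts: Optional[datetime]) -> Optional[datetime]:
    """Attach UTC to a naive datetime; leave aware datetimes and None unchanged."""
    if ts is not None and ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts


def read_window(
    source: Any,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
):
    """Call source.get_dataframe with an explicit UTC window."""
    # Fail early on the combination that get_dataframe is documented to reject.
    if not apply_translator and (start_time is not None or end_time is not None):
        raise ValueError("start_time and end_time require apply_translator=True")
    return source.get_dataframe(
        start_time=as_utc(start_time),
        end_time=as_utc(end_time),
        apply_translator=apply_translator,
    )
```

A call such as `read_window(credit_scores_batch, datetime(2024, 1, 1), datetime(2024, 1, 8))` would then request one week of data with both bounds explicitly in UTC.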
summary()
Displays a human-readable summary.
(Deprecated) validate()
Method is deprecated and will be removed in a future version. As of Tecton version 1.0, objects are validated upon object creation, so validation is unnecessary.
Returns
None