Version: 0.7

tecton.PushSource

Summary

A Push Source is used to configure the Tecton Stream Ingest API for use with a Stream Feature View.

A Push Source may also contain an optional batch_config for efficiently backfilling historical feature values.

Example

from tecton import HiveConfig, PushSource
from tecton.types import Field, Int64, String, Timestamp

# Declare a schema for the Push Source
input_schema = [
    Field(name="user_id", dtype=String),
    Field(name="event_timestamp", dtype=Timestamp),
    Field(name="clicked", dtype=Int64),
]

# Declare a Push Source with a name, a schema, and an optional batch_config
# See the API documentation for BatchConfig
click_event_source = PushSource(
    name="click_event_source",
    schema=input_schema,
    batch_config=HiveConfig(
        database="demo_ads",
        table="impressions_batch",
    ),
    description="Sample Push Source for click events",
)

Attributes

Name | Data Type | Description
created_at | Optional[datetime.datetime] | The time that this Tecton object was created or last updated.
data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed.
defined_in | Optional[str] | The repo filename where this object was declared.
description | Optional[str] | Returns the description of the Tecton object.
id | str | Returns the unique id of the Tecton object.
info | |
name | str | Returns the name of the Tecton object.
owner | Optional[str] | Returns the owner of the Tecton object.
tags | Dict[str, str] | Returns the tags of the Tecton object.
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to.
options | Optional[Dict[str, str]] | A map of additional data source options for the Push Source.

Methods

Name | Description
__init__(...) | Creates a new Push Source.
get_columns() | Returns the column names of the data source’s push schema.
get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame.
summary() | Displays a human-readable summary of this Data Source.
validate() | Validates this Tecton object and its dependencies (if any).

__init__(...)

Creates a new Push Source.

Parameters

  • name (str) – A unique name of the DataSource.

  • description (Optional[str]) – A human-readable description. (Default: None)

  • tags (Optional[Dict[str, str]]) – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). (Default: None)

  • owner (Optional[str]) – Owner name (typically the email of the primary maintainer). (Default: None)

  • prevent_destroy (bool) – If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False, either in the same tecton apply or in a separate one. prevent_destroy can be used to prevent accidental changes, such as inadvertently deleting a Feature Service used in production or recreating a Feature View in a way that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a re-create of the tagged object, e.g. if prevent_destroy is set on a Feature Service, it will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)

  • schema (List[Field]) – A schema for the Push Source.

  • batch_config (Union[FileConfig, HiveConfig, RedshiftConfig, SnowflakeConfig, SparkBatchConfig, None]) – An optional BatchConfig object containing the configuration of the Batch Data Source that backs this Tecton Push Source. The Batch Source’s schema must contain a super-set of all the columns defined in the Push Source schema. (Default: None)
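
The superset requirement on batch_config can be checked by hand before applying a definition. The following is a minimal sketch in plain Python, not Tecton's own validation: missing_batch_columns is a hypothetical helper that is not part of the Tecton SDK, and the column lists (including the extra ad_id column) are illustrative assumptions.

```python
from typing import Iterable, List


def missing_batch_columns(
    push_columns: Iterable[str], batch_columns: Iterable[str]
) -> List[str]:
    """Return the push schema columns that are absent from the batch source.

    Hypothetical helper for illustration only; it is not a Tecton API.
    The result preserves the order of push_columns.
    """
    available = set(batch_columns)
    return [column for column in push_columns if column not in available]


# Illustrative column names; the batch table is assumed to carry one extra column.
push_columns = ["user_id", "event_timestamp", "clicked"]
batch_columns = ["user_id", "event_timestamp", "clicked", "ad_id"]

missing = missing_batch_columns(push_columns, batch_columns)
if missing:
    raise ValueError(f"Batch source is missing push schema columns: {missing}")
```

An empty result means every column in the push schema also exists in the batch source, which is the condition described for batch_config. The push-side names could be taken from get_columns().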

get_columns()

Returns the column names of the data source’s push schema.

get_dataframe(...)

Returns the data in this Data Source as a Tecton DataFrame.

Parameters

  • start_time (Optional[datetime]) – The interval start time from when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True. (Default: None)

  • end_time (Optional[datetime]) – The interval end time until when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True. (Default: None)

  • apply_translator (bool) – If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. (Default: None)

Returns

A Tecton DataFrame containing the data source’s raw or translated source data.

Raises

TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
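
The time-filter rules for get_dataframe can be sketched in plain Python. This is an illustration under stated assumptions, not Tecton's implementation: prepare_interval is a hypothetical helper that is not part of the Tecton SDK. It treats naive datetimes as UTC, as the parameter descriptions state, and raises ValueError as a stand-in for the documented TectonValidationError.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def _as_utc(value: Optional[datetime]) -> Optional[datetime]:
    """Attach UTC to a naive datetime; leave aware datetimes and None unchanged."""
    if value is not None and value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value


def prepare_interval(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    apply_translator: bool = True,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Mirror the documented rules for start_time, end_time and apply_translator.

    Hypothetical helper for illustration only; it is not a Tecton API.
    """
    if not apply_translator and (start_time is not None or end_time is not None):
        # Stand-in for the TectonValidationError described under "Raises".
        raise ValueError("start_time and end_time require apply_translator=True")
    return _as_utc(start_time), _as_utc(end_time)


# Naive datetimes are interpreted as UTC.
start, end = prepare_interval(datetime(2023, 1, 1), datetime(2023, 1, 2))
```

The resulting values could then be passed as start_time and end_time to get_dataframe.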

summary()

Displays a human-readable summary of this Data Source.

validate()

Validates this Tecton object and its dependencies (if any).

Validation performs most of the same checks and operations as tecton plan.

  1. Check for invalid object configurations, e.g. setting conflicting fields.

  2. For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source’s specified S3 path exists or that a Feature View’s SQL code executes and produces supported feature data types.

Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. my_workspace.get_feature_view('my_fv')) since they have already been validated during tecton plan.

Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called (e.g. my_feature_view.get_historical_features()).
