Data Sources
Tecton can connect to practically any physical batch or stream source of data (e.g., S3, GCS, Snowflake, Redshift, Kafka, or Kinesis). To learn how to onboard your existing physical sources to Tecton, please head over to this guide.
This section explains how to use an onboarded physical source of data with a Feature View. In Tecton's framework, a Data Source is a logical object that defines a raw source of data that your Feature Views can use as an input. A Data Source carries typical metadata (such as a name, an owner, or tags); batch and stream Data Sources also reference your onboarded physical source of data.
Here's an example of a logical `BatchSource`, named `fraud_users_batch`, which references a physical raw Hive table `fraud_users` in the Hive database `fraud`:

```python
from tecton import HiveConfig, BatchSource

fraud_users_batch = BatchSource(
    name="fraud_users_batch",
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
)
```
Tecton supports the following Data Source concepts:
- `BatchSource`: References a physical batch source of raw data, such as a Hive table, a data warehouse table, or a file. Used as an input for a `BatchFeatureView`.
- `StreamSource`: References a physical stream source (such as a Kafka topic or a Kinesis Stream) or a `PushConfig` that allows you to push events to Tecton via HTTP. It can also reference a physical batch source containing the stream's historical event log (used for backfills). Used as an input for a `StreamFeatureView`.
- `RequestSource`: Defines the expected schema for request context data that is optionally sent to a `RealtimeFeatureView` at inference time.
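To make the stream and request concepts concrete, here is a minimal sketch in the same style as the `BatchSource` example above. It is illustrative rather than a complete repo definition: the source names, the field list, and the Hive table `transactions_log` are assumptions, and the `PushConfig`/`RequestSource` usage follows the pattern of recent Tecton SDK versions.

```python
from tecton import StreamSource, PushConfig, HiveConfig, RequestSource
from tecton.types import Field, String, Timestamp, Float64

# A StreamSource whose events are pushed to Tecton over HTTP (PushConfig),
# with a batch source holding the stream's historical event log for backfills.
# The name and the Hive table below are hypothetical.
transactions_stream = StreamSource(
    name="transactions_stream",
    stream_config=PushConfig(),
    batch_config=HiveConfig(database="fraud", table="transactions_log"),
    schema=[
        Field("user_id", String),
        Field("timestamp", Timestamp),
        Field("amount", Float64),
    ],
)

# A RequestSource declaring the schema of request context data that a
# RealtimeFeatureView can consume at inference time.
transaction_request = RequestSource(
    schema=[Field("amount", Float64)],
)
```

Note that the `batch_config` on a `StreamSource` is what lets Tecton backfill stream features from the historical event log, while the `schema` describes the events arriving on the stream itself.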