Version: 1.1

Data Sources

Tecton can connect to practically any physical batch or stream source of data (e.g. S3, GCS, Snowflake, Redshift, Kafka, Kinesis). To learn how to onboard your existing physical sources to Tecton, please head over to this guide.

This section explains how to use an onboarded physical source of data with a Feature View. In Tecton's framework, Data Sources are logical objects that define the raw data your Feature Views can use as inputs. A Data Source carries typical metadata (such as a name, an owner, or tags). Batch and stream Data Sources also reference your onboarded physical source of data.

Here's an example of a logical BatchSource, assigned to the variable fraud_users_batch and registered under the name users_batch, which references the physical Hive table fraud_users in the Hive database fraud:

from tecton import HiveConfig, BatchSource

# Logical Data Source backed by the Hive table fraud.fraud_users.
fraud_users_batch = BatchSource(
    name="users_batch",
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
)

Column Naming Requirements

When defining schemas for Data Sources (BatchSource, StreamSource, RequestSource), column names must follow specific naming constraints to ensure compatibility with Tecton's validation system.

Naming Rules:

  • Letters (a-z, A-Z) are allowed
  • Digits (0-9) are allowed
  • Single underscores (_) are allowed as separators
  • No other special characters, spaces, or consecutive underscores are permitted

Examples of valid column names:

  • user_id
  • transaction_amount
  • timestamp_field
  • score123

Examples of invalid column names:

  • user-id (hyphens not allowed)
  • transaction__amount (consecutive underscores not allowed)
  • timestamp field (spaces not allowed)

If you use invalid column names, you will encounter validation errors when defining your data source schema.
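
To catch such errors before applying a Data Source definition, the naming rules above can be approximated with a small standalone check. This is a minimal sketch, not Tecton's actual validator: the helper names is_valid_column_name and find_invalid_columns are hypothetical, and the regular expression is only an interpretation of the rules listed here. In particular, it assumes that underscores act strictly as separators, so a name may not start or end with one.

```python
import re

# Assumption: underscores are separators only, so a name must start and end
# with a letter or digit. The rules above do not state this explicitly.
_COLUMN_NAME = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*")


def is_valid_column_name(name: str) -> bool:
    """Return True if name uses only letters, digits, and single underscores."""
    return _COLUMN_NAME.fullmatch(name) is not None


def find_invalid_columns(names):
    """Return the names that break the rules, preserving input order."""
    return [n for n in names if not is_valid_column_name(n)]
```

For example, find_invalid_columns(["user_id", "user-id", "transaction__amount"]) returns the two names flagged as invalid in the list above, while user_id passes. Running a check like this over a schema's column names before defining the Data Source can surface problems earlier than the validation error would.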

Tecton supports the following Data Source concepts:

  • BatchSource: References a physical batch source of raw data, such as a Hive table, a data warehouse table, or a file. Used as an input for a BatchFeatureView.
  • StreamSource: References a physical stream source (such as a Kafka topic or a Kinesis Stream) or a PushConfig that allows you to push events to Tecton via HTTP. It can also reference a physical batch source, which contains the stream's historical event log (used for backfills). Used as an input for a StreamFeatureView.
  • RequestSource: Defines the expected schema for request context data that is optionally sent to a RealtimeFeatureView at inference time.
