Version: 1.2

Sources of Data

Tecton enables you to build production-grade machine learning features by connecting to a variety of sources of data. Understanding the available types is key to designing robust and scalable feature pipelines. Tecton groups sources of data into three primary types:

  • Data Sources
  • API Resources
  • Feature Tables

This guide introduces each type, explains its use cases, and shows how to use it in your feature definitions.

Data Sources

Data Sources are the foundational way to bring raw data into Tecton. They represent connections to external storage systems or data streams, such as:

  • Data warehouses (e.g., Snowflake, Redshift, BigQuery)
  • Data lakes (e.g., S3, Delta Lake, Hive)
  • Streaming platforms (e.g., Kafka, Kinesis)
  • Files (e.g., Parquet, CSV)

Data Sources are used as inputs to Feature Views (Batch or Stream) and are defined using Tecton's configuration classes like BatchSource and StreamSource.

Example:

from tecton import BatchSource, FileConfig

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://my-bucket/transactions.parquet",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)

API Resources

API Resources allow Tecton to ingest data from operational sources such as arbitrary APIs or databases. This is especially useful for:

  • Real-time event ingestion (e.g., user actions, sensor data)
  • Synchronous feature computation at request time

API Resources are typically used with Push Sources or Request Sources in Tecton, and are often paired with @realtime_feature_view or @stream_feature_view.

Example:

from tecton import RequestSource
from tecton.types import Field, String

request_schema = [Field("user_id", String), Field("item_id", String)]

user_request = RequestSource(schema=request_schema)
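To make the role of a request schema concrete, the following is a minimal plain-Python sketch of checking an incoming payload against the two string fields declared above. It does not use Tecton's classes and is not how Tecton validates requests; `REQUEST_SCHEMA` and `validate_request` are hypothetical names introduced only for illustration.

```python
from typing import Any

# Hypothetical stand-in for the request schema above: field name -> Python type.
# It mirrors the two string fields (user_id, item_id) without using Tecton.
REQUEST_SCHEMA: dict[str, type] = {"user_id": str, "item_id": str}


def validate_request(payload: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the payload matches."""
    problems = []
    # Every declared field must be present and have the declared type.
    for name, expected in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # Fields that the schema does not declare are reported as well.
    for name in payload:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems
```

Calling `validate_request({"user_id": "u1", "item_id": "i9"}, REQUEST_SCHEMA)` returns an empty list, while a payload with a missing or mistyped field returns one message per problem.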

Feature Tables

Feature Tables are managed tables within Tecton that store precomputed features. They can be used as sources for new feature views, enabling feature reuse and modularity. Feature Tables are especially useful for:

  • Sharing features across teams or projects
  • Decoupling feature computation from feature consumption
  • Serving features at low latency

Feature Tables can be referenced in new feature views, allowing you to build on top of existing features.

Example:

from tecton import FeatureTable

user_features = FeatureTable(name="user_features", ...)
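The idea of decoupling feature computation from feature consumption can be sketched without Tecton at all. The toy class below is an assumption-laden illustration, not Tecton's `FeatureTable`: a producer ingests precomputed rows, and a consumer later reads the newest row per key with a single lookup. The class name, the `ingest` and `get_features` methods, and the `purchase_count` column are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


class InMemoryFeatureTable:
    """Toy stand-in for a managed feature table; not Tecton's FeatureTable."""

    def __init__(self, name: str, key: str) -> None:
        self.name = name
        self.key = key  # hypothetical join-key column, e.g. "user_id"
        self._rows: dict[str, dict] = {}

    def ingest(self, rows: list[dict]) -> None:
        """Producer side: store precomputed rows, keeping the newest per key."""
        for row in rows:
            current = self._rows.get(row[self.key])
            if current is None or row["timestamp"] >= current["timestamp"]:
                self._rows[row[self.key]] = row

    def get_features(self, key_value: str) -> Optional[dict]:
        """Consumer side: a single dictionary lookup, no recomputation."""
        return self._rows.get(key_value)


table = InMemoryFeatureTable(name="user_features", key="user_id")
table.ingest([
    {"user_id": "u1", "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc), "purchase_count": 3},
    {"user_id": "u1", "timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc), "purchase_count": 5},
])
```

Here the writer and the reader share only the table, so the job that computes `purchase_count` can change or run on its own schedule without affecting the code that reads it.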

Summary

Source Type   | Typical Use Case                        | Example Classes
Data Source   | Raw data ingestion (batch/stream)       | BatchSource, StreamSource
API Resource  | Real-time or request-time features      | RequestSource, PushConfig
Feature Table | Reusing and serving precomputed features | FeatureTable
