Sources of Data
Tecton enables you to build production-grade machine learning features by connecting to a variety of sources of data. Understanding the available source types is key to designing robust and scalable feature pipelines. Tecton has three primary types:
- Data Sources
- API Resources
- Feature Tables
This guide introduces each type, explains their use cases, and provides examples of how to use them in your feature definitions.
Data Sources
Data Sources are the foundational way to bring raw data into Tecton. They represent connections to external storage systems or data streams, such as:
- Data warehouses (e.g., Snowflake, Redshift, BigQuery)
- Data lakes (e.g., S3, Delta Lake, Hive)
- Streaming platforms (e.g., Kafka, Kinesis)
- Files (e.g., Parquet, CSV)
Data Sources are used as inputs to Feature Views (Batch or Stream) and are defined using Tecton's configuration classes like BatchSource and StreamSource.
Example:
from tecton import BatchSource, FileConfig

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=FileConfig(
        uri="s3://my-bucket/transactions.parquet",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)
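To illustrate why the source declares a timestamp_field, the sketch below selects in-memory records that fall inside a time window using that field. It assumes the field holds each record's event time. The sample rows and the helper rows_in_window are hypothetical and are not part of the Tecton SDK.

```python
from datetime import datetime, timezone

# Hypothetical sample rows standing in for the contents of transactions.parquet.
transactions = [
    {"user_id": "u1", "amount": 20.0,
     "timestamp": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"user_id": "u2", "amount": 75.5,
     "timestamp": datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)},
    {"user_id": "u3", "amount": 12.0,
     "timestamp": datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc)},
]


def rows_in_window(rows, timestamp_field, start, end):
    """Return rows whose timestamp lies in the half-open interval [start, end)."""
    return [row for row in rows if start <= row[timestamp_field] < end]


# Select only the records for January 2 (the January 3 midnight row is excluded).
jan_2 = rows_in_window(
    transactions,
    "timestamp",
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
)
```

A half-open window is used here so that consecutive windows never count the same record twice.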
API Resources
API Resources allow Tecton to ingest data from operational sources such as arbitrary APIs or databases. This is especially useful for:
- Real-time event ingestion (e.g., user actions, sensor data)
- Synchronous feature computation at request time
API Resources are typically used with Push Sources or Request Sources in Tecton, and are often paired with @realtime_feature_view or @stream_feature_view.
Example:
from tecton import RequestSource
from tecton.types import Field, String
request_schema = [Field("user_id", String), Field("item_id", String)]
user_request = RequestSource(schema=request_schema)
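To make the role of a request schema more concrete, here is a minimal plain-Python sketch of checking an incoming payload against the same two string fields. It does not use the Tecton SDK and does not describe how Tecton itself validates requests. REQUEST_SCHEMA and validate_request are hypothetical names chosen for illustration.

```python
# Conceptual sketch only: plain Python, not the Tecton SDK.
# Mirrors the schema above: two required string fields.
REQUEST_SCHEMA = {"user_id": str, "item_id": str}


def validate_request(payload: dict) -> dict:
    """Return the declared fields of payload, or raise ValueError if it does not match."""
    missing = [name for name in REQUEST_SCHEMA if name not in payload]
    if missing:
        raise ValueError(f"missing request fields: {missing}")
    for name, expected_type in REQUEST_SCHEMA.items():
        if not isinstance(payload[name], expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")
    # Keep only declared fields so undeclared keys never reach feature logic.
    return {name: payload[name] for name in REQUEST_SCHEMA}


clean = validate_request({"user_id": "u1", "item_id": "i9", "session": "abc"})
```

Declaring the schema up front means request-time feature logic can rely on the presence and type of each field instead of checking them itself.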
Feature Tables
Feature Tables are managed tables within Tecton that store precomputed features. They can be used as sources for new feature views, enabling feature reuse and modularity. Feature Tables are especially useful for:
- Sharing features across teams or projects
- Decoupling feature computation from feature consumption
- Serving features at low latency
Feature Tables can be referenced in new feature views, allowing you to build on top of existing features.
Example:
from tecton import FeatureTable
user_features = FeatureTable(name="user_features", ...)
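The idea of decoupling feature computation from feature consumption can be sketched with a small in-memory model. This is plain Python, not the Tecton SDK: InMemoryFeatureTable, its write and read methods, and the feature names are all hypothetical, and a real Feature Table is a managed store, not a dictionary.

```python
from typing import Dict, Optional


class InMemoryFeatureTable:
    """Hypothetical stand-in for a table of precomputed features keyed by entity ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rows: Dict[str, Dict[str, float]] = {}

    def write(self, entity_id: str, features: Dict[str, float]) -> None:
        # Producer side: store (or overwrite) precomputed values for one entity.
        self._rows[entity_id] = dict(features)

    def read(self, entity_id: str) -> Optional[Dict[str, float]]:
        # Consumer side: a key lookup, with no feature computation at read time.
        row = self._rows.get(entity_id)
        return dict(row) if row is not None else None


table = InMemoryFeatureTable(name="user_features")

# A producer job writes values it computed elsewhere.
table.write("user_1", {"txn_count_7d": 12.0, "avg_txn_amount_7d": 38.5})

# A consumer reads by key without knowing how the values were computed.
served = table.read("user_1")
```

Because the consumer only performs a lookup, the cost of reading stays the same no matter how expensive the original computation was, which is what makes precomputed features suitable for low-latency serving.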
Summary
| Source Type | Typical Use Case | Example Classes |
|---|---|---|
| Data Source | Raw data ingestion (batch/stream) | BatchSource, StreamSource |
| API Resource | Real-time or request-time features | RequestSource, PushConfig |
| Feature Table | Reusing and serving precomputed features | FeatureTable |
What's Next
- Define your features: Learn about the types of features you can create in Tecton.
- Read more about Feature Views: Learn about the three types of feature views: Batch, Stream, and Realtime.