Connect Data Sources to Rift
Rift is Tecton's Python-native compute engine. It can connect to virtually any data source that is accessible through Python, so you can source feature data from wherever it already lives.
Connect Batch Data Sources to Rift
Since Rift runs Python code, you can connect to any data source that has a Python client or SDK. This includes:
- SQL databases (MySQL, PostgreSQL, Oracle, etc.)
- Data warehouses (Snowflake, BigQuery, Redshift, etc.)
- Object storage (S3, GCS, Azure Blob Storage, etc.)
- File formats (CSV, Parquet, JSON, etc.)
- REST APIs
- Custom data sources via Python packages
To connect to any of these data sources, you can:
- Import any required Python packages inside your data source function
- Use standard Python code to read your data
- Return a pandas (or PyArrow) DataFrame containing your feature data
For example:
from tecton import BatchSource, pandas_batch_config


@pandas_batch_config
def my_custom_source():
    # Import any Python package you need
    import pandas as pd
    import requests

    # Connect to your data source using Python
    response = requests.get("https://api.mydatasource.com/data", timeout=30)
    response.raise_for_status()  # Fail on HTTP errors instead of parsing an error body
    data = response.json()

    # Return a pandas DataFrame
    return pd.DataFrame(data)


my_batch_source = BatchSource(name="my_custom_source", batch_config=my_custom_source)
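The same three steps apply to a SQL database. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a production database driver, and the `user_events` table and the helper names `load_user_events` and `make_demo_connection` are hypothetical. In a Tecton repository, the body of `load_user_events` would sit inside a function decorated with `@pandas_batch_config`, as in the example above.

```python
import sqlite3

import pandas as pd


def load_user_events(conn: sqlite3.Connection) -> pd.DataFrame:
    """Read rows from a hypothetical `user_events` table into a DataFrame."""
    query = "SELECT user_id, amount, event_time FROM user_events"
    df = pd.read_sql_query(query, conn)
    # Parse the timestamp strings into timezone-aware datetimes
    df["event_time"] = pd.to_datetime(df["event_time"], utc=True)
    return df


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory database so the sketch runs without external services."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_events (user_id TEXT, amount REAL, event_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_events VALUES (?, ?, ?)",
        [
            ("u1", 9.5, "2024-01-01T00:00:00Z"),
            ("u2", 20.0, "2024-01-02T12:30:00Z"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(load_user_events(make_demo_connection()))
```

For a real database, you would swap the sqlite3 connection for the connection object of your own driver. The query and the timestamp parsing would stay largely the same.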
Connect Stream Data Sources to Rift
For streaming data sources, Tecton provides the Stream Ingest API -- a high-performance HTTP endpoint that can ingest events from any streaming source. Any system capable of making HTTP requests can push data into Tecton through this API.
You can connect any streaming platform by:
- Reading events from your stream
- Formatting them according to the Stream Ingest API specification
- Making HTTP POST requests to push the events to Tecton
# Example: Connect any stream source to Tecton
import os
import requests

TECTON_API_URL = "https://my-cluster.tecton.ai/api/v1/ingest"
TECTON_API_KEY = os.environ["TECTON_API_KEY"]  # Assumption: key is supplied via an environment variable

def push_events_to_tecton(events):
    # Format events for Tecton's Stream Ingest API
    records = [{"record": event} for event in events]
    payload = {"workspace_name": "prod", "records": {"my_stream_source": records}}
    # Push to Tecton
    headers = {"Authorization": f"Tecton-key {TECTON_API_KEY}"}
    response = requests.post(TECTON_API_URL, headers=headers, json=payload, timeout=30)
    return response
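When a stream consumer hands you many events at once, it can help to split them into smaller groups before posting. The sketch below shows one way to do that. It reuses the payload shape from the example above; the helper names `chunked` and `build_payload` and the batch size of 2 are hypothetical. Any actual limit on records per request is an assumption to verify against the Stream Ingest API specification.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def chunked(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Yield lists of at most `size` events, preserving input order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch
        yield batch


def build_payload(events: List[Event], workspace: str, source_name: str) -> Dict[str, Any]:
    """Wrap events in the request body shape used in the example above."""
    return {
        "workspace_name": workspace,
        "records": {source_name: [{"record": event} for event in events]},
    }


if __name__ == "__main__":
    demo_events = [{"user_id": f"u{i}", "amount": i * 1.5} for i in range(5)]
    for batch in chunked(demo_events, size=2):
        payload = build_payload(batch, "prod", "my_stream_source")
        # Each payload could be sent with requests.post(..., json=payload)
        print(len(payload["records"]["my_stream_source"]), "records")
```

Keeping payload construction separate from the HTTP call makes the formatting step easy to test without network access.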
See the Stream Ingest API for detailed specifications and best practices.
The examples above show only a few of the ways to connect data sources to Tecton. Because Rift is built on Python, you can connect to virtually any data source using Python's ecosystem of libraries and tools.