Development Best Practices for Cost Optimization
This guide covers best practices for minimizing infrastructure costs during feature development, testing, and deployment.
Development vs. Live Workspaces
Tecton provides separate workspace types to support cost-effective development:
Development Workspaces
Development workspaces allow feature iteration without incurring materialization costs:
- No automatic materialization jobs
- Notebook-driven development
- Feature testing without infrastructure costs
- No real-time endpoints
# Create a development workspace
$ tecton workspace create my_new_feature_ws
$ tecton apply
Live Workspaces
Use live workspaces only when you need:
- Training data with incremental backfills (requires materialization)
- REST API testing
- Materialization testing
- Stream source testing
- Production feature serving
Testing Features in Development
import tecton
from datetime import datetime, timedelta

# Use development workspace for testing
ws = tecton.get_workspace("my_new_feature_ws")
fv = ws.get_feature_view("user_transaction_metrics")

# Test feature logic without materialization
df = fv.get_features_in_range(
    start_time=datetime.now() - timedelta(days=7),
    end_time=datetime.now(),
    from_source=True,  # Compute directly from source
)
Cost-Effective Feature Development
Start with Recent Data
Begin feature development with a limited time range:
@batch_feature_view(
    ...,
    # Start with recent data
    feature_start_time=datetime.now() - timedelta(days=7),
    # Extend after validation
    # feature_start_time=datetime(2022, 1, 1),
)
def my_feature_view():
    return ...
Use Sampled Data Sources
During development:
- Create smaller test datasets
- Use data sampling where appropriate
- Consider using get_current_workspace() for environment-specific sources
@pandas_batch_config
def sampled_data():
    # Use sampled data in development
    if get_current_workspace().name == "dev":
        return small_sample_df
    return full_data_df
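The same environment-switching pattern works with any sampling strategy. As a rough illustration in plain Python (the function, the `"dev"` workspace name, and the record list here are hypothetical, not Tecton APIs), a source loader might deterministically downsample in development:

```python
import random

def load_records(workspace_name, records, sample_fraction=0.01, seed=42):
    """Return a deterministic sample of records in development workspaces.

    In any other (live) workspace, return the full dataset unchanged.
    """
    if workspace_name == "dev":
        random.seed(seed)  # fixed seed so dev runs are reproducible
        k = max(1, int(len(records) * sample_fraction))
        return random.sample(records, k)
    return records

# Development gets a small sample; live gets everything.
full = list(range(10_000))
print(len(load_records("dev", full)), len(load_records("prod", full)))  # 100 10000
```

A fixed seed keeps dev results reproducible across runs, which makes feature-logic validation easier to compare between iterations.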
Minimize Feature Values
Reduce materialization costs by:
- Writing only necessary feature values
- Setting appropriate TTLs
- Using filtered data sources
- Limiting time windows where possible
- Reducing aggregation interval where possible
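To see why these levers matter, note that materialization cost scales roughly with the number of feature rows written: one row per entity per interval over the backfill window. A back-of-envelope estimate in plain Python (illustrative numbers only, not a Tecton API):

```python
def estimated_rows_written(num_entities, days, interval_hours):
    """Rough count of materialized feature rows over a backfill window:
    one row per entity per aggregation interval."""
    intervals = days * 24 / interval_hours
    return int(num_entities * intervals)

# 1M entities: a 7-day daily-interval backfill vs. a 1-year hourly one
print(estimated_rows_written(1_000_000, days=7, interval_hours=24))   # 7,000,000
print(estimated_rows_written(1_000_000, days=365, interval_hours=1))  # 8,760,000,000
```

The three-orders-of-magnitude gap between the two configurations is why starting with a short feature_start_time window and extending only after validation pays off.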
Configure Materialization Strategically
- Start with offline=False until online serving is needed
- Set appropriate batch schedules
- Use incremental materialization where possible
- Consider feature freshness requirements
tip
We suggest first testing different feature freshness configurations in offline training to understand their impact on model performance.
Integration Testing in Live Workspaces
When testing in live workspaces:
- Set a recent feature_start_time
- Use sampled data sources
- Limit feature value cardinality
- Consider setting offline=False
- Reuse existing feature views
- Reduce materialization frequency
Preventing Unnecessary Costs
Protect Critical Objects
Use prevent_destroy to protect production features:
@batch_feature_view(..., prevent_destroy=True)  # Prevent accidental deletion
def critical_feature():
    return ...
Minimize Rematerialization
- Build a logical feature store model upfront
- Use CI/CD for controlled deployments
- Restrict workspace writes to authorized users
- Consider using --suppress-materialization
Monitor Development Costs
- Review materialization job status
- Track failed jobs
- Monitor resource utilization
- Set up cost alerts
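A cost alert can be as simple as comparing recent spend against a daily budget threshold. A minimal sketch in plain Python (made-up numbers; in practice you would pull spend figures from your cloud provider's billing API):

```python
def check_cost_alert(daily_spend, budget_per_day, tolerance=0.2):
    """Flag any day whose spend exceeds the budget by more than `tolerance`.

    Returns a list of (day_index, spend) pairs that breached the threshold.
    """
    limit = budget_per_day * (1 + tolerance)
    return [(i, s) for i, s in enumerate(daily_spend) if s > limit]

# Hypothetical daily materialization spend (USD) against a $100/day budget
alerts = check_cost_alert([95, 102, 180, 98, 260], budget_per_day=100)
print(alerts)  # [(2, 180), (4, 260)]
```

Day 2 and day 4 exceed the 20%-over-budget limit of $120, so they would trigger alerts; the small overrun on day 1 would not.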
Best Practices for Teams
Development Workflow
- Start in development workspace
- Use sampled data
- Validate feature logic
- Test in live workspace
- Deploy to production
Access Control
- Limit live workspace access
- Use CI/CD for deployments
- Implement review processes
- Document cost implications
Best Practices Checklist
- Use development workspaces for iteration
- Start with recent/sampled data
- Minimize materialized values
- Configure appropriate TTLs
- Protect production objects
- Monitor development costs
- Implement CI/CD controls
Next Steps
- Review Online Store Optimization
- Learn about Compute Optimization
- Set up Cost Monitoring and Alerting