Development Best Practices for Cost Optimization
This guide covers best practices for minimizing infrastructure costs during feature development, testing, and deployment.
Development vs. Live Workspaces
Tecton provides separate workspace types to support cost-effective development:
Development Workspaces
Development workspaces allow feature iteration without incurring materialization costs:
- No automatic materialization jobs
- Notebook-driven development
- Feature testing without infrastructure costs
- No real-time endpoints
# Create a development workspace (workspaces are development by default)
$ tecton workspace create my_new_feature_ws
# Apply your feature definitions to the new workspace
$ tecton apply
Live Workspaces
Use live workspaces only when you need:
- Training data with incremental backfills (requires materialization)
- REST API testing
- Materialization testing
- Stream source testing
- Production feature serving
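When one of these needs arises, a live workspace is created explicitly with the --live flag. A minimal sketch, following the CLI example above; the workspace name is illustrative:
# Create a live workspace (enables materialization and online serving)
$ tecton workspace create my_feature_ws --live
$ tecton apply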
Testing Features in Development
import tecton
from datetime import datetime, timedelta

# Use development workspace for testing
ws = tecton.get_workspace("my_new_feature_ws")
fv = ws.get_feature_view("user_transaction_metrics")

# Test feature logic without materialization
df = fv.get_features_in_range(
    start_time=datetime.now() - timedelta(days=7),
    end_time=datetime.now(),
    from_source=True,  # Compute directly from source
)
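To spot-check the output, the result can be converted to pandas. A minimal sketch, assuming the standard to_pandas() conversion on the returned DataFrame object:
# Inspect a few rows of the computed feature values
pandas_df = df.to_pandas()
print(pandas_df.head())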
Cost-Effective Feature Development
Start with Recent Data
Begin feature development with a limited time range:
@batch_feature_view(
    ...,
    # Start with recent data
    feature_start_time=datetime.now() - timedelta(days=7),
    # Extend after validation
    # feature_start_time=datetime(2022, 1, 1),
)
def my_feature_view():
    return ...
Use Sampled Data Sources
During development:
- Create smaller test datasets
- Use data sampling where appropriate
- Consider using get_current_workspace() for environment-specific sources
@pandas_batch_config
def sampled_data():
    # Use sampled data in development
    if get_current_workspace().name == "dev":
        return small_sample_df
    return full_data_df
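The decorated config can then be wired into a data source as usual. A minimal sketch, assuming the common pattern of passing it to BatchSource via batch_config; the source name is illustrative:
from tecton import BatchSource

# Register the environment-aware config as a batch source
transactions_source = BatchSource(
    name="transactions_sampled",
    batch_config=sampled_data,
)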
Minimize Feature Values
Reduce materialization costs by:
- Writing only necessary feature values
- Setting appropriate TTLs
- Using filtered data sources
- Limiting time windows where possible
- Using coarser aggregation intervals where possible
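As a rough illustration of these levers together, a feature view can limit its backfill window, set a TTL, and keep only the data it needs. A minimal sketch with placeholder names, using the same elided style as the examples above:
@batch_feature_view(
    ...,  # sources, entities, mode, etc.
    feature_start_time=datetime(2024, 1, 1),  # limit how far back values are materialized
    ttl=timedelta(days=30),  # expire feature values that are no longer needed
)
def user_recent_spend():
    # Filter the source and select only the columns the feature needs
    return ...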
Configure Materialization Strategically
- Start with online=False until online serving is needed
- Set appropriate batch schedules
- Use incremental materialization where possible
- Consider feature freshness requirements
We suggest first testing various feature freshness configurations in offline training to understand their impact on model performance before committing to a materialization schedule.
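For instance, the materialization settings on a feature view might start out like the following sketch; the values are illustrative, not recommendations:
@batch_feature_view(
    ...,
    online=False,  # enable only once online serving is needed
    offline=True,  # keep offline materialization for training data
    batch_schedule=timedelta(days=1),  # align with your freshness requirements
)
def my_feature_view():
    return ...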
Integration Testing in Live Workspaces
When testing in live workspaces:
- Set a recent feature_start_time
- Use sampled data sources
- Limit feature value cardinality
- Consider setting offline=False
- Reuse existing feature views
- Reduce materialization frequency
Preventing Unnecessary Costs
Protect Critical Objects
Use prevent_destroy to protect production features:
@batch_feature_view(..., prevent_destroy=True)  # Prevent accidental deletion
def critical_feature():
    return ...
Minimize Rematerialization
- Design your feature store's logical model upfront to avoid reworking features later
- Use CI/CD for controlled deployments
- Restrict workspace writes to authorized users
- Consider using --suppress-materialization
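In CI/CD, the deployment step can be gated on a reviewed plan. A minimal sketch, assuming the Tecton CLI is installed and authenticated in the pipeline; the workspace name is illustrative:
# Review the plan, including any rematerialization it would trigger
$ tecton workspace select prod_ws
$ tecton plan
# Apply only after the plan has been reviewed and approved
$ tecton apply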
Monitor Development Costs
- Review materialization job status
- Track failed jobs
- Monitor resource utilization
- Set up cost alerts
Best Practices for Teams
Development Workflow
- Start in development workspace
- Use sampled data
- Validate feature logic
- Test in live workspace
- Deploy to production
Access Control
- Limit live workspace access
- Use CI/CD for deployments
- Implement review processes
- Document cost implications
Best Practices Checklist
- Use development workspaces for iteration
- Start with recent/sampled data
- Minimize materialized values
- Configure appropriate TTLs
- Protect production objects
- Monitor development costs
- Implement CI/CD controls
Next Steps
- Review Online Store Optimization
- Learn about Compute Optimization
- Set up Cost Monitoring and Alerting