Scale Feature Servers
This feature is currently in Private Preview.
- Access to this API is limited based on account type
Tecton provides an API to programmatically scale Feature Servers to handle irregular traffic patterns. For example, a customer expecting double the traffic during the holiday season can provision twice the usual number of Feature Servers, which lets them serve features gracefully during peak traffic without server errors. Tecton provides both an option to manually set the number of nodes at any given time, and an option to auto scale based on concurrent request utilization between a minimum and maximum number of nodes.
When to scale up feature servers
Tecton recommends that customers consider scaling up Feature Servers during capacity planning, especially when they expect traffic to surpass the capacity currently provisioned by Tecton. Another indication that you should scale up is encountering 429 errors when making get-features requests. Tecton exposes current usage through the overall feature serving dashboard; if the utilization percentage is close to 100%, Tecton will respond with a 429 error code to prevent oversaturation.
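A quick way to spot saturation from the client side is to watch for 429 status codes on your own feature requests. The snippet below is only a sketch: it assumes the standard get-features serving endpoint, and <REQUEST_BODY> is a placeholder for a request body appropriate to your feature service.
# Sketch: report the HTTP status of a single feature request and flag 429s.
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '<REQUEST_BODY>' \
https://<CLUSTER_URL>/api/v1/feature-service/get-features)
if [ "$STATUS" = "429" ]; then
echo "Feature Server is saturated (received 429); consider scaling up."
fi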
When to scale down feature servers
Tecton recommends that customers consider downsizing their Feature Servers if, over the last 10 days, peak utilization remains below 50% of the allocated capacity and they don't foresee increased traffic to Tecton in the near future. Customers can review current utilization through the overall feature serving dashboard.
Using the Scaling API
The scaling API lets users retrieve the current Feature Server configuration and scale the pods up or down. In the following examples, make sure to update these placeholders based on your cluster configuration (a convenience sketch for exporting them as shell variables follows the list):
- <CLUSTER_URL> to match the cluster URL (e.g. mycluster.tecton.ai)
- <API_KEY> to refer to an API key with admin permissions on the cluster
- <NUMBER> to refer to the desired count of Feature Server pods
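The examples below show these placeholders inline. If you prefer, you can export them once as shell variables and substitute them into the calls (e.g. https://$CLUSTER_URL/...); the values shown here are illustrative placeholders only.
export CLUSTER_URL="mycluster.tecton.ai"   # your cluster URL
export API_KEY="<API_KEY>"                 # an API key with admin permissions
export NUMBER=5                            # desired Feature Server pod count (example value)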
Retrieve Current Feature Server Configuration
curl https://<CLUSTER_URL>/api/v1/metadata-service/get-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST
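If you have jq installed, the same call can be filtered down to just the pod counts; the field names come from the sample response shown later in this section.
# Extract only the pod counts from the configuration response (requires jq).
curl -s https://<CLUSTER_URL>/api/v1/metadata-service/get-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST | jq '{currentCount, availableCount, desiredCount}'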
Scale your Feature Server pods up or down
curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{ "count" : <NUMBER> }'
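As a concrete illustration of the same call, tied to the holiday example above: a cluster that normally runs 5 pods and expects roughly double the traffic could be provisioned at 10.
# Example only: provision 10 pods ahead of an expected 2x traffic peak.
curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{ "count" : 10 }'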
Sample Response for Both Queries
This response indicates that your cluster has created 5 total Feature Server pods. Of the 5 pods, 2 are available and ready for serving. It also shows the desired number of pods, which you can update via the set API.
{"currentCount": 5, "availableCount": 2, "desiredCount": 10, "autoScalingConfig": {"enabled": false}}
Enable Auto Scaling
Auto scaling adjusts the number of Feature Server nodes based on concurrent request utilization, within the bounds you set:
- minNodeCount: Minimum number of nodes that the cluster can scale down to
- maxNodeCount: Maximum number of nodes that the cluster can scale up to
curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{ "autoScalingConfig" : {"enabled": true, "minNodeCount": 2, "maxNodeCount": 10} }'
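To confirm the change, you can re-read the configuration and inspect the autoScalingConfig block; this sketch assumes jq is installed.
# Verify the auto scaling settings currently in effect (requires jq).
curl -s https://<CLUSTER_URL>/api/v1/metadata-service/get-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST | jq '.autoScalingConfig'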
Disable Auto Scaling
- We recommend checking the current pod count before disabling auto scaling, so that the desired number of pods is set to the current number of pods (see the sketch after the example below).
curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{"count": 4, "autoScalingConfig" : {"enabled": false} }'
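A minimal sketch of that recommendation, assuming jq is available: read the current pod count first, then pin the provisioned count to it while disabling auto scaling.
# Sketch: pin the provisioned count to the current pod count while disabling auto scaling.
CURRENT=$(curl -s https://<CLUSTER_URL>/api/v1/metadata-service/get-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST | jq '.currentCount')
curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d "{\"count\": ${CURRENT}, \"autoScalingConfig\": {\"enabled\": false}}"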
When to use Provisioned vs. Auto Scaling Feature Servers
Provisioned Scaling:
- Predictable Spikes: When you have regular, well-defined peaks in traffic (e.g., scheduled batch jobs, seasonal events) and can accurately estimate the required resources beforehand, provisioned scaling offers guaranteed capacity.
- Steady Traffic: If your application experiences relatively stable traffic with minimal fluctuations and you know the number of nodes needed to serve it, provisioned scaling provides consistent performance.
- Unpredictable Bursts: If you have unpredictable traffic bursts, provisioned scaling can help you avoid the overhead of scaling up and down frequently.
Auto Scaling:
- Standard Peaks and Troughs: When your traffic exhibits predictable cyclical patterns (e.g., daytime peaks, nighttime lows), auto scaling is cost-efficient since it scales down during off-peak hours.
- Gradual Traffic Changes: If your traffic patterns are unknown but fluctuate gradually, auto scaling adjusts to gradual increases or decreases in demand, maintaining performance without manual intervention.