Version: 0.7

Scale Feature Servers

Public Preview

This feature is currently in Public Preview.

This feature has the following limitations:

Access to this API is limited based on account type

If you have questions or want to share feedback, please file a support ticket.

Tecton provides an API to programmatically scale Feature Servers to handle irregular traffic patterns. For example, a customer expecting double the usual traffic during the holiday season can provision twice the usual number of Feature Servers, so features continue to be served gracefully during peak traffic without server errors. Tecton supports two options: manually setting the number of nodes at any given time, or auto scaling between a minimum and maximum number of nodes based on concurrent request utilization.
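The holiday example comes down to simple arithmetic: the pod count you provision grows with the traffic you expect. As a rough planning aid, the sketch below scales a current pod count by an expected traffic percentage and rounds up. It assumes serving capacity grows linearly with the number of pods, which you should confirm against the utilization shown in your feature serving dashboard. `pods_for_traffic` is a hypothetical helper, not part of Tecton's API.

```shell
# Estimate the pod count needed when traffic changes to a given percentage of today's level.
# Assumes serving capacity scales linearly with pod count (a planning assumption only).
# Usage: pods_for_traffic <current_pods> <expected_traffic_percent>
pods_for_traffic() {
  local pods=$1 percent=$2
  # Integer ceiling of pods * percent / 100, so partial pods round up.
  echo $(( (pods * percent + 99) / 100 ))
}
```

For the holiday example, `pods_for_traffic 5 200` prints `10`.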

When to scale up feature servers

Tecton recommends that customers consider scaling up feature servers during capacity planning, especially when traffic is expected to exceed the capacity Tecton currently provisions. Another indication that you need to scale up is receiving 429 errors on 'get feature' requests. Tecton exposes current utilization in the overall feature serving dashboard; when utilization approaches 100%, Tecton responds with a 429 error code to prevent oversaturation.
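A quick way to tell whether requests are being rejected for capacity reasons is to inspect the HTTP status code. The sketch below sends one request using curl's `--write-out` option (`-w '%{http_code}'`) and reports a 429. The endpoint URL and request body are arguments you supply from your existing get-features calls (this page does not specify them), it assumes `API_KEY` is set in the environment, and `check_for_429` is a hypothetical helper name.

```shell
# Send one get-features request and report whether it was rejected with HTTP 429.
# $1 = your get-features endpoint URL, $2 = the JSON request body you already use.
# Returns 0 when the response was 429, 1 otherwise.
check_for_429() {
  local url=$1 body=$2 status
  # Discard the response body and print only the status code (000 on connection failure).
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Tecton-key ${API_KEY}" \
    -X POST -d "$body" "$url")
  if [ "$status" = "429" ]; then
    echo "429: feature servers are at capacity; consider scaling up"
    return 0
  fi
  echo "HTTP ${status}"
  return 1
}
```

A single 429 can be transient; repeated 429s together with utilization near 100% on the dashboard are the stronger signal.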

When to scale down feature servers

Tecton recommends that customers consider downsizing their feature servers if peak utilization has stayed below 50% of the allocated capacity over the last 10 days and they don't expect traffic to Tecton to increase in the near future. You can review current utilization in the overall feature serving dashboard.
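The utilization half of this rule is a threshold check over daily peaks. This sketch assumes you have copied the last 10 daily peak utilization percentages from the dashboard yourself; it does not query any Tecton API, and the function name is illustrative. Whether you expect traffic to grow remains a judgment call the script cannot make.

```shell
# Succeeds (returns 0) when every daily peak utilization percentage passed in is below 50.
# Usage: below_scale_down_threshold <day1_peak> <day2_peak> ... (decimals are fine)
below_scale_down_threshold() {
  [ "$#" -gt 0 ] || return 1
  # Find the maximum value with awk so decimal percentages compare correctly.
  printf '%s\n' "$@" | awk -v threshold=50 '
    { if ($1 + 0 > max) max = $1 + 0 }
    END { exit !(max < threshold) }'
}
```

For example, `below_scale_down_threshold 31 42.5 48 12 40 38 45 44 30 47 && echo "consider scaling down"`.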

Using the Scaling API

The scaling API lets you retrieve the current Feature Server configuration and scale the pods up or down. In the following examples, replace these placeholders to match your cluster configuration:

<CLUSTER_URL> to match the cluster URL (e.g. mycluster.tecton.ai)
<API_KEY> to refer to an API key with admin permissions on the cluster
<NUMBER> to refer to the desired count of Feature Server pods

Retrieve Current Feature Server Configuration

curl https://<CLUSTER_URL>/api/v1/metadata-service/get-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST

Scale your Feature Server pods up or down

curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{ "count" : <NUMBER> }'
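The command above sends whatever you substitute for `<NUMBER>`, so a typo goes straight to the API. The following sketch wraps the call in a shell function that rejects anything other than a positive integer first. The function name `set_fs_count` and the `CLUSTER_URL` and `API_KEY` environment variables are illustrative choices, not part of Tecton's API; the endpoint and payload are the ones shown above.

```shell
# Scale Feature Server pods to an explicit count after a local sanity check.
# Assumes CLUSTER_URL and API_KEY are set in the environment (illustrative names).
set_fs_count() {
  local count=$1
  # Reject empty input, non-digits, zero, and leading zeros (which would not be valid JSON).
  case $count in
    ''|*[!0-9]*|0*)
      echo "count must be a positive integer, got '$count'" >&2
      return 1
      ;;
  esac
  curl -s "https://${CLUSTER_URL}/api/v1/metadata-service/set-feature-server-config" \
    -H "Authorization: Tecton-key ${API_KEY}" \
    -X POST -d "{ \"count\" : ${count} }"
}
```

For example, `export CLUSTER_URL=mycluster.tecton.ai API_KEY='<API_KEY>'` followed by `set_fs_count 8`.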

Sample Response for Both Queries

This response indicates that your cluster has created 5 Feature Server pods in total, of which 2 are available and ready for serving. It also shows the desired number of pods (10), which you can update with the set API, and whether auto scaling is enabled.

{"currentCount":5,"availableCount":2,"desiredCount":10,"autoScalingConfig":{"enabled":false}}
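To script against this response without extra tools, you can pull individual counts out with sed. This is a minimal sketch that assumes the flat integer fields shown in the sample response; a real JSON parser such as jq is more robust if it is available. `fs_field` and `fs_ready` are hypothetical helper names.

```shell
# Extract a flat integer field (e.g. currentCount) from a feature server config response.
# Handles optional whitespace around the colon; not a general JSON parser.
fs_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Succeeds when every desired pod is available and ready to serve.
fs_ready() {
  local available desired
  available=$(fs_field "$1" availableCount)
  desired=$(fs_field "$1" desiredCount)
  [ -n "$available" ] && [ -n "$desired" ] && [ "$available" -ge "$desired" ]
}
```

With the sample response above, `fs_field "$resp" availableCount` prints `2` and `fs_ready "$resp"` fails, because only 2 of the 10 desired pods are available.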

Enable Auto Scaling

minNodeCount: Minimum number of nodes that the cluster can scale down to
maxNodeCount: Maximum number of nodes that the cluster can scale up to

curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{ "autoScalingConfig" : {"enabled": true, "minNodeCount": 2, "maxNodeCount": 10} }'
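Before enabling auto scaling, it can help to check the node range locally so a swapped or out-of-range value never reaches the API. The sketch below assumes `CLUSTER_URL` and `API_KEY` environment variables and a hypothetical `MAX_NODES_CAP` variable that defaults to the 50-node deployment cap listed under the auto scaling limitations on this page; raise it if Tecton support has set a higher limit for you.

```shell
# Default deployment size cap; override if Tecton support raised your limit.
MAX_NODES_CAP=${MAX_NODES_CAP:-50}

# Returns 0 for a positive integer without leading zeros.
is_pos_int() {
  case $1 in ''|*[!0-9]*|0*) return 1 ;; esac
  return 0
}

# Enable auto scaling between $1 (minNodeCount) and $2 (maxNodeCount) after validation.
enable_autoscaling() {
  local min=$1 max=$2
  if ! is_pos_int "$min" || ! is_pos_int "$max"; then
    echo "minNodeCount and maxNodeCount must be positive integers" >&2
    return 1
  fi
  if [ "$min" -gt "$max" ]; then
    echo "minNodeCount ($min) must not exceed maxNodeCount ($max)" >&2
    return 1
  fi
  if [ "$max" -gt "$MAX_NODES_CAP" ]; then
    echo "maxNodeCount ($max) exceeds the deployment cap of $MAX_NODES_CAP" >&2
    return 1
  fi
  curl -s "https://${CLUSTER_URL}/api/v1/metadata-service/set-feature-server-config" \
    -H "Authorization: Tecton-key ${API_KEY}" \
    -X POST -d "{ \"autoScalingConfig\" : {\"enabled\": true, \"minNodeCount\": ${min}, \"maxNodeCount\": ${max}} }"
}
```

`enable_autoscaling 2 10` sends the same payload as the curl example above.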

Disable Auto Scaling

Before disabling auto scaling, check the current pod count and pass it as count in the request, so that the desired number of pods matches the current number of pods.

curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{"count": 4, "autoScalingConfig" : {"enabled": false} }'
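The recommendation above can be automated: read `currentCount` from get-feature-server-config, then send it as `count` in the same request that disables auto scaling. This sketch parses the response with sed and assumes `CLUSTER_URL` and `API_KEY` environment variables; the function names are illustrative.

```shell
# Extract a flat integer field from a feature server config response (sed-based sketch).
fs_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Disable auto scaling while pinning the pod count to the current number of pods.
# Assumes CLUSTER_URL and API_KEY are set in the environment (illustrative names).
disable_autoscaling_at_current() {
  local base="https://${CLUSTER_URL}/api/v1/metadata-service" resp count
  resp=$(curl -s "${base}/get-feature-server-config" \
    -H "Authorization: Tecton-key ${API_KEY}" -X POST) || return 1
  count=$(fs_field "$resp" currentCount)
  if [ -z "$count" ]; then
    echo "could not read currentCount from response: $resp" >&2
    return 1
  fi
  curl -s "${base}/set-feature-server-config" \
    -H "Authorization: Tecton-key ${API_KEY}" \
    -X POST -d "{\"count\": ${count}, \"autoScalingConfig\" : {\"enabled\": false} }"
}
```

The pod count can still change between the two requests while auto scaling is active, so the pinned value reflects the moment of the first call.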

When to use Provisioned v. Auto Scaling Feature Servers

Provisioned Scaling:

Predictable Spikes: When you have regular, well-defined peaks in traffic (e.g., scheduled batch jobs, seasonal events) and can accurately estimate the required resources beforehand, provisioned scaling offers guaranteed capacity.
Steady Traffic: If your application experiences relatively stable traffic with minimal fluctuations and you know how many nodes you need to serve it, provisioned scaling provides consistent performance.
Unpredictable Bursts: If you have unpredictable traffic bursts, provisioning enough capacity up front avoids relying on auto scaling, which adds nodes only gradually (see the limitations below), and avoids the overhead of scaling up and down frequently.

Auto Scaling:

Standard Peaks and Troughs: When your traffic exhibits predictable cyclical patterns (e.g., daytime peaks, nighttime lows), auto scaling is cost-efficient because it scales down during off-peak hours.
Gradual Traffic Changes: If your traffic patterns are unknown but fluctuate gradually, auto scaling adjusts to gradual increases or decreases in demand, maintaining performance without manual intervention.

Auto scaling has the following limitations

Gradual Scaling: In each 10-minute interval, the deployment can grow by at most 50% of its current size or 10 nodes, whichever is higher, and shrink by at most 10% of its current size or 10 nodes, whichever is lower.
Resource-Based Scaling: Scaling decisions are based on Feature Server utilization.
Deployment Size Cap: The maximum number of nodes per deployment is 50; contact Tecton support to set a higher limit.
Instance Availability: Auto scaling is subject to the availability of instances in the underlying infrastructure. If no instances are available, the deployment cannot scale out until more capacity becomes available; Tecton keeps trying to acquire nodes until the instance types are available.
DynamoDB Scaling Limitations: Auto scaling is subject to DynamoDB's own scaling limits. Tecton creates the DynamoDB tables backing feature views in On-Demand mode, and an On-Demand table can only double its capacity from the previous peak every 30 minutes. Requests beyond that limit are throttled, causing /get-features requests to return a 504 status. So while the number of feature servers can increase to accommodate more traffic, DynamoDB may not be able to handle the incremental traffic for a short period.
Only Scales Feature Servers: Auto scaling does not scale your storage backends (Redis/DynamoDB). Scale them manually based on your usage forecasts.
Does Not Scale Transformation Capacity: On Demand Feature Views that time out because of resource constraints while executing their transformations will not trigger auto scaling unless they also consume a proportional share of feature server concurrency.
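The gradual scaling and DynamoDB limits above determine how long capacity takes to catch up with a traffic increase. The following sketch estimates both. It assumes the 50% increase is rounded down to a whole node, and that an On-Demand table absorbs up to twice its previous peak immediately with each further doubling arriving 30 minutes later. Neither the rounding nor the exact timing is specified on this page, so treat the results as rough estimates; the function names are illustrative.

```shell
# Estimate how many 10-minute auto scaling intervals it takes to grow from
# $1 to $2 nodes. Assumes each interval adds the larger of 50% of the current
# size (rounded down) or 10 nodes; the rounding is an assumption.
scale_up_intervals() {
  local nodes=$1 target=$2 steps=0 add
  while [ "$nodes" -lt "$target" ]; do
    add=$((nodes / 2))
    if [ "$add" -lt 10 ]; then
      add=10
    fi
    nodes=$((nodes + add))
    steps=$((steps + 1))
  done
  echo "$steps"
}

# Estimate minutes until an On-Demand DynamoDB table can serve a peak of $2,
# given a previous peak of $1 (any consistent unit, e.g. reads per second).
# Assumes up to 2x the previous peak is absorbed immediately and each further
# doubling becomes available 30 minutes later.
dynamodb_ramp_minutes() {
  local previous=$1 target=$2 capacity minutes=0
  if [ "$previous" -lt 1 ]; then
    echo "previous peak must be at least 1" >&2
    return 1
  fi
  capacity=$((previous * 2))
  while [ "$capacity" -lt "$target" ]; do
    capacity=$((capacity * 2))
    minutes=$((minutes + 30))
  done
  echo "$minutes"
}
```

For example, `scale_up_intervals 4 40` prints `4`, so growing from 4 to 40 nodes takes roughly 40 minutes under these assumptions, and `dynamodb_ramp_minutes 1000 5000` prints `60`. If a spike must be served sooner than that, provisioning capacity ahead of time is the safer option.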

Scheduling Feature Server Scale Ups for Scheduled Traffic

Use cron jobs to scale up and down based on the expected traffic pattern. For example, if you expect a spike in traffic every day at 9 am, you can schedule a cron job to scale up the feature servers at 8:30 am and scale down at 10 am. Each crontab entry must be on a single line (crontab does not support backslash line continuation), and cron evaluates the times in the host's local time zone.

  # crontab -e
  30 8 * * * curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <HIGHER_NUMBER> }'
  0 10 * * * curl https://<CLUSTER_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <LOWER_NUMBER> }'

Errors

The maximum number of feature server pods allowed is X. Request count is Y
- There is a limit to the maximum number of pods you can provision. Please contact Tecton support if you want to raise this limit.
You cannot increase the number of pods by more than X in a single request. Requested increase of pods by Y
- There is a limit to the number of pods you can add in a single request; it defaults to 50 pods. Wait for availableCount to reach desiredCount before attempting to scale further.
serviceAccount <sa> not authorized to perform action scale_feature_server. See ../docs/setting-up-tecton/administration-setup/user-management-and-access-controls#summary-of-roles-and-permissions for details of what roles include the requested access.
- This indicates that your service account doesn't have access to the scaling API. In the Web UI, go to Accounts and Access and grant your service account the admin role.
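When the target count is more than 50 pods above the current one, you can stay within the per-request increase limit by scaling in steps and waiting for availableCount to reach desiredCount between requests, as the error message advises. This is a sketch under assumptions: the limit is taken to be the default of 50 (override `MAX_STEP` if yours differs), it is assumed to apply relative to the current desiredCount, and the helper names, polling settings, and `CLUSTER_URL`/`API_KEY` environment variables are illustrative. It only handles increases.

```shell
# Illustrative settings: MAX_STEP mirrors the default 50-pod per-request limit.
MAX_STEP=${MAX_STEP:-50}
POLL_SECONDS=${POLL_SECONDS:-30}
MAX_POLLS=${MAX_POLLS:-60}

# POST to a metadata-service endpoint; $1 is the endpoint, optional $2 is the JSON body.
fs_call() {
  local url="https://${CLUSTER_URL}/api/v1/metadata-service/$1"
  if [ -n "${2:-}" ]; then
    curl -s "$url" -H "Authorization: Tecton-key ${API_KEY}" -X POST -d "$2"
  else
    curl -s "$url" -H "Authorization: Tecton-key ${API_KEY}" -X POST
  fi
}

# Extract a flat integer field from a config response (sed-based sketch).
fs_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Print the intermediate pod counts from $1 up to $2, at most MAX_STEP apart.
plan_scale_steps() {
  local count=$1 target=$2
  while [ "$count" -lt "$target" ]; do
    count=$((count + MAX_STEP))
    if [ "$count" -gt "$target" ]; then
      count=$target
    fi
    echo "$count"
  done
}

# Poll until availableCount reaches desiredCount, or give up after MAX_POLLS tries.
wait_until_ready() {
  local i=0 resp available desired
  while [ "$i" -lt "$MAX_POLLS" ]; do
    resp=$(fs_call get-feature-server-config)
    available=$(fs_field "$resp" availableCount)
    desired=$(fs_field "$resp" desiredCount)
    if [ -n "$available" ] && [ -n "$desired" ] && [ "$available" -ge "$desired" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$POLL_SECONDS"
  done
  echo "timed out waiting for availableCount to reach desiredCount" >&2
  return 1
}

# Scale up to $1 pods in steps, waiting for each step to become available.
scale_in_steps() {
  local target=$1 current step
  wait_until_ready || return 1
  current=$(fs_field "$(fs_call get-feature-server-config)" desiredCount)
  if [ -z "$current" ]; then
    echo "could not read desiredCount" >&2
    return 1
  fi
  for step in $(plan_scale_steps "$current" "$target"); do
    echo "requesting ${step} pods"
    fs_call set-feature-server-config "{ \"count\" : ${step} }" >/dev/null || return 1
    wait_until_ready || return 1
  done
}
```

For example, going from 10 to 130 pods with `scale_in_steps 130` requests 60, 110, and then 130 pods.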
