
Scale Feature Server

Private Preview

This feature is currently in Private Preview.

This feature has the following limitations:
  • Access to this API is limited based on account type
If you would like to participate in the preview, please file a feature request.

Tecton provides an API to programmatically scale the Feature Server so you can right-size Tecton resources for your workload. For instance, a customer anticipating twice the usual traffic over the holidays can provision double the Feature Server nodes to keep feature serving smooth and free of server errors. Tecton supports both manual scaling (setting a specific number of nodes) and auto scaling based on concurrent requests, within a defined minimum and maximum number of nodes.

When to scale up the Feature Server​

Tecton recommends that customers consider scaling up the Feature Server during capacity planning, especially when they expect traffic to exceed the capacity currently provisioned by Tecton. Another sign that it is time to scale up is receiving 429 errors on get-features requests. Tecton exposes current usage through the overall feature serving dashboard. When utilization approaches 100%, Tecton responds with a 429 error code to prevent oversaturation.

When to scale down the Feature Server​

Tecton recommends that customers consider scaling down the Feature Server if peak utilization has stayed below 50% of the allocated capacity over the last 10 days and they don't foresee increased traffic to Tecton in the near future. Customers can review current utilization through the overall feature serving dashboard.
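
The scale-up and scale-down guidance can be condensed into a small decision helper. The following Python sketch is illustrative only: the function name, its parameters, and the 90% threshold standing in for "close to 100%" are assumptions, and the utilization figure would have to be read manually from the feature serving dashboard.

```python
def recommend_scaling(peak_utilization_pct, saw_429_errors=False,
                      expects_traffic_increase=False, near_capacity_pct=90.0):
    """Return "scale_up", "scale_down", or "hold".

    peak_utilization_pct is the peak utilization over the last 10 days,
    as a percentage of allocated capacity. near_capacity_pct is an
    assumed cutoff for "close to 100%"; it is not a documented value.
    """
    # 429 responses or utilization near capacity both call for more nodes.
    if saw_429_errors or peak_utilization_pct >= near_capacity_pct:
        return "scale_up"
    # Scale down only if the peak stayed under 50% and no growth is expected.
    if peak_utilization_pct < 50.0 and not expects_traffic_increase:
        return "scale_down"
    return "hold"


print(recommend_scaling(30.0))
print(recommend_scaling(40.0, saw_429_errors=True))
```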

Using the Scaling API​

The scaling API lets users retrieve the current Feature Server configuration and scale the nodes up or down. In the examples below, replace the following placeholders based on your Tecton Account configuration:

  • <ACCOUNT_URL> to match the Tecton Account URL (e.g. mycompany.tecton.ai)
  • <API_KEY> to refer to an API key with admin permissions on the Tecton Account
  • <NUMBER> to refer to the desired count of Feature Server nodes

Retrieve Current Feature Server Configuration​

curl https://<ACCOUNT_URL>/api/v1/metadata-service/get-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST

Scale your Feature Server nodes up or down​

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{ "count" : <NUMBER> }'
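
For callers who prefer a script over curl, the same two requests can be built with the Python standard library. This is a minimal sketch rather than an official client: the helper name build_config_request is hypothetical, the account URL is the example value from above, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_config_request(account_url, api_key, payload=None):
    """Build (but do not send) a POST request for the scaling API.

    With no payload the request targets get-feature-server-config;
    with a payload it targets set-feature-server-config.
    """
    endpoint = ("get-feature-server-config" if payload is None
                else "set-feature-server-config")
    url = f"https://{account_url}/api/v1/metadata-service/{endpoint}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )


# Example: ask for 8 nodes. Replace <API_KEY> before sending.
request = build_config_request("mycompany.tecton.ai", "<API_KEY>", {"count": 8})
print(request.get_method(), request.full_url)
# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```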

Sample Response for Both Queries​

This response indicates that your Tecton Account currently has 5 Feature Server nodes in total, of which 2 are available and ready for serving. desiredCount is the target number of nodes (10 in this example), which you can update via the set API. autoScalingConfig shows that auto scaling is disabled.

{"currentCount":5,"availableCount":2,"desiredCount":10,"autoScalingConfig":{"enabled":false}}
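
A script consuming this response might reduce it to the fields it cares about. The Python sketch below assumes the response has exactly the shape of the sample; the helper name summarize_config and the "settled" flag (availableCount equal to desiredCount) are illustrative, not part of the API.

```python
import json


def summarize_config(response_body):
    """Parse a Feature Server config response into a small summary dict."""
    config = json.loads(response_body)
    auto = config.get("autoScalingConfig", {})
    return {
        "current": config["currentCount"],
        "available": config["availableCount"],
        "desired": config["desiredCount"],
        "auto_scaling": bool(auto.get("enabled", False)),
        # Assumed definition: scaling is settled once every desired node is available.
        "settled": config["availableCount"] == config["desiredCount"],
    }


sample = ('{"currentCount":5,"availableCount":2,"desiredCount":10,'
          '"autoScalingConfig":{"enabled":false}}')
summary = summarize_config(sample)
print(summary)
```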

Enable Auto Scaling​

minNodeCount: Minimum number of nodes that the Feature Server can scale down to
maxNodeCount: Maximum number of nodes that the Feature Server can scale up to

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{ "autoScalingConfig" : {"enabled": true, "minNodeCount": 2, "maxNodeCount": 10} }'
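
When the request body is generated by a script, it can be worth validating the bounds before calling the API. The following Python sketch builds the same JSON body as the curl example; the helper name and the client-side checks are assumptions, and the cap of 50 mirrors the default deployment size limit listed under the auto scaling limitations.

```python
def build_auto_scaling_payload(min_nodes, max_nodes, deployment_cap=50):
    """Return the request body that enables auto scaling between two bounds.

    The checks below are client-side sanity checks added for illustration;
    the server may apply different or additional validation.
    """
    if min_nodes > max_nodes:
        raise ValueError("minNodeCount must not exceed maxNodeCount")
    if max_nodes > deployment_cap:
        raise ValueError("maxNodeCount exceeds the deployment size cap")
    return {
        "autoScalingConfig": {
            "enabled": True,
            "minNodeCount": min_nodes,
            "maxNodeCount": max_nodes,
        }
    }


payload = build_auto_scaling_payload(2, 10)
print(payload)
```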

Disable Auto Scaling​

Before disabling auto scaling, check the current node count and set count to that value in the same request, so that the desired number of nodes matches the number of nodes currently running. The example below assumes 4 nodes are currently running.

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
-H "Authorization: Tecton-key <API_KEY>" \
-X POST -d '{"count": 4, "autoScalingConfig" : {"enabled": false} }'
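
To avoid hard-coding the node count, the disable request can be derived from the configuration returned by the get call. This Python sketch assumes the response shape shown in the sample response; the helper name is hypothetical.

```python
def build_disable_auto_scaling_payload(current_config):
    """Return a request body that disables auto scaling without resizing.

    current_config is the parsed response of get-feature-server-config.
    Pinning count to currentCount keeps the Feature Server at its present size.
    """
    return {
        "count": current_config["currentCount"],
        "autoScalingConfig": {"enabled": False},
    }


current_config = {
    "currentCount": 4,
    "availableCount": 4,
    "desiredCount": 4,
    "autoScalingConfig": {"enabled": True},
}
disable_payload = build_disable_auto_scaling_payload(current_config)
print(disable_payload)
```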

When to use Provisioned vs. Auto Scaling Feature Server nodes

Provisioned Scaling:​

  • Predictable Spikes: When you have regular, well-defined peaks in traffic (e.g., scheduled batch jobs, seasonal events) and can accurately estimate the required resources beforehand, provisioned scaling offers guaranteed capacity.
  • Steady Traffic: If your application experiences relatively stable traffic with minimal fluctuations and you know the number of nodes you need to serve it, provisioned scaling provides consistent performance.
  • Unpredictable Bursts: If you have unpredictable traffic bursts, provisioned scaling can help you avoid the overhead of scaling up and down frequently.

Auto Scaling:​

  • Standard Peaks and Troughs: When your traffic exhibits predictable cyclical patterns (e.g., daytime peaks, nighttime lows), auto scaling is cost-efficient since it scales down during off-peak hours.
  • Gradual Traffic Changes: If your traffic patterns are unknown but fluctuate gradually, auto scaling adjusts to gradual increases or decreases in demand, maintaining performance without manual intervention.

Auto scaling has the following limitations​

  • Gradual Scaling: Increases are limited to +50% of the current deployment size or 10 nodes (whichever is higher) every 10 minutes. Decreases are limited to -10% of the current deployment size or 10 nodes (whichever is lower) every 10 minutes.
  • Resource-Based Scaling: Scaling decisions are based on Feature Server utilization.
  • Deployment Size Cap: The maximum nodes per deployment is set to 50; please contact Tecton support to set a higher limit.
  • Instance Availability: Scaling is subject to the availability of instances in the underlying infrastructure. If there are no instances available, the scaling operation will not be able to scale out until more capacity becomes available. Tecton will continue to try to acquire nodes until the instance types are available.
  • DynamoDB Scaling Limitations: Auto scaling is subject to DynamoDB auto scaling limitations. Feature views using DynamoDB are created by Tecton in On-Demand mode. In On-Demand mode, a table can only double its capacity from its previous peak every 30 minutes, and DynamoDB throttles requests beyond that limit, which causes /get-features requests to return a 504 status. So while the number of Feature Server nodes can increase to accommodate more traffic, DynamoDB may not be able to handle the additional traffic for a short period of time.
  • Only scales the Feature Server: Auto scaling does not scale your storage backends (Redis/DynamoDB). They need to be scaled manually based on your usage forecasts.
  • Does Not Scale Transformation Capacity: Realtime Feature Views that time out because of resource constraints while executing their transformations will not trigger auto scaling unless they also use up a proportional share of Feature Server concurrency.
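
The gradual scaling limits determine how quickly auto scaling can react at best. The Python sketch below estimates the minimum number of 10-minute windows needed to move between two node counts. The rounding (integer floor), the one-node minimum on decreases, and the function names are assumptions; actual scaling decisions also depend on utilization and instance availability.

```python
def max_increase(current):
    """Largest allowed increase per window: 50% of current or 10 nodes, whichever is higher."""
    return max(current * 50 // 100, 10)


def max_decrease(current):
    """Largest allowed decrease per window: 10% of current or 10 nodes, whichever is lower.

    A floor of 1 node is assumed so that small deployments can still shrink.
    """
    return max(1, min(current * 10 // 100, 10))


def windows_to_reach(start, target):
    """Count the 10-minute windows needed when every window uses its full allowance."""
    current, windows = start, 0
    while current != target:
        if current < target:
            current = min(current + max_increase(current), target)
        else:
            current = max(current - max_decrease(current), target)
        windows += 1
    return windows


print(windows_to_reach(4, 50))    # 4 -> 14 -> 24 -> 36 -> 50
print(windows_to_reach(100, 80))  # 100 -> 90 -> 81 -> 80
```

Because growth is bounded per window, a sudden spike can outrun auto scaling; provisioning ahead of a known spike avoids that lag.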

Scheduling Feature Server Scale Ups for Scheduled Traffic​

Use cron jobs to scale up and down based on the expected traffic pattern. For example, if you expect a spike in traffic every day at 9 am, you can schedule a cron job to scale up the Feature Server at 8:30 am and scale down at 10 am.

# crontab -e
# Each cron entry must be on a single line; cron does not support backslash line continuation.
30 8 * * * curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <HIGHER_NUMBER> }'
0 10 * * * curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <LOWER_NUMBER> }'
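
An alternative to one cron entry per transition is a single script, run periodically, that derives the target node count from the clock. The Python sketch below shows only that selection logic for the 8:30 am to 10 am example; the function name and the node counts are hypothetical, and the resulting count would still be sent with the set-feature-server-config request.

```python
from datetime import time


def scheduled_node_count(now, high_count, low_count,
                         scale_up_at=time(8, 30), scale_down_at=time(10, 0)):
    """Return high_count inside the scale-up window and low_count outside it.

    now is a datetime.time in the same time zone as the window boundaries.
    """
    if scale_up_at <= now < scale_down_at:
        return high_count
    return low_count


print(scheduled_node_count(time(9, 0), high_count=20, low_count=8))
print(scheduled_node_count(time(14, 0), high_count=20, low_count=8))
```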

Errors​

  • The maximum number of feature server nodes allowed is X. Request count is Y
    • There is a limit to the maximum number of nodes you can provision. Please contact Tecton support if you want to raise this limit.
  • You cannot increase the number of nodes by more than X in a single request. Requested increase of nodes by Y
    • There is a limit to the number of nodes you can add in a single request. The default limit is 50 nodes. Please wait for availableCount to reach desiredCount before attempting to scale further.
  • serviceAccount <sa> not authorized to perform action scale_feature_server. See ../docs/setting-up-tecton/administration-setup/user-management-and-access-controls#summary-of-roles-and-permissions for details of what roles include the requested access.
    • This indicates that your service account doesn't have access to the scaling API. Go to Accounts and Access in the Web UI and give your service account the admin role.
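
The per-request increase limit means that a large scale-up has to be split into several requests. The Python sketch below plans the successive count values to send, assuming the default limit of 50 nodes per request and a maximum node limit that has already been raised high enough for the target. The function name is hypothetical, and each step should only be sent once availableCount has reached the previous desiredCount.

```python
def plan_scale_up(current, target, max_step=50):
    """Return the successive count values for a scale-up from current to target.

    No single step adds more than max_step nodes.
    """
    if target < current:
        raise ValueError("plan_scale_up only handles increases")
    steps = []
    while current < target:
        current = min(current + max_step, target)
        steps.append(current)
    return steps


print(plan_scale_up(10, 130))  # [60, 110, 130]
```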
