Version: Beta 🚧

Scale Feature Server

Public Preview

This feature is currently in Public Preview.

This feature has the following limitations:

Access to this API is limited based on account type

If you have questions or want to share feedback, please file a support ticket.

Tecton provides an API to programmatically scale the Feature Server so you can right-size Tecton resources for your workload. For instance, a customer anticipating twice the usual traffic over the holidays can provision double the Feature Server nodes to keep feature serving smooth and free of server errors. Tecton supports both manual scaling (setting a specific number of nodes) and auto scaling based on concurrent requests within a defined minimum and maximum number of nodes.

When to scale up the Feature Server

Consider scaling up the Feature Server during capacity planning, especially when you expect traffic to exceed the capacity currently provisioned by Tecton. Another signal is receiving 429 errors on 'get feature' requests. Tecton exposes current usage through the overall feature serving dashboard. When utilization approaches 100%, Tecton responds with a 429 status code to prevent oversaturation.

When to scale down the Feature Server

Consider scaling down the Feature Server if peak utilization has stayed below 50% of the allocated capacity over the last 10 days and you don't foresee increased traffic to Tecton in the near future. You can review current utilization in the overall feature serving dashboard.
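The scale-up and scale-down guidance above can be sketched as a small decision helper. This is an illustration only, not a Tecton API: the function name, the input format (one peak utilization value per day, as a fraction), and the 0.9 "close to 100%" threshold are assumptions. Only the 10-day window and the 50% threshold come from the guidance above.

```python
from typing import Sequence


def scaling_hint(daily_peak_utilization: Sequence[float],
                 expect_more_traffic: bool = False,
                 near_saturation: float = 0.9) -> str:
    """Return 'scale_up', 'scale_down', or 'hold'.

    daily_peak_utilization: one peak value per day, most recent last,
    expressed as a fraction of allocated capacity (0.0 to 1.0).
    near_saturation: assumed threshold for "close to 100%".
    """
    recent = list(daily_peak_utilization)[-10:]
    if not recent:
        return "hold"
    # Utilization near 100% is where 429 responses start to appear.
    if recent[-1] >= near_saturation:
        return "scale_up"
    # Scale down only with a full 10 days of history, all below 50%,
    # and no expected traffic increase.
    if len(recent) == 10 and max(recent) < 0.5 and not expect_more_traffic:
        return "scale_down"
    return "hold"


print(scaling_hint([0.3] * 10))
print(scaling_hint([0.3] * 9 + [0.95]))
```

The utilization values themselves would come from the feature serving dashboard; how you export them is outside the scope of this sketch.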

Using the Scaling API

The scaling API lets you retrieve the current Feature Server configuration and scale the nodes up or down. In the following examples, replace these placeholders based on your Tecton Account configuration:

<ACCOUNT_URL> to match the Tecton Account URL (e.g. mycompany.tecton.ai)
<API_KEY> to refer to an API key with admin permissions on the Tecton Account
<NUMBER> to refer to the desired count of Feature Server nodes

Retrieve Current Feature Server Configuration

curl https://<ACCOUNT_URL>/api/v1/metadata-service/get-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST

Scale your Feature Server nodes up or down

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{ "count" : <NUMBER> }'
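For callers who prefer Python over curl, the two requests above might be built as follows. This is a minimal sketch using only the standard library; `build_request` is a hypothetical helper name, and the endpoint paths and `Authorization` header are taken from the curl examples. The requests are constructed but not sent here.

```python
import json
import urllib.request
from typing import Optional


def build_request(account_url: str, api_key: str, action: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to the scaling API."""
    url = f"https://{account_url}/api/v1/metadata-service/{action}"
    # Like `curl -d`, urllib sends the body with its default content type.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )


get_req = build_request("mycompany.tecton.ai", "<API_KEY>",
                        "get-feature-server-config")
set_req = build_request("mycompany.tecton.ai", "<API_KEY>",
                        "set-feature-server-config", {"count": 6})
print(get_req.get_method(), get_req.full_url)
print(set_req.get_method(), set_req.full_url, set_req.data)
```

To actually send a request, pass it to `urllib.request.urlopen` and decode the JSON response body.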

Sample Response for Both Queries

This response indicates that your Tecton Account currently has 5 Feature Server nodes in total, of which 2 are available and ready to serve. It also shows the desired number of nodes (10 here), which you can update via the set API.

{"currentCount":5,"availableCount":2,"desiredCount":10,"autoScalingConfig":{"enabled":false}}
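As an illustration of how the response fields relate, the sketch below parses a well-formed copy of the sample response and derives a few convenience values. The `summarize` helper and its output keys are hypothetical; only the field names (`currentCount`, `availableCount`, `desiredCount`, `autoScalingConfig`) come from the sample.

```python
import json

sample = ('{"currentCount":5,"availableCount":2,"desiredCount":10,'
          '"autoScalingConfig":{"enabled":false}}')


def summarize(config: dict) -> dict:
    """Derive convenience values from a get/set response."""
    return {
        # Nodes still to become available before the desired count is met.
        "pending_nodes": max(config["desiredCount"] - config["availableCount"], 0),
        "scaling_complete": config["availableCount"] >= config["desiredCount"],
        "auto_scaling": config.get("autoScalingConfig", {}).get("enabled", False),
    }


summary = summarize(json.loads(sample))
print(summary)
```

A check like `scaling_complete` is useful before issuing a further scale request, since availability lags behind the desired count.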

Enable Auto Scaling

minNodeCount: Minimum number of nodes that the Feature Server can scale down to
maxNodeCount: Maximum number of nodes that the Feature Server can scale up to

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{ "autoScalingConfig" : {"enabled": true, "minNodeCount": 2, "maxNodeCount": 10} }'
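When the bounds come from configuration or user input, it may help to validate them before sending the request. The sketch below builds the same request body as the curl example; the function name and the validation rules (at least one node, minimum not above maximum, a default cap of 50 nodes per deployment) are assumptions for illustration, not documented API behavior.

```python
def auto_scaling_payload(min_nodes: int, max_nodes: int,
                         deployment_cap: int = 50) -> dict:
    """Build the body for enabling auto scaling, with basic sanity checks."""
    if min_nodes < 1:
        # Assumption: a deployment keeps at least one node.
        raise ValueError("min_nodes must be at least 1")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    if max_nodes > deployment_cap:
        # Assumption: the cap matches your account's deployment size limit.
        raise ValueError(f"max_nodes exceeds the deployment cap of {deployment_cap}")
    return {
        "autoScalingConfig": {
            "enabled": True,
            "minNodeCount": min_nodes,
            "maxNodeCount": max_nodes,
        }
    }


payload = auto_scaling_payload(2, 10)
print(payload)
```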

Disable Auto Scaling

Before disabling auto scaling, we recommend checking the current node count and setting the desired number of nodes (count) to match it, so that the node count does not change when auto scaling is turned off.

curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config \
  -H "Authorization: Tecton-key <API_KEY>" \
  -X POST -d '{"count": 4, "autoScalingConfig" : {"enabled": false} }'
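One way to follow this recommendation is to derive the request body from the most recent get response rather than hard-coding the count. A minimal sketch, assuming the response shape shown in the sample response; `disable_auto_scaling_payload` is a hypothetical helper name.

```python
def disable_auto_scaling_payload(current_config: dict) -> dict:
    """Pin the node count to the current count while disabling auto scaling."""
    return {
        "count": current_config["currentCount"],
        "autoScalingConfig": {"enabled": False},
    }


# Example input in the shape of a get-feature-server-config response.
current = {
    "currentCount": 4,
    "availableCount": 4,
    "desiredCount": 4,
    "autoScalingConfig": {"enabled": True},
}
body = disable_auto_scaling_payload(current)
print(body)
```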

When to use Provisioned vs. Auto Scaling Feature Server nodes

Provisioned Scaling:

Predictable Spikes: When you have regular, well-defined peaks in traffic (e.g., scheduled batch jobs, seasonal events) and can accurately estimate the required resources beforehand, provisioned scaling offers guaranteed capacity.
Steady Traffic: If your application experiences relatively stable traffic with minimal fluctuations and you know the number of nodes you need to serve it, provisioned scaling provides consistent performance.
Unpredictable Bursts: If you have unpredictable traffic bursts, provisioned scaling can help you avoid the overhead of scaling up and down frequently.

Auto Scaling:

Standard Peaks and Troughs: When your traffic exhibits predictable cyclical patterns (e.g., daytime peaks, nighttime lows), auto scaling is cost-efficient because it scales down during off-peak hours.
Gradual Traffic Changes: If your traffic patterns are unknown but fluctuate gradually, auto scaling adjusts to gradual increases or decreases in demand, maintaining performance without manual intervention.

Auto scaling has the following limitations

Gradual Scaling: Increases are limited to +50% of the current deployment size or 10 nodes (whichever is higher) every 10 minutes. Decreases are limited to -10% of the current deployment size or 10 nodes (whichever is lower) every 10 minutes.
Resource-Based Scaling: Scaling decisions are based on Feature Server utilization.
Deployment Size Cap: The maximum nodes per deployment is set to 50; please contact Tecton support to set a higher limit.
Instance Availability: Scaling is subject to the availability of instances in the underlying infrastructure. If there are no instances available, the scaling operation will not be able to scale out until more capacity becomes available. Tecton will continue to try to acquire nodes until the instance types are available.
Scaling Limitations: Auto scaling is subject to DynamoDB auto scaling limitations. Feature views using DynamoDB are created in On-Demand mode by Tecton. In On-Demand mode, a DynamoDB table can only double its capacity from the previous peak every 30 minutes and will start to throttle requests beyond that limit. This causes /get-features requests to return a 504 status. So while the number of Feature Server nodes can increase to accommodate a higher fraction of traffic, DynamoDB may not be able to handle the incremental traffic for a short period of time.
Only scales the Feature Server: Auto scaling does not scale your storage backends (Redis/DynamoDB). They need to be scaled manually based on your usage forecasts.
Does Not Scale Transformation Capacity: Realtime Feature Views that time out while executing transformations because of resource constraints will not trigger auto scaling unless they also use up Feature Server concurrency proportionally.
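To get a feel for the gradual scaling limits, the sketch below computes the largest change allowed in one 10-minute step and the minimum number of steps needed to move between two deployment sizes. The function names are hypothetical, and rounding fractional node counts up is an assumption, since the rounding rule is not stated above. Actual scaling is driven by utilization, so these figures are best read as lower bounds.

```python
def max_increase_per_step(current_nodes: int) -> int:
    """+50% of the current size or 10 nodes, whichever is higher."""
    half = (current_nodes + 1) // 2  # 50%, rounded up (assumption)
    return max(half, 10)


def max_decrease_per_step(current_nodes: int) -> int:
    """-10% of the current size or 10 nodes, whichever is lower."""
    tenth = (current_nodes + 9) // 10  # 10%, rounded up (assumption)
    return min(tenth, 10)


def min_steps(start: int, target: int) -> int:
    """Minimum number of 10-minute steps to get from start to target."""
    steps, current = 0, start
    while current < target:
        current = min(current + max_increase_per_step(current), target)
        steps += 1
    while current > target:
        current = max(current - max_decrease_per_step(current), target)
        steps += 1
    return steps


print(min_steps(4, 50))   # scale up from 4 to 50 nodes
print(min_steps(50, 40))  # scale down from 50 to 40 nodes
```

Under these assumptions, growing from 4 to 50 nodes takes at least 4 steps (4, 14, 24, 36, 50), while shrinking is considerably slower per node, which is worth keeping in mind when choosing minNodeCount and maxNodeCount.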

Scheduling Feature Server Scale Ups for Scheduled Traffic

Use cron jobs to scale up and down based on the expected traffic pattern. For example, if you expect a spike in traffic every day at 9 am, you can schedule a cron job to scale up the Feature Server at 8:30 am and scale down at 10 am.

  # crontab -e
  # Each cron entry must be on a single line; crontab does not support backslash line continuations.
  30 8 * * * curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <HIGHER_NUMBER> }'
  0 10 * * * curl https://<ACCOUNT_URL>/api/v1/metadata-service/set-feature-server-config -H "Authorization: Tecton-key <API_KEY>" -X POST -d '{ "count" : <LOWER_NUMBER> }'

Errors

The maximum number of feature server nodes allowed is X. Request count is Y
- There is a limit to the maximum number of nodes you can provision. Please contact Tecton support if you want to raise this limit.
You cannot increase the number of nodes by more than X in a single request. Requested increase of nodes by Y
- There is a limit to the number of nodes you can add in a single request. This limit defaults to 50 nodes. Please wait for the availableCount to reach the desiredCount before attempting to scale further.
serviceAccount <sa> not authorized to perform action scale_feature_server. See ../docs/setting-up-tecton/administration-setup/user-management-and-access-controls#summary-of-roles-and-permissions for details of what roles include the requested access.
- This indicates that your service account doesn't have access to the scaling API. Go to Accounts and Access in the web UI and give your service account the admin role.
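For large scale-ups, the per-request increase limit means the target may have to be reached in several requests. The sketch below plans the successive `count` values to send, assuming the default limit of 50 nodes per request; `plan_scale_up` is a hypothetical helper, and between requests you would wait until availableCount reaches desiredCount, as described above.

```python
from typing import List


def plan_scale_up(current: int, target: int, max_increase: int = 50) -> List[int]:
    """Return the successive `count` values needed to reach target."""
    if max_increase < 1:
        raise ValueError("max_increase must be at least 1")
    counts = []
    while current < target:
        # Never request more than max_increase additional nodes at once.
        current = min(current + max_increase, target)
        counts.append(current)
    return counts


print(plan_scale_up(10, 130))
```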
