Latency Budgets
This feature is currently in Public Preview.
Introduction
Many production AI applications require strict latency guarantees for online feature serving. For time-sensitive applications, it is often better for business outcomes to receive a partial set of features from Tecton within a latency budget than to wait for every feature to finish computing. The Tecton HTTP API provides an option for specifying a latency budget.
To set a latency budget, specify it in the request options parameter. The API then enforces that time limit and returns whatever data could be processed within it.
Usage
You can optionally set latencyBudgetMs in the requestOptions parameter to the
number of milliseconds you want to wait before a result is returned. This
overrides the default timeout behavior, in which an error is returned if a
request takes longer than 2 s.
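As a rough sketch of how a client might attach this option, the Python snippet below builds a request body with latencyBudgetMs nested under requestOptions. The helper name build_get_features_body is hypothetical. The field names and the string-encoded budget mirror the sample request in this document; whether a numeric value is also accepted is not confirmed here.

```python
import json


def build_get_features_body(workspace, feature_service, join_keys, latency_budget_ms):
    """Build a get-features request body with a latency budget (hypothetical helper)."""
    if latency_budget_ms <= 0:
        raise ValueError("latency_budget_ms must be positive")
    return {
        "params": {
            "workspaceName": workspace,
            "featureServiceName": feature_service,
            "joinKeyMap": dict(join_keys),
            # Sent as a string to mirror the sample request (an assumption,
            # not a confirmed requirement of the API).
            "requestOptions": {"latencyBudgetMs": str(latency_budget_ms)},
            # Names and serving status make partial results easier to interpret.
            "metadataOptions": {"includeNames": True, "includeServingStatus": True},
        }
    }


body = build_get_features_body(
    "prod", "fraud_detection_feature_service", {"user_id": "C1000262126"}, 200
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and posted with any HTTP client, using the same Authorization header as the curl sample.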
If the specified latency budget is reached during a request, the feature server will attempt to return the available results as quickly as possible.
- Results that could not be calculated within the time limit will return a null value with a status of TIME_OUT.
- The feature server will then construct a response containing the successfully completed results and return a subset of the requested values.
- The total server processing time may exceed the specified latency budget by a few milliseconds due to the additional time required to build the response.
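A client usually needs to tell which values actually arrived and which timed out. The sketch below pairs each value with its metadata entry and groups feature names by status. It assumes the request set includeNames and includeServingStatus so that every metadata entry carries a name and a status; split_by_status is a hypothetical helper, and the example payload is a trimmed, illustrative response rather than real output.

```python
def split_by_status(response):
    """Group features into present values, timed-out names, and any other statuses."""
    values = response["result"]["features"]
    metadata = response["metadata"]["features"]
    present, timed_out, other = {}, [], {}
    # Values and metadata entries are assumed to be listed in the same order.
    for value, meta in zip(values, metadata):
        name, status = meta["name"], meta.get("status")
        if status == "PRESENT":
            present[name] = value
        elif status == "TIME_OUT":
            timed_out.append(name)  # value is null for these features
        else:
            other[name] = status  # keep unrecognized statuses visible to the caller
    return present, timed_out, other


# Trimmed, illustrative response shaped like the sample in this document.
example = {
    "result": {"features": ["0", None]},
    "metadata": {
        "features": [
            {"name": "transaction_amount_is_high.transaction_amount_is_high", "status": "PRESENT"},
            {"name": "transaction_amount_last_1000d.sum", "status": "TIME_OUT"},
        ]
    },
}

present, timed_out, other = split_by_status(example)
print(present)
print(timed_out)
```

Keeping timed-out names separate lets the caller decide per feature whether to fall back to a default or skip the prediction.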
Latency budget values should generally be greater than 100 ms. Lower values may produce inconsistent results, and responses may not reliably arrive within the specified budget.
Sample Request
$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspaceName": "prod",
    "featureServiceName": "fraud_detection_feature_service",
    "joinKeyMap": {
      "user_id": "C1000262126"
    },
    "requestOptions": {
      "latencyBudgetMs": "200"
    },
    "metadataOptions": {
      "includeNames": true,
      "includeEffectiveTimes": true,
      "includeDataTypes": true,
      "includeSloInfo": true,
      "includeServingStatus": true
    }
  }
}'
Sample Response
{
  "result": {
    "features": ["0", "1", 216409, null]
  },
  "metadata": {
    "features": [
      {
        "name": "transaction_amount_is_high.transaction_amount_is_high",
        "dataType": {
          "type": "int64"
        },
        "status": "PRESENT"
      },
      {
        "name": "transaction_amount_is_higher_than_average.transaction_amount_is_higher_than_average",
        "dataType": {
          "type": "int64"
        },
        "status": "PRESENT"
      },
      {
        "name": "last_transaction_amount_sql.amount",
        "effectiveTime": "2021-08-21T01:23:58.996Z",
        "dataType": {
          "type": "float64"
        },
        "status": "PRESENT"
      },
      {
        "name": "transaction_amount_last_1000d.sum",
        "dataType": {
          "type": "int64"
        },
        "status": "TIME_OUT"
      }
    ],
    "sloInfo": {
      "sloEligible": true,
      "sloServerTimeSeconds": 0.201323,
      "dynamodbResponseSizeBytes": 204,
      "serverTimeSeconds": 0.049082851
    }
  }
}
Limitations
- The specified latency budget should generally be between 100 ms and 2000 ms.
- Realtime Feature Views cannot be partially returned at this time.
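A client can check a budget against the recommended range before sending a request. The sketch below is one way to do that: check_latency_budget and the two constants are hypothetical names, and the bounds are taken from the general guidance above rather than from hard limits enforced by the API.

```python
MIN_BUDGET_MS = 100   # below this, results may be inconsistent (per the guidance above)
MAX_BUDGET_MS = 2000  # matches the default 2 s request timeout


def check_latency_budget(budget_ms):
    """Return warning strings for a budget outside the generally recommended range."""
    warnings = []
    if budget_ms < MIN_BUDGET_MS:
        warnings.append(
            f"{budget_ms} ms is below {MIN_BUDGET_MS} ms; results may be inconsistent"
        )
    if budget_ms > MAX_BUDGET_MS:
        warnings.append(
            f"{budget_ms} ms is above {MAX_BUDGET_MS} ms, outside the recommended range"
        )
    return warnings


for budget in (50, 200, 5000):
    print(budget, check_latency_budget(budget))
```

Returning warnings instead of raising keeps the decision with the caller, since the range is described as general guidance.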