
Metrics API

Metrics for monitoring the performance of the Tecton Feature Platform are available through the Tecton Metrics API. The Tecton Metrics API follows the OpenMetrics standard for metrics collection, which is supported by common Application Performance Monitoring systems such as DataDog, SignalFX and New Relic.

Available Metrics

The following metrics are currently available through the Metrics API. Leverage these metrics to build monitoring dashboards and alerts with your Application Performance Monitoring system.

| Name | Type | Unit | Description | Labels | Release stage |
| --- | --- | --- | --- | --- | --- |
| feature_service_requests_total_rate | GAUGE | requests per second | Total count of feature service requests over five minutes | aws_region | GA |
| feature_service_requests_rate | GAUGE | requests per second | Count of GetFeatures and GetFeaturesBatch requests by feature service over five minutes. GetFeaturesBatch calls are translated into their constituent GetFeatures calls when calculating this value, so batch size directly corresponds to a proportional increase in the rate | aws_region, feature_service_id, feature_service_name | GA |
| feature_service_total_latency | SUMMARY | second | Feature serving GetFeatures and GetFeaturesBatch latency | (none) | GA |
| feature_service_latency | SUMMARY | second | Feature serving GetFeatures and GetFeaturesBatch latency by feature service | feature_service_id, feature_service_name | GA |
| feature_server_errors_rate | GAUGE | percent | Feature serving error rate by HTTP status | status | GA |
| feature_server_utilization | GAUGE | percent | Maximum utilization percentage among all feature server instances | aws_region | GA |
| feature_service_requests_total_rate_per_server_group | GAUGE | requests per second | Total count of feature service requests per server group over five minutes | aws_region, server_group_name | Preview |
| feature_service_requests_rate_per_server_group | GAUGE | requests per second | Count of GetFeatures and GetFeaturesBatch requests by feature service per server group over five minutes. GetFeaturesBatch calls are translated into their constituent GetFeatures calls when calculating this value, so batch size directly corresponds to a proportional increase in the rate | aws_region, feature_service_id, feature_service_name, server_group_name | Preview |
| feature_server_errors_rate_grpc | GAUGE | percent | Feature serving error rate by gRPC code | status | Preview |
| feature_server_errors_rate_grpc_per_server_group | GAUGE | percent | Feature serving error rate by gRPC code per server group | status, server_group_name | Preview |
| feature_server_average_utilization | GAUGE | percent | Average utilization percentage among all feature server instances | aws_region | Preview |
| feature_server_minimum_utilization | GAUGE | percent | Minimum utilization percentage among all feature server instances | aws_region | Preview |
| feature_server_average_utilization_per_server_group | GAUGE | percent | Average utilization percentage among all feature server instances per server group | aws_region, server_group_name | Preview |
| feature_server_max_utilization_per_server_group | GAUGE | percent | Maximum utilization percentage among all feature server instances per server group | aws_region, server_group_name | Preview |
| feature_server_minimum_utilization_per_server_group | GAUGE | percent | Minimum utilization percentage among all feature server instances per server group | aws_region, server_group_name | Preview |
| spark_stream_max_processed_event_age | GAUGE | second | The maximum age of events processed by Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| spark_stream_average_processed_event_age | GAUGE | second | The average age of events processed by Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| spark_stream_input_rate | GAUGE | requests per second | The stream request input rate for Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| spark_stream_processing_rate | GAUGE | requests per second | The stream request processing rate for Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| spark_stream_served_feature_age | GAUGE | second | The served feature age for Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| spark_stream_online_store_write_rate | GAUGE | rows per second | The online store write request rate for Spark streaming | workspace, feature_view_name, feature_view_id | Preview |
| stream_ingestapi_request_rate | GAUGE | requests per second | The request rate for the Stream Ingest API | aws_region | Preview |
| stream_ingestapi_request_processing_latency | SUMMARY | second | The request processing latency for the Stream Ingest API | aws_region | Preview |
| stream_ingestapi_request_processing_error_rate | GAUGE | requests per second | The request handling error rate for the Stream Ingest API | aws_region, error_code (4xx or 5xx) | Preview |
| stream_ingestapi_online_store_write_rate | GAUGE | rows per second | The row write rate to the online store for the Stream Ingest API | workspace, feature_view_name, feature_view_id | Preview |
| stream_ingestapi_offline_store_write_rate | GAUGE | rows per second | The row write rate to the offline store for the Stream Ingest API | workspace, feature_view_name, feature_view_id | Preview |
| feature_server_scaling_requests | GAUGE | requests per response code | The count of feature server scaling request responses per gRPC response code | aws_region, code (OK or PERMISSION_DENIED) | Preview |
| feature_server_autoscaler_desired_replica_count | GAUGE | replica count | The desired feature server replica count set by the autoscaling policy (empty if autoscaling is disabled) | aws_region | Preview |
| feature_server_autoscaler_current_replica_count | GAUGE | replica count | The current feature server replica count (empty if autoscaling is disabled) | aws_region | Preview |
| feature_server_autoscaler_max_replica_count | GAUGE | replica count | The maximum feature server replica count (empty if autoscaling is disabled) | aws_region | Preview |

More details on the metric types and output formats can be found in the OpenMetrics reference.

Metric release stages

The release stage represents the expected stability of the metric. Tecton recommends relying only on metrics marked as GA for production dashboards. The release stage is noted in the table above, as well as in the HELP text of the OpenMetrics output.

| Release stage | Description |
| --- | --- |
| Generally Available (GA) | Ready for production use. The schema and definition of the metric will not change. |
| Preview | Initial release intended for collecting feedback. The schema and definition of the metric may change before moving to GA. |
| Deprecated | Usage of the metric is discouraged. The metric will be maintained until the specified end-of-support date. |

Metrics API endpoint

The Metrics API endpoint is https://<your-instance>.tecton.ai/api/v1/observability/metrics.

To query metrics using curl, run:

curl -H "Authorization: Tecton-key $TECTON_API_KEY" https://$INSTANCE.tecton.ai/api/v1/observability/metrics

Example output

feature_service_requests_rate{aws_region="us-west-2",feature_service_id="072b546997cb6e586ed460ff0a3743ee",feature_service_name="fvfs_1"} 0.003703703703703704 1692222226141
feature_service_requests_rate{aws_region="us-west-2",feature_service_id="005c5a6f3517e1e2a4ce411372a15d84",feature_service_name="fvfs_2"} 0 1692222226141
feature_service_requests_rate{aws_region="us-west-2",feature_service_id="00c922b96f55a948e1bbfa08fdb3a699",feature_service_name="fvfs_3"} 0 1692222226141

This is a sample output for a gauge metric named feature_service_requests_rate with labels aws_region, feature_service_id, and feature_service_name. The number following the labels is the metric value, and the final number is the timestamp in milliseconds. For output format details, see the OpenMetrics reference.
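To spot-check a single metric from the command line, you can filter the exposition output with standard shell tools. The sketch below reuses the TECTON_API_KEY and INSTANCE environment variables from the curl example above; the metric name is only an illustration, and the prefix pattern also matches related metric families that share it.

# Fetch the full exposition and keep only the lines for one metric family
curl -s -H "Authorization: Tecton-key $TECTON_API_KEY" \
  https://$INSTANCE.tecton.ai/api/v1/observability/metrics \
  | grep '^feature_service_requests_rate'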

Example Integrations

The following sections show how to configure common Application Performance Monitoring systems to scrape the Tecton Metrics API.

Tecton recommends scraping the Metrics API at 30-second intervals.

DataDog

A DataDog agent can be easily configured to ingest metrics from Tecton’s Metrics API.

If you already use DataDog, you likely have a DataDog agent up and running, which makes the integration even simpler.

Integration takes three steps:

  1. Create a Tecton Service account via Tecton CLI

    $ tecton service-account create \
    --name "metrics-consumer" \
    --description "Consumer of operational metrics"

    Save this API Key - you will not be able to get it again.
    API Key: your-api-key
    Service Account ID: your-service-account-id

    Save the API key returned by the command. You will need it when configuring the agent.

  2. Install DataDog agent

    note

    Skip this step if you already have a DataDog agent (≥ 7.32.0) running on a machine that has access to the Tecton API.

    The installation procedure depends on your platform; follow the official DataDog documentation for that platform.

  3. Configure the agent to ingest Tecton metrics

Edit the OpenMetrics check configuration that ships with the Datadog Agent: in the agent's configuration directory, modify openmetrics.d/conf.yaml and add the following:

instances:
  - openmetrics_endpoint: https://<your-tecton-url>.tecton.ai/api/v1/observability/metrics
    namespace: tecton # all exported metrics will have this namespace
    metrics:
      - .+ # store all metrics
    min_collection_interval: 30
    headers:
      Authorization: Tecton-key <REPLACE THIS WITH TOKEN>
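After saving the file, restart the agent so it loads the new OpenMetrics instance. The commands below are a sketch for a Linux host managed by systemd; the exact restart procedure depends on your platform, and the check subcommand is only a quick way to confirm the agent can reach the Tecton endpoint.

# Restart the agent so it picks up the updated openmetrics.d/conf.yaml
sudo systemctl restart datadog-agent

# Run the openmetrics check once and print the collected metrics
sudo -u dd-agent datadog-agent check openmetrics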

SignalFX

  1. Create a Tecton Service account via Tecton CLI

    $ tecton service-account create \
    --name "metrics-consumer" \
    --description "Consumer of operational metrics"

    Save this API Key - you will not be able to get it again.
    API Key: your-api-key
    Service Account ID: your-service-account-id

    Save the API key returned by the command. You will need it when configuring the agent.

  2. Deploy the Splunk OpenTelemetry Collector: https://docs.splunk.com/Observability/gdi/opentelemetry/opentelemetry.html

  3. Configure the collector to ingest Tecton metrics

Example configuration:

receivers:
  lightprometheus:
    endpoint: https://<your-tecton-url>.tecton.ai/api/v1/observability/metrics
    headers:
      Authorization: Tecton-key <TOKEN>
    collection_interval: 30s
    resource_attributes:
      service.name:
        enabled: false
      service.instance.id:
        enabled: false

exporters:
  signalfx:
    access_token: <SIGNALFX TOKEN>
    realm: <SIGNALFX REALM>

service:
  pipelines:
    metrics:
      receivers: [lightprometheus]
      exporters: [signalfx]

OpenTelemetry (OTEL) Collector

If your Application Performance Monitoring system doesn’t support OpenMetrics out of the box, you can install the OTEL collector and configure it to export to a cloud-based monitoring system or any self-hosted alternative. The OTEL collector ships with enough out-of-the-box exporters to support almost any monitoring setup.

The following example shows how to configure the OTEL collector for use with the Tecton Metrics API (the config can vary depending on the version of the collector; see the config docs):

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          scheme: https
          metrics_path: /api/v1/observability/metrics
          authorization:
            type: Tecton-key
            credentials: <API key>
          static_configs:
            - targets: [<cluster>.tecton.ai]

exporters:
  datadog:
    api:
      site: <DD_SITE>
      key: <DD_API_KEY>

processors:
  batch:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 10s

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [datadog]
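To try this configuration locally, you can run the collector under Docker. This is a sketch that assumes the config above is saved as otel-config.yaml and uses the contrib distribution of the collector, which bundles the datadog exporter; the image tag and in-container config path are assumptions, so check the collector documentation for your version.

# Run the OpenTelemetry Collector (contrib distribution) with the config above
docker run --rm \
  -v "$(pwd)/otel-config.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest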
