Fetching Online Features

Overview

A model prediction service needs feature vectors delivered as input with low latency. Tecton makes pre-computed, or materialized, features available for low-latency retrieval. You can fetch online features from a Feature Package or a Feature Service.

You might also need to calculate features from raw data at request-time. For online features that can only be calculated at request time, use an Online Feature Package.

You can fetch a combination of feature vectors and raw data. An example of such a combined call is shown below in Combining a Feature Vector and Raw Data.

Finally, you can use the Python SDK or the HTTP API to perform any of these calls. Examples for both are given below. Tecton recommends using the HTTP API to fetch feature values in production, as the Python SDK can add latency to online feature requests.

Fetching Online Features with the Python SDK

  1. Fetch a Feature Service (FeatureService object) into your Python environment and use its get_feature_vector method to fetch a feature vector.
  2. To request a feature vector from the Tecton Feature Service, supply a list of join keys. The join keys identify the rows for which the service retrieves features. In the example below, the join keys are ad_id and user_uuid.
import tecton

my_fs = tecton.get_feature_service('ctr_prediction_service')

keys = {
        "ad_id": "1000",
        "user_uuid": "b69f8dc3-6611-4d4a-bbce-032fd1d6eca9"
}

response = my_fs.get_feature_vector(keys)
print(response.to_dict())

get_feature_vector returns an object that can be converted to a Pandas DataFrame, NumPy array, or other programmatic data structure. In the example above, it is converted to a Python dictionary. For more information about how to use get_feature_vector, see Get Feature Vector.

Getting Features from an Online Feature Package

The Tecton SDK and HTTP API both retrieve pre-computed feature data for prediction. Depending on the nature of your model, you might also need to transform raw data from the request payload at prediction time.

An Online Feature Package computes feature values at request-time. Instead of passing a list of keys that Tecton uses to retrieve feature values, you pass the data that Tecton uses to compute feature values.

import tecton

my_online_fp = tecton.get_feature_package('ad_is_displayed_as_banner')

request_data = {
        'ad_display_placement': 'Banner',
}

response = my_online_fp.get_feature_vector(request_data)
print(response.to_dict())

Combining a Feature Vector and Raw Data

To fetch online features from a Feature Service containing both Online Feature Packages and materialized Feature Packages, provide both join_keys and request_data:

import tecton

my_fs = tecton.get_feature_service('ctr_online_prediction_service')

keys = {
        "ad_id": "1000",
        "user_uuid": "b69f8dc3-6611-4d4a-bbce-032fd1d6eca9"
}

request_data = {
        "ad_display_placement": "Banner",
}

response = my_fs.get_feature_vector(keys, request_data)
print(response.to_dict())

Fetching Online Features with the HTTP API

The Tecton Python SDK demonstrated above is convenient, but it adds latency to feature vector requests. To maximize performance in production, fetch feature vectors using the Tecton HTTP API instead.

Authenticating HTTP requests requires a Tecton API key. In this example, you first generate an API key, then call the Tecton HTTP API.

Generating an API Key

Generate an API key from your CLI by running the following command:

tecton create-api-key --description "A sample key for the documentation"

Then, export the API key as an environment variable named TECTON_API_KEY or add the key to your secret manager.
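For example, a prediction service can read the key from the environment at startup and fail fast if it is missing. This is a minimal sketch; the helper function below is illustrative and not part of the Tecton SDK:

```python
import os


def load_tecton_api_key():
    """Read the Tecton API key from the environment, failing fast if unset."""
    key = os.environ.get("TECTON_API_KEY")
    if not key:
        raise RuntimeError(
            "TECTON_API_KEY is not set; generate a key with "
            "`tecton create-api-key` and export it before starting the service."
        )
    return key
```

Reading the key from the environment (or a secret manager) keeps it out of source control and lets you rotate it without redeploying the service.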

Making an HTTP API Call

In a prediction service application, make the HTTP API call from the service's HTTP client. The following example uses cURL as the HTTP client and can be executed from the command line, but the HTTP call is the same for any client.

$ export TECTON_API_KEY='<your_tecton_key>'

$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d\
'{
  "params": {
    "feature_service_name": "ctr_prediction_service",
    "join_key_map": {
      "ad_id": "5417",
      "user_uuid": "6c423390-9a64-52c8-9bb3-bbb108c74198",
    }
  }
}'
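From a Python prediction service, the same request can be made with the standard library's urllib, avoiding third-party dependencies. This is a sketch, not a Tecton-provided client: the function names are illustrative, while the endpoint path, header, and JSON body match the cURL example above.

```python
import json
import urllib.request


def build_get_features_payload(feature_service_name, join_key_map):
    """Build the JSON body expected by the get-features endpoint."""
    return {
        "params": {
            "feature_service_name": feature_service_name,
            "join_key_map": join_key_map,
        }
    }


def fetch_features(cluster_url, api_key, feature_service_name, join_key_map):
    """POST to the get-features endpoint and return the decoded JSON response."""
    body = json.dumps(
        build_get_features_payload(feature_service_name, join_key_map)
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{cluster_url}/api/v1/feature-service/get-features",
        data=body,
        headers={
            "Authorization": f"Tecton-key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Keeping the payload construction in its own function makes it easy to unit-test the request body without making a network call.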

Fetching multiple Feature Vectors with a Wildcard Index

You may want to retrieve all feature vectors for a given entity. For example, you may want to retrieve all the ads a user has previously seen for your ad ranking model.

Configuring your wildcard features

First, when defining the Feature Package, specify the online_serving_index parameter, omitting the join key you won't supply during retrieval.

from datetime import datetime
from tecton import TemporalAggregateFeaturePackage, FeatureAggregation, DataSourceConfig, sql_transformation, MaterializationConfig
from shared import data_sources, entities

@sql_transformation(inputs=data_sources.ad_impressions_batch)
def user_ad_impression_counts_wildcard_transformer(input_df):
    return f"""
        select
            user_uuid,
            ad_id,
            1 as impression,
            timestamp
        from
            {input_df}
        """
user_ad_impression_counts_wildcard = TemporalAggregateFeaturePackage(
    name="user_ad_impression_counts_wildcard",
    entities=[entities.user_entity, entities.ad_entity], # all entities for the TemporalFeaturePackage
    online_serving_index=["user_uuid"], # the join keys of all non-wildcard entities (only one join key can be omitted)
    transformation=user_ad_impression_counts_wildcard_transformer,
    aggregation_slide_period="12h",
    aggregations=[FeatureAggregation(column="impression", function="count", time_windows=["12h", "24h"])],
    materialization=MaterializationConfig(
        online_enabled=True,
        offline_enabled=True,
        feature_start_time=datetime(2021, 2, 1)
    )
)

Now that we've specified our serving indices for the Feature Package, let's create our Feature Service to enable online retrieval.

from tecton import FeatureService, FeaturesConfig
from feature_repo.shared.features.user_ad_impression_counts_wildcard import user_ad_impression_counts_wildcard

ctr_prediction_service = FeatureService(
    name='ctr_prediction_service',
    description='A Feature Service used for supporting a CTR prediction model.',
    online_serving_enabled=True,
    features=[
        user_ad_impression_counts_wildcard
    ],
    family='ad_serving',
    tags={'release': 'production'},
    owner="derek@tecton.ai",
)

Fetching wildcard features online

Once these changes have been applied, we can use the Tecton Python SDK to retrieve a DataFrame containing all the feature vectors that match our user by omitting the ad_id join key.

import tecton

my_fs = tecton.get_feature_service("ctr_prediction_service")

keys = {
        "user_uuid": "sample-user-uuid"
}

response = my_fs.query_features(keys).to_pandas()
print(response.head())

Alternatively, we can use the HTTP API. See the section above for more detail on how to configure the API key.

$ export TECTON_API_KEY='<your_tecton_key>'

$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d\
'{
  "params": {
    "feature_service_name": "ctr_prediction_service",
    "join_key_map": {
      "user_uuid": "sample-user-id",
    }
  }
}'

Creating training sets with wildcard features

Similarly, we can construct our training dataset by providing a prediction context that contains the join key we specified as our serving index.

import tecton

events = spark.read.parquet("dbfs:/event_data.pq").select("user_uuid", "timestamp")

my_fs = tecton.get_feature_service("ctr_prediction_service")

training_set = my_fs.get_feature_dataframe(events, timestamp_key="timestamp")

print(training_set.to_pandas().head())