
Pushing Feature Values into Feature Stores

Overview

Use a PushFeaturePackage to ingest features generated outside of Tecton and load them into your offline and online Feature Stores for training or prediction.

Use Cases

PushFeaturePackages are most suitable for the following use cases:

  1. You want to test the Tecton workflow in a lightweight way. Unlike other Feature Package types, a PushFeaturePackage lets you push features to Tecton without setting up data sources or creating feature Transformations.
  2. You have existing external feature processing jobs running in systems like Airflow, and you'd like to make results from those jobs available in Tecton.

Schemas

Tecton supports pandas and Spark DataFrames as inputs. Because the data comes from outside of Tecton, you must declare the schema as part of the PushFeaturePackage definition. For other Feature Package types, where Tecton computes the feature values, the schema is inferred automatically from the Transformation output.

Example

This example shows how to populate a simple feature that counts the total number of purchases made by a customer. It demonstrates how to create a PushFeaturePackage and how to ingest and consume feature values.

The procedure has three parts:

  1. Create the Feature Package and register it to the Feature Store.
  2. Push data values into the Feature Package from a notebook.
  3. Fetch the features using a Feature Service and preview the features.

Create and Register the Feature Package

The code example below creates a PushFeaturePackage object.

The schema field specifies the schema, expressed as a Spark StructType.

In the MaterializationConfig, specify whether the data is stored offline for training (offline_enabled), online for serving (online_enabled), or both. The MaterializationConfig contains other parameters that control feature storage and serving, described here.

from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType
from tecton import Entity, PushFeaturePackage, MaterializationConfig, DeltaConfig

# The entity whose join key ("userid") identifies each feature row.
user_entity = Entity(name="user", default_join_keys=["userid"])

# Declare the schema explicitly, because the values are computed outside of Tecton.
schema = StructType()
schema.add(StructField("timestamp", TimestampType()))
schema.add(StructField("userid", StringType()))
schema.add(StructField("num_purchases", LongType()))

fp = PushFeaturePackage(
    name="user_purchases_push_fp",
    entities=[user_entity],
    schema=schema,
    materialization=MaterializationConfig(
        serving_ttl="30d",
        offline_enabled=True,  # store values for training
        offline_config=DeltaConfig(),
        online_enabled=True,  # store values for serving
    ),
)

In your local Feature Store directory, run tecton apply to create the Feature Package in the Tecton cluster.

Push Data Values into the Feature Package

Once a Push Feature Package is created, push data values for training or serving from your interactive notebook environment.

The following code creates a simple single-row Dataframe and pushes it into the new Feature Package. The Dataframe must contain all the columns that were declared in the schema, with the right data types. Extra columns are ignored.

The ingest call may take a few seconds or longer, depending on the amount of data.

import tecton
import pandas

fp = tecton.get_feature_package('user_purchases_push_fp')

pandas_df = pandas.DataFrame([{
    "timestamp": pandas.Timestamp("2020-09-18 12:00:06", tz="UTC"),
    "userid": "u123",
    "num_purchases": 91
}])

fp.ingest(pandas_df)

It might take a few seconds for the data to become available in the online and offline Feature Stores.

To verify that the value was ingested properly, preview the data:

fp.preview()
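
Because the DataFrame must contain every declared column with the right data type, it can help to check rows before calling ingest. The following is a minimal sketch in plain Python, not part of the Tecton SDK: DECLARED_SCHEMA, check_row, and conform_row are hypothetical names, and the type mapping is an assumed Python-side mirror of the Spark schema declared earlier.

```python
from datetime import datetime, timezone

# Hypothetical mirror of the declared schema: column name -> expected Python type.
# (Assumption: TimestampType ~ datetime, StringType ~ str, LongType ~ int.)
DECLARED_SCHEMA = {
    "timestamp": datetime,
    "userid": str,
    "num_purchases": int,
}


def check_row(row, schema=DECLARED_SCHEMA):
    """Return a list of problems in one row; an empty list means the row conforms."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    return problems


def conform_row(row, schema=DECLARED_SCHEMA):
    """Keep only the declared columns, in schema order; extra columns are dropped."""
    return {column: row[column] for column in schema if column in row}


# Example row with one extra column that is not part of the schema.
example_row = {
    "timestamp": datetime(2020, 9, 18, 12, 0, 6, tzinfo=timezone.utc),
    "userid": "u123",
    "num_purchases": 91,
    "note": "not declared in the schema",
}
```

Running check_row on each row and building the DataFrame only from rows with no reported problems surfaces schema mismatches in the notebook rather than during ingestion.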

Fetch Values via a Feature Service

A Push Feature Package can be added to a Feature Service just like any other Feature Package type. Create a simple Feature Service by adding the following definition to your Feature Store configuration. You can also add the Push Feature Package to an existing Feature Service alongside other Feature Packages of any type.

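from tecton import FeatureService

# fp is the PushFeaturePackage object defined earlier in the configuration.
fs = FeatureService(
    name="user_purchases_push_fs",
    features=[fp],
    online_serving_enabled=True,
)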

Register the configuration by running tecton apply again.

Test the new Feature Service in the notebook by fetching a feature vector:

import tecton

fs = tecton.get_feature_service('user_purchases_push_fs')

fs.get_feature_vector(join_keys={'userid': 'u123'}).to_pandas()
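
Since pushed values can take a few seconds to become available, a notebook test may need to retry the fetch. The sketch below is a generic polling helper, not a Tecton API: wait_for and its parameters are hypothetical, and what counts as "ready" is left to a caller-supplied check because the behavior of a fetch before the data arrives is not described here.

```python
import time


def wait_for(fetch, is_ready, timeout_s=30.0, interval_s=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_ready(result) is true.

    Raises TimeoutError if no ready result is seen within timeout_s seconds.
    The sleep and clock arguments exist so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"value not ready after {timeout_s} seconds")
        sleep(interval_s)
```

In the notebook, fetch could be a lambda that wraps the get_feature_vector call above, and is_ready a check that the returned frame contains the expected num_purchases value.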

For more examples of fetching real-time values, refer to Fetching Online Features.