Version: 0.8

Feature Tables

info

Available for Rift & Spark.

If you are interested in this functionality, please file a feature request.

A Feature Table allows you to ingest features into Tecton that you've already transformed outside of Tecton (for example, in your data lake or data warehouse). In contrast to Feature Views, you are responsible for transforming raw data into feature values and ingesting those feature values into Tecton via its API.

Use a FeatureTable if:

  • you already have feature data pipelines running outside of Tecton and you want to make those feature values available for consistent offline and online consumption
  • you need to run a feature transformation that's not supported by Tecton's FeatureViews. A FeatureTable provides you with a flexible escape hatch to bring arbitrary features into Tecton

Common Examples:

  • You manage a pipeline outside of Tecton that generates user embeddings and you want to make those available for online and/or offline serving
  • You're just getting started with Tecton and already run Airflow pipelines that produce batch features. Now you want to bring them to Tecton for online and/or offline serving

Within a single FeatureService, you can include a FeatureTable alongside a FeatureView. This capability provides an easy way for you to use Tecton to develop new features, while continuing to leverage your existing feature pipelines.

from tecton import Entity, FeatureTable
from tecton.types import String, Timestamp, Int64, Field
from fraud.entities import user
from datetime import timedelta


schema = [
    Field("user_id", String),
    Field("timestamp", Timestamp),
    Field("user_login_count_7d", Int64),
    Field("user_login_count_30d", Int64),
]

user_login_counts = FeatureTable(
    name="user_login_counts",
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl=timedelta(days=7),
    description="User login counts over time.",
)

Ingest Data into the Feature Table

Once the FeatureTable has been added to your feature repository, you can use the Tecton Python SDK to push feature data into Tecton.

To do so, pass a Spark or pandas dataframe to the FeatureTable.ingest() method from within your Spark environment. The dataframe must contain every column declared in the schema.

Use your Databricks or EMR notebook to ingest a simple dataframe to the FeatureTable defined above.

import pandas
import tecton
from datetime import datetime

# One row of already-computed feature values; the columns match the schema above.
df = pandas.DataFrame(
    [
        {
            "user_id": "user_1",
            "timestamp": pandas.Timestamp(datetime.now()),
            "user_login_count_7d": 15,
            "user_login_count_30d": 35,
        }
    ]
)

# Look up the FeatureTable in the workspace and push the dataframe to it.
ws = tecton.get_workspace("prod")
ft = ws.get_feature_table("user_login_counts")
ft.ingest(df)

After calling FeatureTable.ingest(), you can track the status of the materialization job in the Web UI or with FeatureTable.materialization_status().
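Because the dataframe must contain every column declared in the schema, it can help to check for missing columns before calling ingest(). The following is a minimal sketch of such a check. The helper `missing_schema_columns` and the constant `SCHEMA_COLUMNS` are hypothetical names introduced here for illustration and are not part of the Tecton SDK.

```python
from typing import Iterable, List

# Column names copied by hand from the schema of user_login_counts.
# (Hypothetical constant for this sketch, not a Tecton SDK object.)
SCHEMA_COLUMNS = (
    "user_id",
    "timestamp",
    "user_login_count_7d",
    "user_login_count_30d",
)


def missing_schema_columns(
    df_columns: Iterable[str],
    schema_columns: Iterable[str] = SCHEMA_COLUMNS,
) -> List[str]:
    """Return the schema columns absent from df_columns, in schema order."""
    present = set(df_columns)
    return [name for name in schema_columns if name not in present]
```

With a pandas dataframe, you could call `missing_schema_columns(df.columns)` and stop before ingesting if the returned list is not empty. Extra columns are ignored by this check.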

How it works

To ingest the dataframe, the Tecton SDK will first write the dataframe to an S3 bucket in the Tecton Data Plane. Then Tecton will initiate materialization jobs to write that data into the Online and Offline stores.

If you submit duplicate features for the same join_keys and timestamps, the last write will win.

note

If you want to overwrite existing entity keys, expect the following behavior:

  • Online Store

    • If the new event has a later timestamp than the existing record for that entity, the existing record is overwritten. If the new timestamp is not later, the new event is not written.
  • Offline Store

    • New records are always appended, and point-in-time joins are performed based on the timestamp.
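The overwrite behavior described above can be illustrated with a toy in-memory model. This is only a sketch of the stated semantics, not Tecton's implementation: the class `ToyStores` and its methods are hypothetical, TTL is ignored, and the tie-breaking rule in the offline lookup (a later write wins for equal timestamps) is an assumption based on the "last write will win" statement.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Record = Tuple[datetime, int]  # (event timestamp, feature value)


class ToyStores:
    """Hypothetical in-memory stand-in for the online and offline stores."""

    def __init__(self) -> None:
        self.online: Dict[str, Record] = {}
        self.offline: List[Tuple[str, datetime, int]] = []

    def ingest(self, key: str, ts: datetime, value: int) -> None:
        # Offline store: every record is appended.
        self.offline.append((key, ts, value))
        # Online store: write only if the timestamp is strictly later
        # than the record currently held for this entity key.
        current = self.online.get(key)
        if current is None or ts > current[0]:
            self.online[key] = (ts, value)

    def offline_as_of(self, key: str, as_of: datetime) -> Optional[int]:
        """Point-in-time lookup: latest value with timestamp <= as_of."""
        best: Optional[Record] = None
        for k, ts, value in self.offline:
            # ">=" lets a later write win when timestamps are equal.
            if k == key and ts <= as_of and (best is None or ts >= best[0]):
                best = (ts, value)
        return None if best is None else best[1]


stores = ToyStores()
stores.ingest("user_1", datetime(2024, 1, 10), 15)
stores.ingest("user_1", datetime(2024, 1, 5), 9)  # older event
```

In this model, the second ingest leaves the online record for `user_1` unchanged because its timestamp is not later, while the offline store keeps both rows and answers each lookup according to the requested point in time.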
