Version: 0.5

Serverless Feature Retrieval from any Python environment

Public Preview

This feature is currently in Public Preview.

This feature has the following limitations:
  • You cannot use the Athena connector to compute features with the `get_historical_features()` method’s `from_source` parameter set to `True`. Features can only be read from the offline store.
  • The Athena connector does not support Boolean type features.
  • The Athena connector is not compatible with Tecton on Snowflake.
  • The Athena connector does not support Array-type features.
Please file a feature request for functionality that you are interested in.

Summary

This guide explains how to use Tecton together with AWS Athena to retrieve features from Tecton’s offline store in any Python environment that has access to AWS (for example, your local laptop, a Jupyter notebook, or Kubeflow pipelines).

Background

When using the Tecton SDK with EMR or Databricks, the Tecton SDK leverages the existing Spark context to retrieve features from the offline store. As a result, the SDK methods for offline feature retrieval function properly only in interactive Databricks notebooks or interactive AWS EMR notebooks.

As an alternative, you can use the Tecton SDK with Athena to access features from the offline store, removing the requirement of being in a Databricks or EMR Notebook.

How the SDK works with Athena

The following diagram shows how Athena fits into Tecton’s Architecture:

[Figure: Architecture diagram of Athena in Tecton]

When you retrieve features from the offline store, the SDK executes the following steps:

  1. Tecton registers tables in AWS Glue for the Feature Views and Feature Tables that you read data from. Each table points at the corresponding Offline Store location on S3.
  2. Tecton’s SDK builds and executes Athena SQL queries to fetch historical feature data from the S3-based Tecton Offline Store.
  3. Athena writes the query result to S3.
  4. Tecton’s SDK reads the result from S3 and returns a pandas DataFrame.
Note

If you retrieve features using a pandas DataFrame spine, Tecton also uploads the spine to S3 and registers an Athena table for it.
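Steps 2 and 3 above can be sketched with the two Athena API calls listed under Required permissions. This is a minimal illustration, not Tecton's actual implementation: the function name `run_athena_query` and its parameters are hypothetical, and `athena_client` is assumed to be a client created with `boto3.client("athena")`.

```python
import time


def run_athena_query(athena_client, sql, database, output_location,
                     workgroup="primary", poll_seconds=1.0, timeout_seconds=300.0):
    """Start an Athena query, wait for it to finish, and return the S3
    location of the result file. Hypothetical helper, for illustration only."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
        WorkGroup=workgroup,
    )
    query_id = response["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            # Athena has written the result to S3 (step 3).
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "unknown reason")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} did not finish in time")
        time.sleep(poll_seconds)
```

The returned S3 location is what a client would then read into a pandas DataFrame (step 4). The Tecton SDK performs these steps for you; the sketch only shows why the environment needs S3, Glue, and Athena access.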

Supported Tecton SDK operations

Athena is used for the following SDK operations:

  • FeatureView.get_historical_features()
  • FeatureTable.get_historical_features()
  • FeatureService.get_historical_features()

All other SDK operations that read data, such as DataSource.get_dataframe(), still require a Spark context.

Installation

To install the Tecton SDK with Athena support in your Python environment or notebook, run:

pip install 'tecton[athena]'

To enable the Athena connector, run:

import tecton
tecton.conf.set("ALPHA_ATHENA_COMPUTE_ENABLED", "true")
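Once the connector is enabled, feature retrieval looks the same as in a Spark notebook. The sketch below is an assumption-laden example rather than official usage: the helper name `fetch_training_data` and its arguments are hypothetical, and it assumes that the object returned by `get_historical_features()` exposes `to_pandas()`.

```python
def fetch_training_data(spine, workspace_name, feature_service_name, timestamp_key):
    """Retrieve historical features for a spine through the Athena connector.

    `spine` is expected to be a pandas DataFrame containing the join keys and
    a timestamp column named `timestamp_key`. All names are illustrative.
    """
    # Imported inside the function so this module can be loaded
    # in environments where the SDK is not installed.
    import tecton

    # Enable the Athena connector, as shown in the Installation section.
    tecton.conf.set("ALPHA_ATHENA_COMPUTE_ENABLED", "true")

    workspace = tecton.get_workspace(workspace_name)
    feature_service = workspace.get_feature_service(feature_service_name)

    # Assumption: the result exposes to_pandas() for conversion to pandas.
    result = feature_service.get_historical_features(spine, timestamp_key=timestamp_key)
    return result.to_pandas()
```

A call such as `fetch_training_data(spine_df, "my_workspace", "my_feature_service", "timestamp")` would use placeholder workspace and Feature Service names; substitute the ones defined in your own feature repository.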

Athena session configuration

The following optional configuration can be set on the Athena Session:

import tecton_athena

session = tecton_athena.athena_session.get_session()
config = session.config

config.boto3_session = ...  # Boto3 session
config.workgroup = ...  # Athena workgroup
config.encryption = ...  # Valid values: None, 'SSE_S3', 'SSE_KMS'. Note: 'CSE_KMS' is not supported.
config.kms_key = ...  # For SSE_KMS, this is the KMS key ARN or ID
config.database = ...  # Name of the database in the catalog
config.s3_path = ...  # S3 location where spines will be uploaded and Athena output will be stored
config.spine_temp_table_name = ...  # Temp table name in the catalog used for Tecton spines

Notes regarding some of these configuration options:

s3_path: (Defaults to a newly created bucket in the user’s region). S3 Path that Athena writes its results to, and that Tecton uploads spines to. Must start with s3://.

database: (Defaults to default). Name of the database in the AWS Glue catalog in which Tecton registers Athena tables (both for spines and for Feature Views).

spine_temp_table_name: (Defaults to the empty string). Prefix for the Glue table that the SDK registers when a spine, as required by some get_historical_features calls, is uploaded to S3 and registered as an Athena table. This is useful if multiple users use the SDK at the same time.
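The constraints stated above (the s3:// prefix and the accepted encryption values) can be checked before they are assigned to the session config. The helper below is a hypothetical sketch and not part of the Tecton SDK; the rule that a KMS key should accompany 'SSE_KMS' is an assumption inferred from the kms_key comment.

```python
# Encryption values listed as valid in the configuration comments above.
ALLOWED_ENCRYPTION = (None, "SSE_S3", "SSE_KMS")


def check_athena_settings(s3_path, encryption=None, kms_key=None):
    """Return a list of human-readable problems; an empty list means
    the values satisfy the documented constraints."""
    problems = []
    if not s3_path.startswith("s3://"):
        problems.append(f"s3_path must start with s3://, got {s3_path!r}")
    if encryption not in ALLOWED_ENCRYPTION:
        problems.append(f"encryption must be one of {ALLOWED_ENCRYPTION}, got {encryption!r}")
    # Assumption: SSE_KMS is only meaningful together with a KMS key ARN or ID.
    if encryption == "SSE_KMS" and not kms_key:
        problems.append("kms_key should be set when encryption is 'SSE_KMS'")
    return problems


if __name__ == "__main__":
    # Placeholder bucket name, for illustration only.
    for problem in check_athena_settings("my-bucket/athena", encryption="CSE_KMS"):
        print(problem)
```

Running the check first surfaces a misconfigured path or encryption mode before any query is submitted.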

Required permissions

The environment in which the SDK is used must have the following permissions to AWS services:

  • S3:
    • Read and write access to the S3 path specified in ATHENA_S3_PATH
    • Read access to the S3 bucket Tecton uses for the Offline Store
  • Glue Catalog:
    • Read and write access to the catalog specified in ATHENA_DATABASE, including
      • glue:CreateTable
      • glue:DeleteTable
  • Athena:
    • Full read access, including
      • athena:StartQueryExecution
      • athena:GetQueryExecution
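As a starting point, the permissions above could be expressed as an IAM policy document. The sketch below is likely incomplete and is not an official Tecton policy: the function name and bucket arguments are hypothetical, the S3 action names are an assumed mapping of "read and write access", and the `"*"` resources are placeholders that should be narrowed for real use.

```python
import json


def build_athena_retrieval_policy(athena_bucket, athena_prefix, offline_store_bucket):
    """Build an illustrative IAM policy covering the permissions listed above."""
    prefix = athena_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read and write access to the Athena S3 path.
                "Sid": "AthenaPathReadWrite",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{athena_bucket}/{prefix}/*",
            },
            {   # Read access to the Offline Store bucket.
                "Sid": "OfflineStoreRead",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{offline_store_bucket}/*",
            },
            {   # Listing objects is assumed to be part of "read access".
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{athena_bucket}",
                    f"arn:aws:s3:::{offline_store_bucket}",
                ],
            },
            {   # Glue actions named in the list above; placeholder resource.
                "Sid": "GlueTables",
                "Effect": "Allow",
                "Action": ["glue:CreateTable", "glue:DeleteTable"],
                "Resource": "*",
            },
            {   # Athena actions named in the list above; placeholder resource.
                "Sid": "AthenaQueries",
                "Effect": "Allow",
                "Action": ["athena:StartQueryExecution", "athena:GetQueryExecution"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket names, for illustration only.
    policy = build_athena_retrieval_policy("my-athena-bucket", "tecton/", "my-offline-store-bucket")
    print(json.dumps(policy, indent=2))
```

Because the list above says "including", additional Glue and Athena read actions are probably required in practice; treat the output as a template to extend.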

Authentication

The authentication methods that are available for authenticating your environment with AWS are described here.

On-Demand Feature Views

If you use a Feature Service that depends on On-Demand Feature Views, those On-Demand Feature Views are executed locally, in the SDK’s client environment.