Creating & Managing Features
Defining feature pipelines
What programming languages do you support for defining features?
For each feature in Tecton, users create a Python-based feature definition file that includes all of the metadata and logic they want Tecton to manage.
Tecton's transformation logic is managed in Spark, using PySpark or SQL transformations. If your model requires request-time transformations, those are managed in Python.
See Feature Views for more details.
What data types are supported for feature values?
Check the Data Types page for details.
Feature materialization and lineage
How is the materialization window calculated?
The initial materialization window is determined by the parameters
batch_schedule or aggregation_interval, in conjunction with
feature_start_time. When defining a Feature View, Tecton aligns the
materialization interval with Unix epoch time, where the starting point is the
epoch time of 1970/01/01 (0). We then increment the interval by batch_schedule
until its end date equals or surpasses feature_start_time.
To illustrate, consider a Feature View with a start time of 2023/11/12 and a
7-day batch schedule. The initial materialization window would be the interval
from 2023-11-02 00:00:00 to 2023-11-09 00:00:00.
The parameter feature_start_time dictates the interval from which we start
backfilling. With a feature_start_time set to 2023/11/12, it implies that you
can retrieve feature values starting from that date. The most recent interval
facilitating this is 2023-11-02 00:00:00 to 2023-11-09 00:00:00, as the
subsequent interval (2023-11-09 00:00:00 to 2023-11-16 00:00:00) is too late
for the specified start time.
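The arithmetic behind the worked example can be sketched in a few lines of Python. This is an illustrative helper, not part of the Tecton SDK: the function name initial_materialization_window is hypothetical, and it follows the example above by picking the last epoch-aligned window that ends on or before feature_start_time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def initial_materialization_window(feature_start_time, batch_schedule):
    """Return the last epoch-aligned window ending on or before feature_start_time.

    Hypothetical helper that mirrors the worked example; not a Tecton API.
    """
    # Whole batch_schedule intervals that fit between the epoch and the start time.
    completed = (feature_start_time - EPOCH) // batch_schedule
    window_end = EPOCH + completed * batch_schedule
    return window_end - batch_schedule, window_end


start, end = initial_materialization_window(
    datetime(2023, 11, 12, tzinfo=timezone.utc), timedelta(days=7)
)
print(start, end)  # 2023-11-02 00:00:00+00:00 2023-11-09 00:00:00+00:00
```

Because 2023-11-09 falls exactly 2810 seven-day intervals after the epoch, the window boundaries land on 2023-11-02 and 2023-11-09 rather than on the start time itself.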
What happens when the definition of a feature changes?
If a feature's definition changes, Tecton automatically flags all the dependencies on that feature and asks the user to confirm before proceeding with the changes. Feature definitions are backed by git, so you can roll back changes, inspect a feature's lineage, and track the latest state of your feature store at all times.
What support do you provide for time travel?
Tecton performs time travel at row-level granularity. If you have event-driven ML models that make predictions regularly, and you need the feature values as of each specific prediction time, Tecton handles that full time travel query rather than only returning all feature values at a single point in time.
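To make row-level time travel concrete, here is a minimal sketch of a point-in-time lookup using pandas.merge_asof. It only illustrates the concept and is not how Tecton implements it; the data and the column names (user_id, timestamp, txn_count_7d) are hypothetical.

```python
import pandas as pd

# Hypothetical feature values, each stamped with the time it became known.
features = pd.DataFrame(
    {
        "user_id": [1, 1, 1],
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-01-10"]),
        "txn_count_7d": [3, 5, 8],
    }
)

# Hypothetical prediction events, each with its own point in time.
events = pd.DataFrame(
    {
        "user_id": [1, 1],
        "timestamp": pd.to_datetime(["2023-01-06", "2023-01-11"]),
    }
)

# For every event row, take the latest feature value at or before that row's timestamp.
training = pd.merge_asof(
    events.sort_values("timestamp"),
    features.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
    direction="backward",
)
print(training)
```

Each event row gets the value that was current at its own timestamp (5 for the 2023-01-06 event, 8 for the 2023-01-11 event), so no row sees feature data from its future.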
How far back does Tecton support time travel?
You set your features' backfill start date in Tecton. Time travel can be performed as far back as feature data exists.
Does Tecton provide the functionality to replay and fix a backfill if the underlying data source is updated?
Yes, it is possible to kick off an "overwrite backfill" for a particular time range through the Tecton UI.
When scheduling materializations, does Tecton only materialize new data? Or does Tecton re-materialize all data?
Generally speaking, Tecton only reads and computes new data. In some cases more historical data is required (e.g., computing a one-month average at materialization time requires reading the full one-month window of data).
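The difference between the two cases can be sketched as a calculation of the input time range a job needs to read. The helper below is hypothetical and only illustrates the reasoning; it does not describe Tecton's scheduler.

```python
from datetime import datetime, timedelta


def input_time_range(window_start, window_end, aggregation_window=None):
    """Return the span of raw data needed for one materialization window.

    Hypothetical helper for illustration; not a Tecton API.
    """
    if aggregation_window is None:
        # Row-level features only need the newly arrived data.
        return window_start, window_end
    # A trailing aggregation ending at window_end needs its full lookback.
    return min(window_start, window_end - aggregation_window), window_end


day = datetime(2023, 11, 9)
print(input_time_range(day, day + timedelta(days=1)))
print(input_time_range(day, day + timedelta(days=1), timedelta(days=30)))
```

Without an aggregation window the job reads only the new day of data; with a 30-day trailing window the read range reaches back to 2023-10-11.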
What does Tecton do for data lineage? Does it support the entire data flow?
For data lineage, we consider both how features are created and how they are consumed. For feature creation, we show you the entire data flow, from your raw data sources, through the transformations being run, to where the data is stored. For feature consumption, we have the concept of a Feature Service, which maps to the features used by a running model. For any feature, you can see which services use it, and for any service, you can see which features it contains. Tracking is bidirectional.
Does Tecton have an Airflow or Prefect integration?
Tecton has open-sourced an Airflow provider for coordinating orchestration between Tecton and upstream or downstream pipelines.
If you don't use Airflow, you can implement similar functionality with the Tecton SDK.
Sharing Features
Can users inspect features?
Both the Tecton SDK and Web UI enable teams to inspect existing features in their feature store. They can review the actual code that produces the feature, see the status of materialization jobs, or query the actual feature data.
Can users register and discover features in Tecton?
Yes, with Tecton, you register the entire transformation logic, plus metadata around owners, custom tags, and more. The Tecton Web UI then allows users to access, browse, and discover different features.
How can users ensure there are no duplicate features ingested?
The Tecton Feature Store manages feature dependencies through the names of the objects that are configured for Tecton (e.g., data sources, Feature Views, and services). Users can still submit similar features under different names, so we recommend that users first look for existing features in the feature store to reuse.
Handling Nulls
Does Tecton support null feature values?
Yes. Tecton supports nulls for feature values and for request data fields. Null
values may be returned when data is missing (e.g. for a brand new user), when a
materialized Feature View column computes null (e.g. SELECT NULLIF(a, b)), or
when a Realtime Feature View returns None.
Nulls may also be members of arrays, e.g. ["foo", null], or members of
structs.
Numeric null inputs in Spark Pandas Realtime Feature Views
If you expect to use numeric nulls, Python mode (mode="python") is strongly
recommended.
In Spark offline, Pandas-mode (mode="pandas") Realtime Feature View inputs have
special handling for numeric (i.e. Integer or Float) null values: numeric null
inputs are cast to NaN.
By contrast, when running online (i.e. serving a production HTTP request), in
Python mode, or on Tecton on Snowflake, numeric nulls are provided as None
like all other data types.
The following example shows the behavior when the request input to this Feature
View is provided as {input_int: null, input_float: null}:
from tecton import RequestSource, realtime_feature_view, Attribute
from tecton.types import Int64, Float64, Field

request = RequestSource([Field("input_int", Int64), Field("input_float", Float64)])


@realtime_feature_view(
    sources=[request],
    mode="pandas",
    features=[Attribute("output_int", Int64), Attribute("output_float", Float64)],
)
def numeric_null_example(request_df):
    import pandas

    print(request_df["input_int"][0])  # `nan` in Spark offline. `None` in all other cases.
    print(request_df["input_float"][0])  # `nan` in Spark offline. `None` in all other cases.
    # `None` values in the output features are correctly handled as `null` in all cases.
    return pandas.DataFrame.from_records([{"output_int": None, "output_float": None}])
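If transformation logic has to run in both environments, one option is a null check that treats None and NaN the same way. The sketch below uses plain pandas to mimic the two cases; the is_null helper and the two Series are hypothetical stand-ins, not Tecton objects.

```python
import pandas as pd


def is_null(value):
    """Return True for both None and NaN scalars."""
    return pd.isna(value)


# A float column cannot hold None, so pandas stores the missing value as NaN.
offline_like = pd.Series([None], dtype="float64")
# An object column keeps None as-is.
online_like = pd.Series([None], dtype="object")

print(is_null(offline_like[0]), is_null(online_like[0]))  # True True
```

pd.isna returns an array when given an array-like, so this helper is only meant for scalar values such as a single request field.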
The ttl (time-to-live) parameter in Feature Views
The value of ttl affects the availability of feature data in the online store,
the generation of training feature data, and the deletion of feature values from
the online store.
ttl is a Batch and Stream Feature View parameter, as well as a Feature Table
parameter.
For more information, refer to this page.
How can users delete feature values from the offline store?
To delete values from the offline store, use the delete_keys() method.
How can users delete feature definitions from Tecton?
To delete a Feature View that has been applied to a workspace, whether or not data has been materialized, remove or comment out its definition in your Feature Repository. This deletes the Feature View; its associated data becomes inaccessible and is subsequently dropped.