Realtime Feature Views and Struct Types
This feature is not supported in Tecton on Snowflake.
If you are interested in this functionality, please file a feature request.
Realtime Feature Views that Consume Struct Types
A Realtime Feature View (RTFV) can depend on sources that output a Struct data type, such as a BatchFeatureView or a RequestSource. Keep the following limitations in mind when an RTFV depends on a source with Struct types.
On all Computes
- Pandas mode RTFVs cannot have a RequestSource with a Struct type as a source.
On Spark Compute
Struct types are immutable in offline queries in python mode
In most cases, Tecton feature view definitions are reusable in offline and online queries. However, there is an exception for python mode RTFVs that depend on a source with a Struct type when the offline compute is Spark.
When a python mode RTFV is executed offline on Spark, its transform function runs as a Python UDF. PySpark passes the source's Struct to the transform as a pyspark.sql.Row object, which is immutable. In online queries, however, Tecton passes the source's Struct to the transform as a dict, which is mutable.
As a result, if your RTFV transform mutates a source's Struct, your offline queries will fail with an error like the following:
Running the transformation resulted in the following error: TypeError: 'Row' object does not support item assignment
You can account for the Row object's immutability by converting Row objects to dicts with Row.asDict() in your transform function before you mutate them. Your RTFV will then succeed both online and offline as expected.
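from tecton import Attribute, RequestSource, realtime_feature_view
from tecton.types import Field, String, Struct

request_source = RequestSource(
    [
        Field(
            "struct_field",
            Struct(
                [
                    Field("string_field", String),
                ]
            ),
        ),
    ]
)


@realtime_feature_view(
    mode="python",
    sources=[request_source],
    features=[Attribute("struct_attribute", Struct([Field("string_field", String)]))],
)
def my_rtfv(request):
    from pyspark.sql import Row

    # Offline on Spark the Struct arrives as an immutable Row; online it arrives as a dict.
    with_spark = isinstance(request["struct_field"], Row)
    # Convert the Row (including nested Rows) to a mutable dict before mutating it.
    struct_field = request["struct_field"].asDict(recursive=True) if with_spark else request["struct_field"]
    struct_field["string_field"] += "_some_suffix"
    return {"struct_attribute": struct_field}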
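The Row-versus-dict handling above can be exercised outside Tecton and Spark with the following sketch. It assumes a hypothetical stand-in class, `FakeRow` (not part of PySpark), that mimics the two Row behaviors that matter here: item assignment raises `TypeError`, and `asDict(recursive=...)` returns a plain dict. The helper `to_mutable_struct` is also hypothetical; it uses duck typing on `asDict` instead of importing `pyspark.sql.Row`, and passes through dicts (online) and `None` (a null Struct) unchanged.

```python
from typing import Any, Dict


class FakeRow:
    """Hypothetical stand-in for pyspark.sql.Row; not the real class.

    Mimics two Row behaviors: item assignment raises TypeError (no
    __setitem__ is defined), and asDict() returns a plain, mutable dict.
    """

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def __getitem__(self, key: str) -> Any:
        return self._fields[key]

    def asDict(self, recursive: bool = False) -> Dict[str, Any]:
        result = {}
        for key, value in self._fields.items():
            # With recursive=True, directly nested rows are converted too.
            if recursive and isinstance(value, FakeRow):
                value = value.asDict(recursive=True)
            result[key] = value
        return result


def to_mutable_struct(value: Any) -> Any:
    """Return a mutable form of a Struct value, whatever form it arrives in.

    Anything exposing asDict() (a Row offline on Spark) is converted;
    dicts (online) and None (a null Struct) are returned unchanged.
    """
    if hasattr(value, "asDict"):
        return value.asDict(recursive=True)
    return value


offline_value = FakeRow(string_field="abc")
try:
    offline_value["string_field"] = "x"
except TypeError as err:
    print("Row-like object is immutable:", err)

struct_field = to_mutable_struct(offline_value)
struct_field["string_field"] += "_some_suffix"
print(struct_field)  # {'string_field': 'abc_some_suffix'}
```

Duck typing keeps the check independent of whether `pyspark` is importable in a given environment; the trade-off is that any object with an `asDict` method is treated as a Row.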
Realtime Feature Views that Return Struct Features
You can include a Struct data type in the output schema of a Realtime Feature View (RTFV). A Struct can contain multiple fields with mixed data types.
A Struct can be nested within other complex types. For example, you can have a Struct within a Struct, or an array of Structs.
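To make the nesting rules concrete, here is a framework-free sketch that checks a Python value against a nested schema before it is returned from a transform. The schema classes `StructType` and `ArrayType` and the function `conforms` are hypothetical illustrations, not the `tecton.types` API; primitive leaves are plain Python types (`str` standing in for String, `float` for Float64). Following the examples on this page, every level is nullable and a missing Struct key is treated like an explicit `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class StructType:
    """Hypothetical schema node: a Struct with named, typed fields."""
    fields: Dict[str, "SchemaType"]


@dataclass
class ArrayType:
    """Hypothetical schema node: an Array whose elements share one type."""
    element: "SchemaType"


# Primitive leaves are plain Python types (str for String, float for Float64).
SchemaType = Union[StructType, ArrayType, type]


def conforms(value: Any, schema: SchemaType) -> bool:
    """Check a value against a (possibly nested) schema; None is always allowed."""
    if value is None:
        return True
    if isinstance(schema, StructType):
        # Unknown keys are rejected; missing keys are read as None.
        if not isinstance(value, dict) or not set(value) <= set(schema.fields):
            return False
        return all(conforms(value.get(name), t) for name, t in schema.fields.items())
    if isinstance(schema, ArrayType):
        return isinstance(value, list) and all(conforms(v, schema.element) for v in value)
    if schema is float:
        # Accept ints for Float64 but reject bools (bool subclasses int).
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, schema)


inner = StructType({"string_field": str, "float64_field": float})
# A Struct nested in a Struct, plus an array of Structs.
outer = StructType({"inner": inner, "items": ArrayType(inner)})

value = {
    "inner": {"string_field": "a", "float64_field": 1.0},
    "items": [{"string_field": None}, None],
}
print(conforms(value, outer))  # True
print(conforms({"inner": {"string_field": 1}}, outer))  # False
```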
Using a Struct in the output schema of an RTFV makes the RTFV's output easy to parse when it contains multiple feature values.
Example usage: An output Struct containing two fields
The RTFV definition
from tecton import realtime_feature_view, RequestSource, FeatureService, Attribute
from tecton.types import Field, Float64, String, Struct

request_source = RequestSource([Field("input_float", Float64)])
# The Struct type itself is passed to Attribute as the feature's data type.
output_schema = Struct([Field("string_field", String), Field("float64_field", Float64)])


@realtime_feature_view(
    mode="python",
    sources=[request_source],
    features=[Attribute("output_struct", output_schema)],
    description="Output a struct with two fields.",
)
def simple_struct_example_rtfv(request):
    input_float = request["input_float"]
    # A Struct value is returned as a dict keyed by field name.
    return {
        "output_struct": {
            "string_field": str(input_float * 2),
            "float64_field": input_float * 2,
        }
    }


feature_service = FeatureService(
    name="simple_struct_example_feature_service",
    description="Output a struct with two fields.",
    features=[simple_struct_example_rtfv],
)
Example usage in a notebook
import tecton
import pandas
events = pandas.DataFrame(data={"input_float": [1.23, 3.22]})
simple_struct_example_rtfv = tecton.get_workspace("my_workspace").get_feature_view("simple_struct_example_rtfv")
simple_struct_example_rtfv.get_features_for_events(events).to_spark().show(10, False)
Output:
+-----------+-----------------------------------------+
|input_float|simple_struct_example_rtfv__output_struct|
+-----------+-----------------------------------------+
|1.23       |{2.46, 2.46}                             |
|3.22       |{6.44, 6.44}                             |
+-----------+-----------------------------------------+
Example HTTP request
$ curl -X POST http://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspace_name": "my_workspace",
    "feature_service_name": "simple_struct_example_feature_service",
    "request_context_map": {
      "input_float": 1.23
    },
    "metadata_options": {
      "include_names": true,
      "include_data_types": true
    }
  }
}'
Output:
{
  "result": {
    "features": [["2.46", 2.46]]
  },
  "metadata": {
    "features": [
      {
        "name": "output_struct",
        "dataType": {
          "type": "struct",
          "fields": [
            {
              "name": "string_field",
              "dataType": {
                "type": "string"
              }
            },
            {
              "name": "float64_field",
              "dataType": {
                "type": "float64"
              }
            }
          ]
        }
      }
    ]
  }
}
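In the response above, a Struct value arrives as a positional list whose order matches the `fields` listed in its `dataType` metadata. The sketch below uses that metadata to turn the response into named Python dicts. It assumes the response layout shown in the examples on this page (requested with `include_names` and `include_data_types`); `decode_value` and `decode_response` are hypothetical helpers, not part of a Tecton client library.

```python
import json
from typing import Any, Dict


def decode_value(value: Any, data_type: Dict[str, Any]) -> Any:
    """Turn a positional JSON value into named Python objects using its dataType."""
    if value is None:
        return None  # a null feature, null Struct, or null field
    kind = data_type["type"]
    if kind == "struct":
        # Struct values arrive as a list ordered like dataType["fields"].
        return {
            field["name"]: decode_value(item, field["dataType"])
            for field, item in zip(data_type["fields"], value)
        }
    if kind == "array":
        return [decode_value(item, data_type["elementType"]) for item in value]
    return value  # primitive types (string, float64, ...) pass through


def decode_response(body: str) -> Dict[str, Any]:
    """Map each feature name to its decoded value."""
    response = json.loads(body)
    values = response["result"]["features"]
    metadata = response["metadata"]["features"]
    return {
        meta["name"]: decode_value(value, meta["dataType"])
        for meta, value in zip(metadata, values)
    }


body = """{"result": {"features": [["2.46", 2.46]]},
 "metadata": {"features": [{"name": "output_struct", "dataType": {"type": "struct",
  "fields": [{"name": "string_field", "dataType": {"type": "string"}},
             {"name": "float64_field", "dataType": {"type": "float64"}}]}}]}}"""
print(decode_response(body))
# {'output_struct': {'string_field': '2.46', 'float64_field': 2.46}}
```

Because the decoder recurses on `dataType`, the same function also handles arrays of Structs and nested Structs.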
Example usage: An output array of Structs with some nulls
The RTFV definition
from tecton import realtime_feature_view, RequestSource, FeatureService, Attribute
from tecton.types import Array, Field, Float64, String, Struct

request_source = RequestSource([Field("input_float", Float64)])
array_of_structs_schema = Array(Struct([Field("string_field", String), Field("float64_field", Float64)]))


@realtime_feature_view(
    mode="python",
    sources=[request_source],
    features=[Attribute("array_of_structs", array_of_structs_schema)],
    description="Output an array of Structs with some null examples.",
)
def array_of_structs_example_rtfv(request):
    input_float = request["input_float"]
    return {
        "array_of_structs": [
            {"string_field": str(input_float * 2), "float64_field": input_float * 2},
            {"string_field": str(input_float * 3), "float64_field": input_float * 3},
            # A Struct that omits one key and sets the other explicitly to None.
            # These are equivalent ways to return a "null" field.
            {
                "string_field": None,
                # "float64_field": ...
            },
            # All Tecton data types are nullable, including Structs.
            None,
        ]
    }


feature_service = FeatureService(
    name="simple_struct_example_feature_service",
    description="Output an array of structs.",
    features=[array_of_structs_example_rtfv],
)
Example usage in a notebook
array_of_structs_example_rtfv = tecton.get_workspace("my_workspace").get_feature_view("array_of_structs_example_rtfv")
array_of_structs_example_rtfv.get_features_for_events(events).to_spark().show(10, False)
Output:
+-----------+------------------------------------------------+
|input_float|array_of_structs_example_rtfv__array_of_structs |
+-----------+------------------------------------------------+
|1.23       |[{2.46, 2.46}, {3.69, 3.69}, {null, null}, null]|
|3.22       |[{6.44, 6.44}, {9.66, 9.66}, {null, null}, null]|
+-----------+------------------------------------------------+
Example HTTP request
$ curl -X POST http://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspace_name": "my_workspace",
    "feature_service_name": "simple_struct_example_feature_service",
    "request_context_map": {
      "input_float": 1.23
    },
    "metadata_options": {
      "include_names": true,
      "include_data_types": true
    }
  }
}'
Output:
{
  "result": {
    "features": [[["2.46", 2.46], ["3.69", 3.69], [null, null], null]]
  },
  "metadata": {
    "features": [
      {
        "name": "array_of_structs",
        "dataType": {
          "type": "array",
          "elementType": {
            "type": "struct",
            "fields": [
              {
                "name": "string_field",
                "dataType": {
                  "type": "string"
                }
              },
              {
                "name": "float64_field",
                "dataType": {
                  "type": "float64"
                }
              }
            ]
          }
        }
      }
    ]
  }
}
Note that null or missing fields are returned in the JSON response as JSON null, and that there is a difference between a Struct containing all null values and a null Struct. Both are shown in this example.