Pip dependencies and Python environments
This feature is currently in Public Preview.
Build more powerful Realtime Features by leveraging popular Python packages available in Python Environments. Here's an example Realtime Feature View that uses the `fuzzywuzzy` package to compute the fuzzy similarity between two strings:
```python
from tecton import realtime_feature_view, RequestSource, Attribute
from tecton.types import Field, Int64, String

request_schema = [Field("baseline", String), Field("text", String)]
similarity_request = RequestSource(schema=request_schema)


@realtime_feature_view(
    sources=[similarity_request],
    mode="python",
    features=[Attribute("similarity", Int64), Attribute("partial_similarity", Int64)],
    # Declare the package this transformation needs; Tecton checks that it is available.
    # (Do not also set environments: only one of the two may be specified.)
    required_packages=["fuzzywuzzy>=0.18.0"],
)
def fuzzy_similarity_feature_view(request):
    # The import lives inside the transformation body, which runs in the execution environment.
    from fuzzywuzzy import fuzz

    result = {
        "similarity": fuzz.ratio(request["baseline"], request["text"]),
        "partial_similarity": fuzz.partial_ratio(request["baseline"], request["text"]),
    }
    return result
```
The new `required_packages` parameter lets you explicitly declare the Python packages, and their version constraints, that your Feature View depends on. This helps ensure compatibility and correct dependency resolution when transformations run in Tecton-managed environments.
Note: Only one of `environments` and `required_packages` can be set for a Realtime Feature View. If both are specified, an error is raised.
Python Environments for Realtime Feature Views are isolated compute environments where transformations are run during Online feature retrieval. Specifying an environment enables the use of common Python libraries when building real-time features.
Available Python Environments
Tecton publishes a set of Python Environments that include common feature transformation packages.
Python Environments are identified by a name and a version number, such as `tecton-python-core:0.1`. By pinning your environment, you can be sure that your transformation logic will continue to run reliably.
The following Python Environments are available for use:
- `tecton-python-core` is a lightweight environment with the minimal set of dependencies available.
- `tecton-python-extended` offers a larger set of common feature transformation packages.
The table below lists all available versions for these environments.
Environment | Date published |
---|---|
tecton-python-core:0.1 | 2023-07-26 |
tecton-python-extended:0.5 | 2024-02-20 |
To view this list from the Tecton CLI, run `tecton environment list-all`.
The following environments have been deprecated and are no longer available for most customers. If you are using one of these environments, please migrate to a supported environment.
Environment | Date published | Date deprecated |
---|---|---|
tecton-python-extended:0.1 | 2023-07-26 | 2024-02-20 |
tecton-python-extended:0.2 | 2023-08-02 | 2024-02-20 |
tecton-python-extended:0.3 | 2023-08-29 | 2024-02-20 |
tecton-python-extended:0.4 | 2023-09-27 | 2024-02-20 |
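If you keep environment identifiers in your repository, a small check can flag pins that appear in the deprecated table. The sketch below is a hypothetical helper, not part of the Tecton SDK or CLI: its data is copied by hand from the two tables above, so `tecton environment list-all` remains the source of truth. It also shows how a `name:version` identifier splits into its two parts.

```python
from __future__ import annotations

# Hypothetical data copied from the tables above; update it when Tecton publishes changes.
SUPPORTED = {
    "tecton-python-core:0.1": "2023-07-26",
    "tecton-python-extended:0.5": "2024-02-20",
}
DEPRECATED = {
    "tecton-python-extended:0.1": "2024-02-20",
    "tecton-python-extended:0.2": "2024-02-20",
    "tecton-python-extended:0.3": "2024-02-20",
    "tecton-python-extended:0.4": "2024-02-20",
}


def parse_environment(identifier: str) -> tuple[str, str]:
    """Split an identifier such as 'tecton-python-core:0.1' into (name, version)."""
    name, sep, version = identifier.partition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected '<name>:<version>', got {identifier!r}")
    return name, version


def environment_status(identifier: str) -> str:
    """Classify a pinned environment using the tables above."""
    parse_environment(identifier)  # raises ValueError on a malformed identifier
    if identifier in SUPPORTED:
        return "supported"
    if identifier in DEPRECATED:
        return f"deprecated on {DEPRECATED[identifier]}"
    return "unknown"


def latest_supported(name: str) -> str | None:
    """Return the highest supported version of an environment name, if any."""
    candidates = [env for env in SUPPORTED if parse_environment(env)[0] == name]
    if not candidates:
        return None
    # Compare versions numerically so that '0.10' would sort after '0.5'.
    return max(
        candidates,
        key=lambda env: tuple(int(part) for part in parse_environment(env)[1].split(".")),
    )


# Example: flag a deprecated pin and suggest a supported replacement.
pin = "tecton-python-extended:0.1"
print(pin, "->", environment_status(pin), "| latest:", latest_supported("tecton-python-extended"))
```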
Specifying Environments and Required Packages for Realtime Feature Views and Feature Services
The `environments` and `required_packages` parameters declare what dependencies a Realtime Feature View (RTFV) requires, and the Feature Service's `on_demand_environment` selects the environment in which those views run:
- The `environments` parameter on a Realtime Feature View specifies the set of Environments that the transformation logic is compatible with.
- The `required_packages` parameter specifies the Python package dependencies, including version constraints, that must be available in the environment for the Realtime Feature View to execute.
- The `on_demand_environment` parameter on the Feature Service specifies the single environment that will be used when running all Realtime Feature Views in that Feature Service during Online retrieval.
Specifying Required Packages for Realtime Feature Views
When using the `required_packages` parameter, you can specify the exact Python dependencies that must be available in the environment for the Realtime Feature View to execute.
The package constraints listed on a Realtime Feature View are not expected to be the full list of dependencies for the environment. Tecton only validates that the specified packages are available in the environment; other packages are likely to be present as well.
For example, if you specify `required_packages=["fuzzywuzzy>=0.18.0"]`, Tecton will ensure that the `fuzzywuzzy` package is available in the environment and that its version is at least 0.18.0. The environment may also contain packages that are not listed in `required_packages`, such as `numpy` and `pandas`.
Package constraints can be specified in the following ways:
- `package_name` - Any version of the package is acceptable
- `package_name==1.0.0` - The package must be version 1.0.0
- `package_name>=1.0.0` - The package must be version 1.0.0 or higher
- `package_name<=1.0.0` - The package must be version 1.0.0 or lower
- `package_name>1.0.0` - The package must be higher than version 1.0.0
- `package_name<1.0.0` - The package must be lower than version 1.0.0
- `package_name~=1.0.0` - The package must be a compatible release of version 1.0.0 (equivalent to `>=1.0.0, ==1.0.*`)
Additionally, we don't support checking for package extras (such as `package_name[extra]`) or conditional dependencies.
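The constraint forms above can be illustrated with a small stdlib-only checker. This is a sketch under stated assumptions, not Tecton's validator: it accepts only the operator forms listed above, only purely numeric versions (no pre-release, post-release, or local segments), rejects extras, and follows PEP 440 for `~=`. The hypothetical `unmet_constraints` function also shows that packages present in an environment but absent from `required_packages` are simply ignored.

```python
from __future__ import annotations

import re

# Name, optional operator, optional numeric version; extras such as "pkg[extra]" do not match.
_CONSTRAINT = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9_.-]*)\s*(==|>=|<=|~=|>|<)?\s*([0-9]+(?:\.[0-9]+)*)?\s*$"
)


def parse_constraint(constraint: str) -> tuple[str, str | None, str | None]:
    """Split 'numpy>=1.21.0' into ('numpy', '>=', '1.21.0')."""
    match = _CONSTRAINT.match(constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    name, op, version = match.groups()
    if (op is None) != (version is None):
        raise ValueError(f"operator and version must appear together: {constraint!r}")
    return name, op, version


def _normalize(name: str) -> str:
    # PEP 503 name normalization: case-insensitive; runs of '-', '_' and '.' are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def _release(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, constraint: str) -> bool:
    """Check one installed version against one constraint."""
    _, op, required = parse_constraint(constraint)
    if op is None:
        return True  # bare package name: any version is acceptable
    have, want = _release(installed), _release(required)
    # Pad with zeros so that 1.0 and 1.0.0 compare as equal.
    width = max(len(have), len(want))
    have_p = have + (0,) * (width - len(have))
    want_p = want + (0,) * (width - len(want))
    if op == "==":
        return have_p == want_p
    if op == ">=":
        return have_p >= want_p
    if op == "<=":
        return have_p <= want_p
    if op == ">":
        return have_p > want_p
    if op == "<":
        return have_p < want_p
    # op == "~=": compatible release, e.g. ~=1.0.0 means >=1.0.0 and ==1.0.*
    if len(want) < 2:
        raise ValueError(f"~= needs at least two version components: {constraint!r}")
    prefix = want[:-1]
    return have_p >= want_p and have_p[: len(prefix)] == prefix


def unmet_constraints(installed: dict[str, str], required_packages: list[str]) -> list[str]:
    """Return one message per unmet constraint; packages not listed are ignored."""
    versions = {_normalize(name): version for name, version in installed.items()}
    problems = []
    for constraint in required_packages:
        name, _, _ = parse_constraint(constraint)
        found = versions.get(_normalize(name))
        if found is None:
            problems.append(f"{constraint}: not installed")
        elif not satisfies(found, constraint):
            problems.append(f"{constraint}: found {found}")
    return problems


# Hypothetical environment: pandas is present but not required, so it plays no role.
environment = {"fuzzywuzzy": "0.18.0", "numpy": "1.26.4", "pandas": "2.2.2"}
print(unmet_constraints(environment, ["fuzzywuzzy>=0.18.0", "numpy~=1.21"]))
```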
Let's look at an example. Say we want to create:
- A Feature View with a dependency on `fuzzywuzzy`, which is only available in `tecton-python-extended:0.1`
- A Feature View with a dependency on `numpy`, which is available in both `tecton-python-core:0.1` and `tecton-python-extended:0.1`
- A Feature Service that contains both of these Feature Views
```python
from tecton import realtime_feature_view, RequestSource, FeatureService, Attribute
from tecton.types import Field, Int64, String

request_schema = [Field("baseline", String), Field("text", String)]
similarity_request = RequestSource(schema=request_schema)


# Pinned to an environment list: fuzzywuzzy is only available in tecton-python-extended.
@realtime_feature_view(
    sources=[similarity_request],
    mode="python",
    features=[Attribute("similarity", Int64), Attribute("partial_similarity", Int64)],
    environments=["tecton-python-extended:0.1"],
)
def fuzzy_similarity_feature_view(request):
    from fuzzywuzzy import fuzz

    result = {
        "similarity": fuzz.ratio(request["baseline"], request["text"]),
        "partial_similarity": fuzz.partial_ratio(request["baseline"], request["text"]),
    }
    return result


letter_count_request = RequestSource(schema=request_schema)


# Declares a package constraint instead of an environment list.
@realtime_feature_view(
    sources=[letter_count_request],
    mode="python",
    features=[Attribute("letter_count", Int64)],
    required_packages=["numpy>=1.21.0"],
)
def letter_count_feature_view(request):
    import numpy as np

    # dtype=str keeps an empty input as a string array so np.char.isalpha still works.
    characters = np.array(list(request["text"]), dtype=str)
    # Cast the NumPy integer to a plain Python int for the Int64 feature.
    letter_count = int(np.sum(np.char.isalpha(characters)))
    result = {"letter_count": letter_count}
    return result


# All RTFVs in this service run in on_demand_environment, so it must be on the
# environments list of each RTFV (an RTFV without that list is compatible with all).
my_fs = FeatureService(
    name="text_processing_feature_service",
    features=[fuzzy_similarity_feature_view, letter_count_feature_view],
    on_demand_environment="tecton-python-extended:0.1",
)
```
Note that:
- If `environments` is not specified for a Realtime Feature View, it is assumed to be compatible with all Tecton environments.
- If `required_packages` is not specified, no additional package dependencies are installed.
- During execution, all Realtime Feature Views within a Feature Service run in the same Environment. As a result, the `on_demand_environment` specified in the Feature Service must be on the `environments` list of every Realtime Feature View included in the `features` list.
- Configuring an `on_demand_environment` and `required_packages` can have an impact on `get-features` latency. See the section below.
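The compatibility rules above can be expressed as a small validation function. This is a hypothetical sketch: `RtfvConfig` and `check_feature_service` are illustrative names, not Tecton classes, and the sketch covers only the `environments` rules and the one-of-two restriction, not whether each `required_packages` constraint is met.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RtfvConfig:
    """Hypothetical stand-in for the arguments given to @realtime_feature_view."""

    name: str
    environments: list[str] | None = None
    required_packages: list[str] | None = None

    def __post_init__(self) -> None:
        # Only one of environments and required_packages may be set.
        if self.environments is not None and self.required_packages is not None:
            raise ValueError(f"{self.name}: set environments or required_packages, not both")

    def is_compatible_with(self, environment: str) -> bool:
        # No environments list means the view is compatible with every Tecton environment.
        return self.environments is None or environment in self.environments


def check_feature_service(on_demand_environment: str, features: list[RtfvConfig]) -> list[str]:
    """Return the names of views whose environments list excludes on_demand_environment."""
    return [fv.name for fv in features if not fv.is_compatible_with(on_demand_environment)]


# Mirrors the Feature Service example above.
fuzzy = RtfvConfig("fuzzy_similarity_feature_view", environments=["tecton-python-extended:0.1"])
letters = RtfvConfig("letter_count_feature_view", required_packages=["numpy>=1.21.0"])
print(check_feature_service("tecton-python-extended:0.1", [fuzzy, letters]))
print(check_feature_service("tecton-python-core:0.1", [fuzzy, letters]))
```

Switching the service to `tecton-python-core:0.1` would flag `fuzzy_similarity_feature_view`, while the `numpy` view stays compatible because it declares no environments list.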
Configuring Notebook and Testing Environments to Match Package Requirements
The Environment configurations above are managed by Tecton and used only during the online execution of Realtime Feature Views. To develop and test these Feature Views offline, ensure that the relevant dependencies are installed in your local environments.
Installing dependencies in your Notebook environment
- Databricks: Install individual packages in your notebook with `pip install` (the `%pip install` magic command in Databricks notebooks). Alternatively, copy the full set of dependencies for the relevant environment version into a `requirements.txt` file to install all the dependencies at once.
- EMR: To install individual packages, see the documentation for installing PyPI packages in EMR notebooks.
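If you also want a `requirements.txt` that covers at least the packages your own Feature Views declare, a helper like the one below can merge their `required_packages` lists. It is a hypothetical sketch, not a Tecton utility, and it yields only the declared constraints, which are a subset of a full environment's dependencies.

```python
from __future__ import annotations

import re


def build_requirements(*required_packages_lists: list[str]) -> str:
    """Merge required_packages lists into requirements.txt text (hypothetical helper)."""
    merged: dict[str, list[str]] = {}
    for constraints in required_packages_lists:
        for constraint in constraints:
            match = re.match(r"^\s*([A-Za-z0-9][A-Za-z0-9_.-]*)\s*(.*?)\s*$", constraint)
            if match is None:
                raise ValueError(f"cannot parse constraint: {constraint!r}")
            name, spec = match.group(1).lower(), match.group(2)
            specs = merged.setdefault(name, [])
            if spec and spec not in specs:
                specs.append(spec)  # keep each distinct specifier once
    # pip accepts comma-separated specifiers for one package, e.g. numpy>=1.21.0,<2.0
    lines = [name + ",".join(specs) for name, specs in sorted(merged.items())]
    return "\n".join(lines) + "\n"


# Constraints from the Feature Views on this page, plus a hypothetical extra upper bound.
print(build_requirements(["fuzzywuzzy>=0.18.0"], ["numpy>=1.21.0"], ["numpy<2.0"]), end="")
```

Writing the returned text to a file lets you install it with `pip install -r requirements.txt`.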
Installing dependencies in your Unit Testing environment
To run unit tests for your Realtime Feature Views with specific Python dependencies, ensure that the local Python environment executing the tests has the proper dependency versions installed.
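One way to make a missing dependency fail fast in a test run is to look up each package named in `required_packages` with the standard library's `importlib.metadata`. The helper names below are hypothetical; they report whether a package is installed and at which version, but do not evaluate the version constraints themselves.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version


def installed_versions(required_packages: list[str]) -> dict[str, str | None]:
    """Map each package named in required_packages to its installed version, or None."""
    found: dict[str, str | None] = {}
    for constraint in required_packages:
        # Keep only the distribution name, dropping any version specifier.
        name = re.split(r"[<>=~!\s\[]", constraint.strip(), maxsplit=1)[0]
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def assert_packages_installed(required_packages: list[str]) -> None:
    """Raise AssertionError listing any package that is not installed locally."""
    missing = [name for name, v in installed_versions(required_packages).items() if v is None]
    if missing:
        raise AssertionError(f"install these packages before running the tests: {missing}")
```

Calling `assert_packages_installed(["fuzzywuzzy>=0.18.0", "numpy>=1.21.0"])` at the start of a test module surfaces a missing package before any Feature View test runs.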
Impact of Using Environments and Required Packages on Online Feature Retrieval Latency
The total latency observed is highly dependent on the complexity of the Realtime Feature View transformation. For example, if the transformation calls `time.sleep(1)`, then it will take at least 1 second to run.
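To get a rough sense of how long a transformation takes before deploying it, you can time its logic locally with `time.perf_counter`. This sketch assumes the transformation logic can be called as a plain function on a request dictionary; `letter_count` is a hypothetical pure-Python example. Local timings exclude the environment overhead and network time, and your machine will differ from Tecton's execution environment.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def time_transformation(
    transform: Callable[[dict], dict], request: dict, runs: int = 100
) -> dict[str, float]:
    """Call transform(request) repeatedly and report wall-clock time in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        transform(request)
        durations_ms.append((time.perf_counter() - start) * 1000)
    return {"median_ms": statistics.median(durations_ms), "max_ms": max(durations_ms)}


# Hypothetical transformation logic written as a plain function for local timing.
def letter_count(request: dict) -> dict:
    return {"letter_count": sum(ch.isalpha() for ch in request["text"])}


print(time_transformation(letter_count, {"text": "hello world"}))
```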
Configuring the `on_demand_environment` and `required_packages` for a Feature Service adds some overhead to each request, in addition to the time it takes to execute the transformation, when calling that Feature Service with the `get-features` API.
Executing transformations in an environment typically adds 20 to 50 ms on top of the transformation time. This overhead can be higher during a sudden spike in traffic, while the service scales to match the new load.
If the Realtime Feature View includes another Feature View as a source, then it must wait for the upstream Feature View to return before executing, making the latency additive. Otherwise, the Realtime Feature View will be executed in parallel with other Feature Views in the Feature Service.
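The additive-versus-parallel behavior can be turned into a back-of-envelope estimate. This sketch assumes each Realtime Feature View finishes after its upstream wait plus the 20 to 50 ms environment overhead plus its own transformation time, and that independent views run fully in parallel. It ignores traffic spikes and the rest of the request, and all timings in the example (including the view named `view_with_feature_view_source`) are hypothetical.

```python
from __future__ import annotations


def estimate_rtfv_latency_ms(rtfvs: dict[str, tuple[float, float]], overhead_ms: float) -> float:
    """Estimate the slowest Realtime Feature View in a Feature Service.

    rtfvs maps each view name to (upstream_ms, transform_ms), where upstream_ms is the
    time spent waiting for Feature Views used as sources (0 if there are none).
    """
    # Within one view the steps are sequential, so they add up...
    finish_times = [upstream + overhead_ms + transform for upstream, transform in rtfvs.values()]
    # ...while independent views run in parallel, so the slowest one dominates.
    return max(finish_times, default=0.0)


# Hypothetical timings in milliseconds.
views = {
    "fuzzy_similarity_feature_view": (0.0, 2.0),
    "letter_count_feature_view": (0.0, 1.0),
    "view_with_feature_view_source": (8.0, 3.0),  # waits 8 ms for an upstream Feature View
}
low = estimate_rtfv_latency_ms(views, overhead_ms=20.0)
high = estimate_rtfv_latency_ms(views, overhead_ms=50.0)
print(f"estimated Realtime Feature View latency: {low:.0f} to {high:.0f} ms")
```

In this example the view with a Feature View source sets the bound because its upstream wait adds to its own work.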
To inspect the impact of your Realtime Feature Views on the total latency of your `get-features` request, you can compare the `serverTimeSeconds` and `sloServerTimeSeconds` values in the `metadataOptions` response object. The `serverTimeSeconds` value represents the entire time it took for Tecton to fulfill the request, while the `sloServerTimeSeconds` measurement removes time spent on Realtime Feature View execution.
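Because `sloServerTimeSeconds` excludes Realtime Feature View execution, the difference between the two values is the time attributable to it. The helpers below are a hypothetical sketch that assumes you have already extracted both values from the response into a dictionary; the function names and sample values are illustrative only.

```python
from __future__ import annotations


def realtime_execution_seconds(slo_info: dict[str, float]) -> float:
    """Time attributed to Realtime Feature View execution for one request."""
    total = float(slo_info["serverTimeSeconds"])
    without_rtfv = float(slo_info["sloServerTimeSeconds"])
    # sloServerTimeSeconds excludes RTFV execution, so the gap is the RTFV time
    # (clamped at zero in case the two measurements disagree slightly).
    return max(total - without_rtfv, 0.0)


def realtime_execution_fraction(slo_info: dict[str, float]) -> float:
    """Fraction of serverTimeSeconds spent on Realtime Feature View execution."""
    total = float(slo_info["serverTimeSeconds"])
    return realtime_execution_seconds(slo_info) / total if total > 0 else 0.0


# Hypothetical values extracted from a get-features response.
sample = {"serverTimeSeconds": 0.045, "sloServerTimeSeconds": 0.012}
print(
    f"{realtime_execution_seconds(sample) * 1000:.1f} ms "
    f"({realtime_execution_fraction(sample):.0%} of server time)"
)
```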