Version: Beta 🚧

Pin Databricks or EMR Runtimes

Overview

By default, a Tecton materialization cluster, which computes your feature values for online serving and training dataframes, runs a specific EMR release or Databricks Runtime release. Tecton periodically upgrades the default EMR release/Databricks Runtime release on materialization clusters to apply the latest security patches and stability fixes. These upgrades may include Spark upgrades.

The versions of related libraries and software (such as Spark/PySpark) are determined by the EMR or Databricks release that your version of Tecton uses. For example, Tecton 1.0 uses EMR 6.9.1 by default. The AWS documentation lists the application versions included in EMR 6.9.1, so for Tecton 1.0 on the default release, the supported version of PySpark is 3.3.0.

Similarly, for Databricks version 14.3.x-scala2.12, you can find the list of supported libraries in the corresponding Databricks Release Notes.

Supported Databricks Runtimes:

  • 9.1.x-scala2.12
  • 10.4.x-scala2.12
    • Default for Tecton 0.6, 0.7, and 0.8
  • 11.3.x-scala2.12
    • Default for Tecton 0.9
  • 12.2.x-scala2.12
    • Available in Tecton 0.8.5+
  • 13.3.x-scala2.12
    • Available in Tecton 0.8.2+
  • 14.3.x-scala2.12
    • Available in Tecton 0.9.4+
  • 15.4.x-scala2.12
    • Available in Tecton 0.9.22+ and 1.0.16+

Supported EMR Versions:

  • emr-6.5.0
    • Not supported in Tecton 1.0+
  • emr-6.7.0
    • Default for Tecton 0.6, 0.7, and 0.8
  • emr-6.9.0
  • emr-6.9.1
    • Default for Tecton 0.9, 1.0
  • emr-6.12.0
    • Available in Tecton 0.8.5+
  • emr-7.0.0
    • Available in Tecton 0.8.6+
    • Kinesis-based Stream Feature Views do not currently support EMR 7.0 -- support will be added soon.
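The availability notes in the two lists above can be checked programmatically before pinning a release. The following is a minimal sketch, not part of the Tecton SDK: the names `MIN_TECTON_VERSIONS`, `parse_version`, and `is_available` are hypothetical, and the code assumes that "X+" means "that version or any later one", and that when two minimums are listed (as for 15.4.x-scala2.12) each applies to its own release line. It also assumes plain numeric version strings such as "0.9.22".

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Minimum Tecton versions copied from the lists above.
# Runtimes with no stated minimum are intentionally left out.
MIN_TECTON_VERSIONS: Dict[str, List[str]] = {
    "12.2.x-scala2.12": ["0.8.5"],
    "13.3.x-scala2.12": ["0.8.2"],
    "14.3.x-scala2.12": ["0.9.4"],
    "15.4.x-scala2.12": ["0.9.22", "1.0.16"],
    "emr-6.12.0": ["0.8.5"],
    "emr-7.0.0": ["0.8.6"],
}


def parse_version(text: str) -> Version:
    """Turn '0.9.22' into (0, 9, 22) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_available(runtime: str, tecton_version: str) -> bool:
    """Return True if the runtime is listed as available for this Tecton version.

    Assumption: a minimum on the same major.minor release line takes
    precedence; otherwise the version must be at least the highest minimum.
    Raises KeyError for runtimes that have no stated minimum.
    """
    minimums = [parse_version(m) for m in MIN_TECTON_VERSIONS[runtime]]
    version = parse_version(tecton_version)
    for minimum in minimums:
        if minimum[:2] == version[:2]:
            return version >= minimum
    return version >= max(minimums)


if __name__ == "__main__":
    print(is_available("15.4.x-scala2.12", "1.0.5"))   # False
    print(is_available("15.4.x-scala2.12", "0.9.25"))  # True
    print(is_available("emr-7.0.0", "0.8.6"))          # True
```

Comparing tuples of integers rather than raw strings avoids the trap where "0.9.4" sorts after "0.9.22" lexicographically.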

In rare cases, existing transformation logic defined in Tecton will be incompatible with a Spark upgrade.

To prevent a Spark upgrade caused by an EMR or Databricks Runtime upgrade, or to downgrade Spark after an incompatibility has occurred, you can configure Tecton to override the default EMR release/Databricks Runtime release for each Feature View and Feature Table.

Overriding Tecton’s default EMR release/Databricks Runtime release

In Feature View and Feature Table definitions, you can specify which EMR release/Databricks Runtime release is used by setting the parameters in the table below to a DatabricksClusterConfig or an EMRClusterConfig object.

Object                  Parameter to Set
@batch_feature_view     batch_config
@stream_feature_view    stream_config
FeatureTable            batch_config

If you use a DatabricksClusterConfig object, set the dbr_version parameter. Note: the value must be a valid Databricks Runtime name, such as 14.3.x-scala2.12.

If you use an EMRClusterConfig object, set the emr_version parameter, for example emr-6.12.0.
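A typo in a release name is easy to make, so a quick shape check before applying a change can help. The sketch below is illustrative only: the helper names and regular expressions are hypothetical, derived solely from the release names listed on this page, and they do not confirm that a release actually exists or is supported by Tecton.

```python
import re

# Shapes inferred from the names on this page, e.g. "14.3.x-scala2.12"
# and "emr-6.12.0". These are assumptions, not an official grammar.
DBR_PATTERN = re.compile(r"\d+\.\d+\.x-scala\d+\.\d+")
EMR_PATTERN = re.compile(r"emr-\d+\.\d+\.\d+")


def looks_like_dbr_version(name: str) -> bool:
    """Return True if name has the shape of a Databricks Runtime name."""
    return DBR_PATTERN.fullmatch(name) is not None


def looks_like_emr_version(name: str) -> bool:
    """Return True if name has the shape of an EMR release label."""
    return EMR_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    print(looks_like_dbr_version("14.3.x-scala2.12"))  # True
    print(looks_like_dbr_version("14.3"))              # False
    print(looks_like_emr_version("emr-6.12.0"))        # True
    print(looks_like_emr_version("6.12.0"))            # False
```

Using `fullmatch` rather than `match` rejects names with trailing characters, such as "emr-6.12.0-beta".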

Looking at the Spark Release Notes

We recommend reviewing the Spark release notes to see whether your Tecton transformations use any deprecated features, and to check whether any custom JARs you use need to be updated for compatibility. This page contains links to the release notes for each Spark version.

The links below show the Spark version that is included in each version of Databricks Runtime and EMR, respectively:

Overriding EMR default Python version

For EMR clusters 6.X, the default Python version is 3.7. As Tecton plans to sunset support for Python 3.7 by SDK version 1.3, we recommend overriding the default Python version to 3.9. You can do this by setting the python_version parameter in the EMRClusterConfig object.

Currently, Tecton supports the following Python versions:

  • Python 3.9.13 (python_3_9_13) - Recommended
  • Cluster Default (default) - Uses the default Python version on the EMR cluster, see here for specific version information.

In the example below, we override the default Python version on EMR 6.12 to Python 3.9.13.

Example:

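from tecton import batch_feature_view, EMRClusterConfig

# "..." stands for the Feature View's other arguments.
@batch_feature_view(..., batch_compute=EMRClusterConfig(emr_version="emr-6.12.0", python_version="python_3_9_13"))
def my_bfv():
    pass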
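When many Feature Views share the same pin, it can help to build the EMRClusterConfig arguments in one place and reject unsupported Python version identifiers early. The following is a rough sketch under stated assumptions: `emr_cluster_kwargs` and `SUPPORTED_PYTHON_VERSIONS` are hypothetical names, not part of the Tecton SDK, and the only identifiers accepted are the two listed above.

```python
from typing import Dict

# The two identifiers listed on this page.
SUPPORTED_PYTHON_VERSIONS = ("python_3_9_13", "default")


def emr_cluster_kwargs(
    emr_version: str, python_version: str = "python_3_9_13"
) -> Dict[str, str]:
    """Build keyword arguments for an EMRClusterConfig.

    Hypothetical helper: it only validates python_version against the
    identifiers listed in this document and returns a plain dict.
    """
    if python_version not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError(
            f"Unsupported python_version {python_version!r}; "
            f"expected one of {SUPPORTED_PYTHON_VERSIONS}"
        )
    return {"emr_version": emr_version, "python_version": python_version}


if __name__ == "__main__":
    print(emr_cluster_kwargs("emr-6.12.0"))
```

The resulting dict could then be unpacked as `EMRClusterConfig(**emr_cluster_kwargs("emr-6.12.0"))` and passed to the Feature View in the same way as in the example above.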

The default Python version can also be overridden at a repository level using repo.yaml. In the example below, all Batch Feature Views in the associated repo will use Python 3.9.13 on EMR 6.12.

repo.yaml:

defaults:
  batch_feature_view:
    tecton_materialization_runtime: 1.1.0
    batch_compute:
      kind: EMRClusterConfig
      instance_type: m5.xlarge
      number_of_workers: 2
      python_version: python_3_9_13
      emr_version: emr-6.12.0
