Version: 0.6

Migrating Offline Store from DBFS to S3

This guide applies to Databricks deployments using the Tecton-managed serving plane. It contains instructions for migrating an existing offline store mounted on DBFS to a self-managed S3 bucket.

Why should you migrate your offline store?

Databricks does not recommend storing production data in DBFS root, so Tecton requires using a self-managed S3 bucket when moving to production.

Create Your S3 Bucket

Tecton needs access to the S3 bucket to ensure data pipeline integrity, clean up unused feature data, and store some logs from Tecton services in the control plane. You will grant a role in the Tecton control plane account access to your bucket; that role is used for these purposes only by automated processes in the Tecton account.

```hcl
locals {
  # Tecton Support will provide the appropriate values for the account ID and deployment name
  TECTON_CONTROL_PLANE_ACCOUNT_ID = "1234..."
  DEPLOYMENT_NAME                 = "abc-prod"
}

resource "aws_s3_bucket" "tecton" {
  bucket = "tecton-${local.DEPLOYMENT_NAME}"
  acl    = "private"

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }

  lifecycle {
    ignore_changes = [lifecycle_rule]
  }
}

resource "aws_s3_bucket_policy" "read-write-access" {
  bucket = aws_s3_bucket.tecton.bucket
  policy = data.aws_iam_policy_document.bucket_policy.json
}

data "aws_iam_policy_document" "bucket_policy" {
  version = "2012-10-17"

  statement {
    sid    = "AllowReadWrite"
    effect = "Allow"

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${local.TECTON_CONTROL_PLANE_ACCOUNT_ID}:role/${local.DEPLOYMENT_NAME}-worker-node"
      ]
    }

    actions = [
      "s3:GetObject",
      "s3:ListBucket",
      "s3:PutObject",
      "s3:DeleteObject",
    ]

    resources = [
      aws_s3_bucket.tecton.arn,
      "${aws_s3_bucket.tecton.arn}/*",
    ]
  }
}

resource "aws_s3_bucket_ownership_controls" "tecton-bucket-owner" {
  bucket = aws_s3_bucket.tecton.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
```
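If it helps to see what Terraform will render, the `aws_iam_policy_document` above corresponds to a JSON bucket policy like the following sketch. The account ID and deployment name here are placeholders; Tecton Support provides the real values.

```python
import json

# Placeholder values -- Tecton Support provides the real ones.
ACCOUNT_ID = "123456789012"
DEPLOYMENT_NAME = "abc-prod"
BUCKET = f"tecton-{DEPLOYMENT_NAME}"

# JSON equivalent of the aws_iam_policy_document resource above.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadWrite",
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::{ACCOUNT_ID}:role/{DEPLOYMENT_NAME}-worker-node"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject",
            ],
            # The bucket ARN is needed for ListBucket; the /* suffix
            # covers the object-level actions.
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```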

Migrate to your new offline store

Option 1: Migrate to your new offline store without migrating your feature data

You should choose this option if you do not have much feature data to migrate or if your production data sources are different from your test data sources.

  1. Open a Tecton Support ticket and provide your S3 bucket name. Tecton will reconfigure new writes to go to the S3 bucket.
  2. Tecton Support will optionally help you re-run any existing feature pipelines to write to the new location.

Option 2: Migrate to your new offline store and migrate your feature data

You should choose this option if your test feature data is the same as your production feature data, and you've already run many expensive compute jobs.

  1. Pause offline materialization on all of your feature views in two steps:

    1. Set the `offline=False` parameter on each feature view.
    2. Run `tecton apply`.
  2. Open a Tecton Support ticket and provide your S3 bucket name. Tecton will reconfigure new writes to go to the S3 bucket.

  3. Copy your offline store data from DBFS to the new S3 path. For example, in your Databricks notebook run:

    ```shell
    aws s3 cp --recursive /dbfs/{deployment_name}/offline-store s3://{s3_bucket_name}/offline-store
    ```
  4. Re-enable offline materialization for all of your feature views.
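Step 1 above can be sketched as a change to a feature view definition. This is a minimal, hypothetical example: the source, entity, and query are stand-ins for your own repository's definitions, and only the `offline` flag is the point.

```python
from datetime import timedelta
from tecton import batch_feature_view

# Hypothetical feature view; `transactions` and `user` stand in for your
# existing data source and entity definitions.
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=False,  # was offline=True -- pauses offline materialization
    batch_schedule=timedelta(days=1),
)
def user_transaction_counts(transactions):
    return f"SELECT user_id, COUNT(*) AS txn_count FROM {transactions} GROUP BY user_id"
```

After editing each feature view this way, `tecton apply` pushes the change; flipping the flag back to `offline=True` and re-applying re-enables materialization in step 4.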