Creating a Feature Repository

Overview

The first step in using Tecton is to create a new Feature Repository.

In this guide, we'll walk through how to create a new Feature Repository or start from an existing one. We'll also show you how to edit objects in the repository.

This overview assumes you have the Tecton CLI installed and are authenticated to your account.

Creating a new Feature Repository

Creating a repository for the first time

To initialize a new Feature Repository, you'll first create a directory where the repository will live, which can be anywhere within your git source tree. Then, run tecton init in this directory:

$ tecton init
Feature repository root set to /Users/alice/src/my_feature_repository

Cloning an existing Feature Repository

If you're getting started with Tecton using a previously created Tecton Feature Repository, you will not need to run tecton init. Instead, simply clone the repository via git or otherwise.

$ git clone https://github.com/<my-org-repo>/tecton-feature-repo.git
$ cd tecton-feature-repo

After cloning the repo, you should be able to run tecton plan and other commands.

Making your first change to a Feature Repository

In this example, we'll explain how to create an Entity in a local Feature Repository and then push it to the production version using the Tecton CLI.

If you haven't already, go back to the above step to create a new repository with tecton init.

Next, we'll create a new Python module where we will define objects in our Feature Repository. Let's call it my_entity.py. Below, we've defined an Entity.

# my_repo/my_entity.py
from tecton import Entity

user = Entity(
    name="user",
    join_keys=["user_id"],
    description="My first entity!"
)

It's important to declare Tecton Objects as global variables in your Python module. When plan or apply commands are run, the Tecton CLI references all Python objects instantiated in the global scope to identify objects in the Tecton Feature Repository.
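
To see why global scope matters, here is a minimal sketch of how a tool could discover module-level objects. It is not the Tecton CLI's actual implementation: the Entity class below is a hypothetical stand-in for tecton.Entity, and collect_entities is an illustrative helper that does not exist in the SDK.

```python
import types
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for tecton.Entity (an assumption, not the real class)."""
    name: str
    join_keys: list
    description: str = ""


def collect_entities(module):
    """Return the names of Entity instances bound at the module's global scope."""
    return sorted(
        obj.name for obj in vars(module).values() if isinstance(obj, Entity)
    )


# Simulate a feature repository file as a throwaway module.
demo = types.ModuleType("my_entity")
demo.Entity = Entity
source = '''
user = Entity(name="user", join_keys=["user_id"], description="My first entity!")

def make_hidden():
    # Only exists while the function runs; never bound at global scope.
    return Entity(name="hidden", join_keys=["hidden_id"])
'''
exec(source, demo.__dict__)

found = collect_entities(demo)
print(found)  # prints ['user']
```

Only user is found. The Entity built inside make_hidden is never bound to a global name, so a scan of the module's globals cannot see it.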

Pushing changes with tecton plan and tecton apply

To push the new Entity to the remote Tecton Feature Repository, we return to the command line and run tecton plan to get a preview of what will happen if we apply our changes.

$ tecton plan
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Entity
    name:            user
    description:     My first entity!

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑

If the changes look good, we can run tecton apply, which will generate the same output as tecton plan, along with a final prompt to apply the changes.

$ tecton apply
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Entity
    name:            user
    description:     My first entity!

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!

You've officially pushed your first object to the Tecton repository! If you want to keep going, move on to Creating your first data source.

Otherwise, continue reading here to learn more about how to edit your repository.

Understanding the plan

When running plan or apply, each change in the plan is one of five types. Together, these changes bring your remote Feature Repository in line with your local configuration.

  • + Create: a new object is being created for the first time
  • - Delete: a previously created object is being deleted
  • ~ Update: a non-destructive update to an existing object (e.g. changing the description of a FeatureView)
  • ~ Recreate: an update that requires an object to be recreated in the remote Feature Repository. This is often observed when Transformations are updated or dependencies change between objects. For example, changing a Data Source definition may require any FeatureViews that depend on it to be recreated and re-materialized. Destructive updates can also occur when upstream dependencies are recreated. For example, recreating a FeatureView can also cause any FeatureServices that depend on it to be recreated.
  • ~ Upgrade: No-op updates of objects to meet the latest Tecton API version. These are sometimes observed after upgrading the Tecton SDK using pip3 and should be considered safe.
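
The way a Recreate can cascade to dependent objects can be pictured as a walk over a dependency graph. The sketch below is only an illustration of that idea, not Tecton's implementation. The dependents mapping and the downstream_objects helper are hypothetical, and the object names are borrowed from examples elsewhere on this page.

```python
from collections import deque

# Hypothetical dependency graph: each key lists the objects that depend on it.
dependents = {
    "transactions_batch": ["my_feature_view"],   # Data Source -> FeatureView
    "my_feature_view": ["my_feature_service"],   # FeatureView -> FeatureService
    "my_feature_service": [],
}


def downstream_objects(changed, dependents):
    """Return every object downstream of `changed`, in breadth-first order."""
    found = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child != changed and child not in found:
                found.append(child)
                queue.append(child)
    return found


from_source = downstream_objects("transactions_batch", dependents)
from_view = downstream_objects("my_feature_view", dependents)
print(from_source)  # prints ['my_feature_view', 'my_feature_service']
print(from_view)    # prints ['my_feature_service']
```

In this toy graph, a change to the data source reaches both the FeatureView and the FeatureService, which is why a small edit upstream can show up as several Recreate entries in a plan.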

Deleting objects

Suppose you wanted to delete the Entity created in the example above. You could simply delete the file and run tecton apply again.

$ tecton apply
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python module from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  - Delete Entity
    name:            user
    description:     My first entity!

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!

Updating objects

To update the declaration of a Tecton object, edit the object's Python definition, and run tecton apply.

Some changes to objects can be performed in-place, while others may be destructive and modify feature data served by Tecton. This is particularly important for FeatureViews with materialization enabled since some changes will require re-materializing potentially large ranges of data.

For all types of updates, the workflow is identical: Simply use tecton apply to apply your changes to your Feature Repository. The type of update will be indicated in the change plan.

Here are a few examples of simple create, delete, or update changes:

$ tecton apply
Using workspace "prod"
✅ Imported 3 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  - Delete Entity
    name:            my_entity
    owner:           alice

  + Create Entity
    name:            my_new_entity
    owner:           alice

  ~ Update FeatureView
    name:            my_feature_view
    owner:           alice
    description:  -> Description of this FeatureView!
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!

Here is an example of a change that results in a Recreate action: my_feature_view is updated, causing it to be recreated along with all objects that depend on it (i.e. my_feature_service).

$ tecton apply
Using workspace "prod"
✅ Imported 2 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  ~ Recreate FeatureView
    name:            my_feature_view
    owner:           alice

  ~ Recreate FeatureService
    name:            my_feature_service
    owner:           alice
    DependencyRecreated(FeatureView):  -> my_feature_view
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
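
Because Delete and Recreate entries can affect served feature data, it can help to scan a plan for them before answering the prompt. The following is a rough sketch that parses plan text in the layout shown in the examples on this page. That layout is an assumption here, not a documented or stable interface, and parse_plan, ACTION_LINE, and DESTRUCTIVE are illustrative names rather than part of the Tecton SDK. The sample text combines entries from the examples above.

```python
import re

# Matches lines such as "  ~ Recreate FeatureView" as seen in the sample output.
ACTION_LINE = re.compile(
    r"^\s*[+\-~]\s+(Create|Delete|Update|Recreate|Upgrade)\s+(\w+)\s*$"
)
# Assumption: these are the action types worth a second look before applying.
DESTRUCTIVE = {"Delete", "Recreate"}


def parse_plan(text):
    """Extract (action, object_type, name) tuples from plan text."""
    changes = []
    action = None
    for line in text.splitlines():
        match = ACTION_LINE.match(line)
        if match:
            action = match.groups()
            continue
        stripped = line.strip()
        if action and stripped.startswith("name:"):
            changes.append((*action, stripped.split(":", 1)[1].strip()))
            action = None
    return changes


sample_plan = """
  + Create Entity
    name:            my_new_entity
    owner:           alice

  ~ Recreate FeatureView
    name:            my_feature_view
    owner:           alice

  ~ Recreate FeatureService
    name:            my_feature_service
    owner:           alice
    DependencyRecreated(FeatureView):  -> my_feature_view
"""

changes = parse_plan(sample_plan)
risky = [change for change in changes if change[0] in DESTRUCTIVE]
for action, object_type, name in risky:
    print(f"review before applying: {action} {object_type} {name}")
```

Here the Create entry passes silently, while the two Recreate entries are flagged for review.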

Viewing the apply history for a Workspace

The tecton log command will display a list of previously applied commit hashes for the remote Feature Repository in your Tecton Workspace.

$ tecton log
Using workspace "prod"
commit: 006ad43e0000000000000107
Author: drake
Date:   2020-05-20 23:19:41.829000

commit: 83a205340000000000000105
Author: rihanna
Date:   2020-05-20 18:00:01.858000

commit: 56e8a66a00000000000000fd
Author: jayz
Date:   2020-05-19 15:13:35.083000

commit: 4bfe16ea00000000000000f4
Author: alicakeys
Date:   2020-05-18 18:45:21.232000
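
If you need to work with this history programmatically, the output can be parsed into records. The sketch below assumes the exact commit/Author/Date layout shown above, which may change between CLI versions. parse_log is an illustrative helper, not part of the Tecton SDK.

```python
from datetime import datetime


def parse_log(text):
    """Parse `tecton log`-style output into a list of dicts, one per commit."""
    entries = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skips blank lines and the 'Using workspace' banner
        key, value = key.strip().lower(), value.strip()
        if key == "commit":
            current = {"commit": value}
            entries.append(current)
        elif key == "author" and current:
            current["author"] = value
        elif key == "date" and current:
            current["date"] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return entries


sample_log = """Using workspace "prod"
commit: 006ad43e0000000000000107
Author: drake
Date:   2020-05-20 23:19:41.829000

commit: 83a205340000000000000105
Author: rihanna
Date:   2020-05-20 18:00:01.858000
"""

entries = parse_log(sample_log)
latest = max(entries, key=lambda entry: entry["date"])
print(latest["commit"], latest["author"])  # prints 006ad43e0000000000000107 drake
```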

Restoring a previous apply

Tecton stores a snapshot of your workspace's local Feature Repository each time tecton apply is run. The tecton restore command makes it possible to overwrite your local Feature Repository with a previously applied version.

To restore the most recently applied version, run tecton restore without a commit version:

$ tecton restore
Using workspace "prod"
This operation may remove or modify the following files:
/Users/drake/Tecton/my-git-repo/feature_repo/entities.py
/Users/drake/Tecton/my-git-repo/feature_repo/data_sources.py
/Users/drake/Tecton/my-git-repo/feature_repo/feature_views.py
Ok? [y/N]>y

To restore a previous version of your local Feature Repository, first run tecton log to determine which commit to restore, then run tecton restore <commit>.

$ tecton log
Using workspace "prod"
commit: 006ad43e0000000000000107
Author: jayz
Date:   2020-05-20 23:19:41.829000

commit: 83a205340000000000000105
Author: rihanna
Date:   2020-05-20 18:00:01.858000

commit: 4bfe16ea00000000000000f4
Author: alicakeys
Date:   2020-05-18 18:45:21.232000

$ tecton restore 83a205340000000000000105
Using workspace "prod"
This operation may remove or modify the following files:
/Users/drake/Tecton/my-git-repo/feature_repo/entities.py
/Users/drake/Tecton/my-git-repo/feature_repo/data_sources.py
/Users/drake/Tecton/my-git-repo/feature_repo/feature_views.py
Ok? [y/N]>y

(Beta Feature) Skipping files using .tectonignore

Tecton supports a .tectonignore file that can specify files or path expressions to ignore when plan or apply are run. It's similar to Git's .gitignore configuration. .tectonignore should be declared in the feature repo root directory.

For example, in the following repo all objects declared in transactions_batch.py, entities.py, fraud_detection.py, and fraudulent_transactions_count.py would be processed.

├── data_sources
│   └── transactions_batch.py
├── entities.py
├── feature_services
│   └── fraud_detection.py
└── features
    └── fraudulent_transactions_count.py

Suppose everything under feature_services/ needs to be ignored temporarily. Adding a .tectonignore file to the repo root with one of the following glob expressions will cause fraud_detection.py to be ignored altogether.

# Ignore everything under feature_services/
feature_services/*.py

# Alternatively, include nested directories under feature_services/
feature_services/**/*.py

# Alternatively, ignore a specific file
feature_services/fraud_detection.py

As far as tecton plan is concerned, the repo now contains the following files:

├── .tectonignore
├── data_sources
│   └── transactions_batch.py
├── entities.py
└── features
    └── fraudulent_transactions_count.py
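
To get a feel for which files a pattern would exclude, the matching can be approximated in a few lines. This sketch compares paths to patterns one path segment at a time using Python's fnmatch module. Tecton's actual matching rules may differ, and the ** form is not handled here. is_ignored is an illustrative helper, not part of the Tecton SDK.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Return True if `path` matches any pattern, comparing segment by segment."""
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.split("/")
        if len(pattern_parts) == len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


repo_files = [
    "data_sources/transactions_batch.py",
    "entities.py",
    "feature_services/fraud_detection.py",
    "features/fraudulent_transactions_count.py",
]
patterns = ["feature_services/*.py"]

kept = [path for path in repo_files if not is_ignored(path, patterns)]
print(kept)
```

With the single pattern above, only feature_services/fraud_detection.py is dropped, matching the file tree shown for the ignored case.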