Managing a Feature Repo
Overview
In this example, we will walk through how to create a new Feature Repository or make amendments to existing Feature Repositories.
This overview assumes you are familiar with the Tecton CLI and have it installed. For a refresher on how the Tecton components (including the Feature Repository) work together, see Interacting with Tecton Tools.
Creating a new Feature Repository
Creating a repository for the first time
To initialize a new Feature Repository, first create a directory where the repository will live. It can be anywhere within your git source tree. Then run tecton init in this directory:
$ tecton init
Feature repository root set to /Users/alice/src/my_feature_repository
Cloning an existing Feature Repository
If you're getting started with Tecton using a previously created Tecton Feature Repository, you do not need to run tecton init. Instead, clone the repository via git or otherwise.
$ git clone https://github.com/<my-org-repo>/tecton-feature-repo.git
$ cd tecton-feature-repo
After cloning the repo, you should be able to run tecton plan and other commands.
Making changes to a Feature Repository
Example: Adding a new Entity and VirtualDataSource
In this example, we'll explain how to create an Entity and VirtualDataSource in a local Feature Repository and then push them to the production version using the tecton CLI.
If this is a new Feature Repository, the first step is to create a new directory and run tecton init.
$ mkdir my_repo
$ cd my_repo
$ tecton init
Feature repository root set to /Users/alice/my_repo
Next, we'll create a new Python module where we will define objects in our Feature Repository. Let's call it my_entity.py. Below, we've defined an Entity.
# my_repo/my_entity.py
from tecton import Entity

my_entity = Entity(
    "my_entity",
    join_keys=["user_id"],
    description="My first entity!",
)
It's important to declare Tecton primitives as global variables in your Python module. When the plan or apply commands are run, the Tecton CLI inspects all Python objects instantiated in the global scope to identify the objects in the Tecton Feature Repository.
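To make the global-scope rule concrete, here is a minimal sketch of how a tool could discover declarations by scanning a module's top-level names. It is not Tecton's implementation: FakeEntity and collect_declarations are hypothetical stand-ins, used only to show why an object created inside a function would go unnoticed.

```python
import types


class FakeEntity:
    """Hypothetical stand-in for a Tecton primitive (not the real class)."""

    def __init__(self, name):
        self.name = name


def collect_declarations(module):
    """Return the names of FakeEntity objects bound at module (global) scope."""
    return sorted(
        obj.name for obj in vars(module).values() if isinstance(obj, FakeEntity)
    )


# Source text that mimics a file in a feature repository.
source = '''
visible = FakeEntity("visible_entity")    # global scope: discoverable

def make_entity():
    hidden = FakeEntity("hidden_entity")  # local scope: never bound on the module
    return hidden
'''

# Build a throwaway module and execute the source inside its namespace.
module = types.ModuleType("my_entity")
module.FakeEntity = FakeEntity
exec(source, module.__dict__)

found = collect_declarations(module)
print(found)
```

Only the object bound at module level shows up in the scan. The one built inside make_entity exists only while that function runs, so a scanner of this kind has nothing to find.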
Similarly, we can define a VirtualDataSource in my_data_source.py:
# my_repo/my_data_source.py
from tecton import HiveDSConfig, VirtualDataSource

# Batch source configuration pointing at a Hive table.
batch_ds = HiveDSConfig(
    table="my_hive_table",
    database="my_hive_database",
    date_partition_column="date",
)

my_vds = VirtualDataSource(name="my_vds", batch_ds_config=batch_ds)
Pushing changes with tecton plan and tecton apply
To push the new Entity and VirtualDataSource to the remote Tecton Feature Repository, we return to the command line and run tecton plan to get a preview of what will happen if we apply our changes.
$ tecton plan
Using workspace "prod"
✅ Imported 2 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  + Create VirtualDataSource
    name: my_vds
    owner: alice
  + Create Entity
    name: my_entity
    owner: alice
    description: My first entity!
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
If the changes look good, we can run tecton apply, which will generate the same output as tecton plan, along with a final prompt to apply the changes.
$ tecton apply
Using workspace "prod"
✅ Imported 2 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  + Create VirtualDataSource
    name: my_vds
    owner: alice
  + Create Entity
    name: my_entity
    owner: alice
    description: My first entity!
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
Understanding the plan
When running plan or apply, there are five possible types of changes that bring your remote Feature Repository in line with your local configuration:
+ Create: a new object is being created for the first time.
- Delete: a previously created object is being deleted.
~ Update: a non-destructive update to an existing object (e.g. changing the description of a FeaturePackage).
~ Recreate: an update that requires an object to be recreated in the remote Feature Repository. This is often observed when Transformations are updated or dependencies change between objects. For example, changing a VirtualDataSource definition may require any FeaturePackages that depend on it to be recreated and re-materialized. Destructive updates can also occur when upstream dependencies are recreated. For example, recreating a FeaturePackage can also cause any FeatureServices that depend on it to be recreated.
~ Upgrade: no-op updates of objects to meet the latest Tecton API version. These are sometimes observed after upgrading the Tecton SDK using pip3 and should be considered safe.
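As a rough mental model of how a plan separates these change types, the sketch below compares a local and a remote set of declarations and labels each difference. It is an illustration only: the real classification is done by Tecton during server-side validation, the function name classify_changes is hypothetical, and the set of fields treated as non-destructive is an assumption. Upgrade is left out because it depends on API versions rather than on your definitions.

```python
# Fields assumed, for illustration only, to be safe to change in place.
NON_DESTRUCTIVE_FIELDS = {"description", "owner"}


def classify_changes(local, remote):
    """Compare two {name: {field: value}} mappings and label each difference."""
    plan = []
    for name in sorted(local.keys() | remote.keys()):
        if name not in remote:
            plan.append(("+ Create", name))
        elif name not in local:
            plan.append(("- Delete", name))
        elif local[name] != remote[name]:
            # Collect every field whose value differs between the two sides.
            changed = {
                field
                for field in local[name].keys() | remote[name].keys()
                if local[name].get(field) != remote[name].get(field)
            }
            if changed <= NON_DESTRUCTIVE_FIELDS:
                plan.append(("~ Update", name))
            else:
                plan.append(("~ Recreate", name))
    return plan


# Hypothetical remote state and local declarations.
remote = {
    "my_entity": {"join_keys": ["user_id"], "description": ""},
    "my_vds": {"table": "my_hive_table"},
    "old_entity": {"join_keys": ["id"]},
}
local = {
    "my_entity": {"join_keys": ["user_id"], "description": "My first entity!"},
    "my_vds": {"table": "another_table"},
    "my_new_entity": {"join_keys": ["account_id"]},
}

plan = classify_changes(local, remote)
for action, name in plan:
    print(action, name)
```

Objects that are identical on both sides produce no entry, which matches the way an unchanged repository yields an empty plan.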
Deleting objects
Suppose you wanted to delete the Entity created in the example above. You could simply delete the file and run tecton apply again.
$ tecton apply
Using workspace "prod"
✅ Imported 2 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  - Delete Entity
    name: my_entity
    owner: alice
    description: My first entity!
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
Dependencies between objects
Tecton objects often require references to other Tecton objects. For example, a FeaturePackage can reference an Entity.
# my_repo/entities.py
from tecton import Entity

my_entity = Entity(
    "my_entity",
    join_keys=["user_id"],
    description="My first entity!",
)
# my_repo/feature_packages.py
from tecton import TemporalFeaturePackage
from .entities import my_entity

# `transformation` is assumed to be defined or imported elsewhere in this module.

# Correct: pass the Entity object itself.
fp = TemporalFeaturePackage(
    entities=[my_entity],
    name="my_fp",
    transformation=transformation,
)

# Wrong: a string is not a reference to the Entity object.
fp = TemporalFeaturePackage(
    entities=["my_entity"],
    name="my_fp",
    transformation=transformation,
)
Updating objects
To update the declaration of a Tecton object, edit the object's Python definition and run tecton apply.
Some changes to objects can be performed in-place, while others may be destructive and modify feature data served by Tecton. This is particularly important for FeaturePackages with materialization enabled since some changes will require re-materializing potentially large ranges of data.
For all types of updates, the workflow is identical: use tecton apply to apply your changes to your Feature Repository. The type of update will be indicated in the change plan.
Here are a few examples of simple create, delete, or update changes:
$ tecton apply
Using workspace "prod"
✅ Imported 3 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  - Delete Entity
    name: my_entity
    owner: alice
  + Create Entity
    name: my_new_entity
    owner: alice
  ~ Update FeaturePackage
    name: my_feature_package
    owner: alice
    description: -> Description of this FeaturePackage!
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
Here is an example of a change that results in a Recreate action: my_feature_package is updated, causing it to be recreated along with all objects that depend on it (i.e. my_feature_service).
$ tecton apply
Using workspace "prod"
✅ Imported 2 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
  ~ Recreate FeaturePackage
    name: my_feature_package
    owner: alice
  ~ Recreate FeatureService
    name: my_feature_service
    owner: alice
    DependencyRecreated(FeaturePackage): -> my_feature_package
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
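The way a Recreate cascades to dependent objects can be pictured as a walk over a dependency graph. The sketch below is a simplified model, not Tecton's algorithm: the function objects_to_recreate and the edge list are hypothetical, with object names borrowed from the examples in this guide.

```python
from collections import deque


def objects_to_recreate(changed, dependents):
    """Return `changed` plus everything that transitively depends on it.

    `dependents` maps an object name to the names of objects that reference it.
    """
    seen = [changed]
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# Hypothetical dependency edges: data source -> feature package -> feature service.
dependents = {
    "my_vds": ["my_feature_package"],
    "my_feature_package": ["my_feature_service"],
}

from_package = objects_to_recreate("my_feature_package", dependents)
from_source = objects_to_recreate("my_vds", dependents)
print(from_package)
print(from_source)
```

The further upstream the destructive change, the more objects end up in the list, which is why it is worth reading the plan before confirming a change to a widely used data source.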
Viewing the apply history for a Workspace
The tecton log command displays a list of previously applied commit hashes for the remote Feature Repository in your Tecton Workspace.
$ tecton log
Using workspace "prod"
commit: 006ad43e0000000000000107
Author: drake
Date: 2020-05-20 23:19:41.829000
commit: 83a205340000000000000105
Author: rihanna
Date: 2020-05-20 18:00:01.858000
commit: 56e8a66a00000000000000fd
Author: jayz
Date: 2020-05-19 15:13:35.083000
commit: 4bfe16ea00000000000000f4
Author: alicakeys
Date: 2020-05-18 18:45:21.232000
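If you need to pick a commit programmatically, for example the last apply before a given time, the text above is simple enough to parse. The following is a minimal sketch that assumes the exact commit / Author / Date layout shown in the sample output. The layout may differ between CLI versions, and parse_log and latest_before are hypothetical helper names, not part of the Tecton CLI.

```python
from datetime import datetime

# A shortened copy of the sample output shown above.
SAMPLE_LOG = """\
Using workspace "prod"
commit: 006ad43e0000000000000107
Author: drake
Date: 2020-05-20 23:19:41.829000
commit: 83a205340000000000000105
Author: rihanna
Date: 2020-05-20 18:00:01.858000
"""


def parse_log(text):
    """Parse log text into a list of dicts, in the order the entries are printed."""
    entries = []
    for line in text.splitlines():
        # Split at the first colon only, so timestamps keep their own colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "commit":
            entries.append({"commit": value})
        elif key == "Author" and entries:
            entries[-1]["author"] = value
        elif key == "Date" and entries:
            entries[-1]["date"] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return entries


def latest_before(entries, cutoff):
    """Return the commit id of the newest entry applied strictly before `cutoff`."""
    candidates = [e for e in entries if "date" in e and e["date"] < cutoff]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["date"])["commit"]


entries = parse_log(SAMPLE_LOG)
target = latest_before(entries, datetime(2020, 5, 20, 20, 0))
print(target)
```

The resulting commit id is the value you would pass to tecton restore.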
Restoring a previous apply
Tecton stores a snapshot of your workspace's local Feature Repository each time tecton apply is run. The tecton restore command makes it possible to overwrite your local Feature Repository with a previously applied version.
To restore the most recently applied version, run tecton restore without a commit version:
$ tecton restore
Using workspace "prod"
This operation may remove or modify the following files:
/Users/drake/Tecton/my-git-repo/feature_repo/entities.py
/Users/drake/Tecton/my-git-repo/feature_repo/data_sources.py
/Users/drake/Tecton/my-git-repo/feature_repo/feature_packages.py
Ok? [y/N]>y
To restore an earlier version of your local Feature Repository, first run tecton log to determine which commit to restore, then run tecton restore <commit>.
$ tecton log
Using workspace "prod"
commit: 006ad43e0000000000000107
Author: drake
Date: 2020-05-20 23:19:41.829000
commit: 83a205340000000000000105
Author: rihanna
Date: 2020-05-20 18:00:01.858000
commit: 56e8a66a00000000000000fd
Author: jayz
Date: 2020-05-19 15:13:35.083000
commit: 4bfe16ea00000000000000f4
Author: alicakeys
Date: 2020-05-18 18:45:21.232000
$ tecton restore 83a205340000000000000105
Using workspace "prod"
This operation may remove or modify the following files:
/Users/drake/Tecton/my-git-repo/feature_repo/entities.py
/Users/drake/Tecton/my-git-repo/feature_repo/data_sources.py
/Users/drake/Tecton/my-git-repo/feature_repo/feature_packages.py
Ok? [y/N]>y
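Because tecton restore may remove or modify local files, you may want a copy of the working directory first, especially if it contains changes that are not committed to git. Below is a minimal sketch of such a backup step. The helper backup_repo and its folder naming scheme are hypothetical and not part of the Tecton CLI.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_repo(repo_root, backup_parent):
    """Copy `repo_root` into a timestamped folder under `backup_parent`."""
    repo_root = Path(repo_root)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_parent) / f"{repo_root.name}-backup-{stamp}"
    shutil.copytree(repo_root, destination)
    return destination


# Demonstrate on a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "feature_repo"
    repo.mkdir()
    (repo / "entities.py").write_text("# placeholder\n")

    saved = backup_repo(repo, tmp)
    copied = sorted(p.name for p in saved.iterdir())
    print(copied)
```

In practice you would call backup_repo on your feature repository root and keep the backup outside the repository, so that the restore cannot touch it.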