Version: 0.4

Creating Feature 1

In this topic, you will create and test the first feature, user_credit_card_issuer. This feature determines the user's credit card issuer, based on the user's credit card number.

In your local feature repository, open the file features/batch_features/user_credit_card_issuer.py. In the file, uncomment the following code, which is a definition of the user_credit_card_issuer Feature View.

info

A Feature View defines one or more features, whose values are generated when the Feature View's transformation runs.

The @batch_feature_view decorator (included in the following code) indicates that a Batch Feature View is being defined.

from tecton import batch_feature_view, FilteredSource
from entities import user
from data_sources.customers import customers
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(customers)],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2016, 1, 1),
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
    timestamp_field="signup_timestamp",
    description="User credit card issuer derived from the user credit card number.",
)
def user_credit_card_issuer(customers):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as user_credit_card_issuer
        FROM
            {customers}
        """

The Feature View's transformation

A transformation is logic that runs against data retrieved from one or more external data sources. The user_credit_card_issuer Feature View's transformation is defined in the user_credit_card_issuer function that follows the @batch_feature_view decorator.
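To make the CASE expression in the transformation concrete, here is a minimal sketch of the same mapping in plain Python. The helper name issuer_from_cc_num and the sample card numbers are assumptions made for illustration; they are not part of the tutorial repository or the Tecton SDK.

```python
def issuer_from_cc_num(cc_num: int) -> str:
    """Hypothetical helper mirroring the CASE expression in the Feature View."""
    # Cast the number to a string and take its first character, as the SQL does.
    first_digit = str(cc_num)[:1]
    issuers = {"4": "Visa", "5": "MasterCard", "6": "Discover"}
    # Any other leading digit falls through to 'other', like the ELSE branch.
    return issuers.get(first_digit, "other")


# Made-up card numbers, used only to exercise each branch.
print(issuer_from_cc_num(4111111111111111))  # Visa
print(issuer_from_cc_num(371449635398431))   # other
```

The Feature View itself runs this logic in Spark SQL; the Python version is only a way to reason about which issuer each leading digit produces.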

info

The name of a Feature View is the name of its transformation function. You refer to a Feature View by name when using the Tecton interactive Python classes to read feature data.

The SELECT statement

The SELECT statement runs against every record in the table or file in the external data source.

Columns in the SELECT statement

  • A column for each entity in the Feature View. This Feature View has one entity, user, represented here by the user_id column. Entities are used as join keys when multiple features are joined together. You will see an example of this in part 2 of the tutorial.
  • The timestamp column. This is needed because the Feature View retrieves historical values from the external data source in order to generate feature values.
  • A column for each feature in the Feature View. This Feature View has one feature, user_credit_card_issuer.

The columns can be in any order.
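The role of the entity column as a join key can be sketched in plain Python. This is an illustration only: the join_features helper and the txn_count values are hypothetical, and Tecton performs the actual join for you when features are combined.

```python
# Two hypothetical feature tables, each keyed by the entity's join key (user_id).
issuer_by_user = {
    "user_460877961787": "Visa",
    "user_504831693": "Visa",
}
# Made-up second feature, for illustration only.
txn_count_by_user = {
    "user_460877961787": 12,
    "user_504831693": 3,
}


def join_features(*tables: dict) -> dict:
    """Join feature tables on their shared key, keeping keys present in all tables."""
    common_keys = set(tables[0]).intersection(*tables[1:])
    return {key: tuple(table[key] for table in tables) for key in sorted(common_keys)}


joined = join_features(issuer_by_user, txn_count_by_user)
for user_id, features in joined.items():
    print(user_id, features)
```

Because both tables share the user_id key, each user ends up with one row that carries both feature values.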

The FROM clause

The FROM clause contains {customers}, which is the data source customers specified in the sources parameter. This parameter contains the names of one or more data sources that the Feature View uses. In this case, there is only one data source. The customers definition (defined earlier in data_sources/customers.py) references the external source (a file in an S3 bucket) that the transformation query in the Feature View runs against.
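The {customers} placeholder works because the query is a Python f-string. The sketch below shows only ordinary f-string interpolation; the build_query function and the table name customers_table are hypothetical, and what Tecton actually passes in for the customers argument is not shown in this tutorial.

```python
def build_query(customers: str) -> str:
    """Hypothetical stand-in for a transformation function in spark_sql mode."""
    # The f-string substitutes the argument's value where {customers} appears.
    return f"""
        SELECT user_id, signup_timestamp
        FROM {customers}
    """


# 'customers_table' is a made-up name used only for this illustration.
query = build_query("customers_table")
print(query)
```

The parameter name in the function signature and the name inside the braces must match, which is why the transformation function takes an argument called customers.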

Further information on transformations

For more information on transformations, see Transformations.

Applying the Feature View

In your terminal, run tecton apply to apply the code that you uncommented in features/batch_features/user_credit_card_issuer.py (above) to the workspace that you created and selected in the setup.

Testing the Feature View

You can test a Feature View in two ways:

  • Test interactively by calling <feature view>.run() with a timestamp range. An example is shown in the next section.
  • Write a unit test, which is a repeatable test that calls <feature view>.run(). For more information, see Unit Testing.

info

You can also test a feature view by calling <feature view>.get_historical_features(), which is more flexible than <feature view>.run(). For more information, see the batch feature view get_historical_features() reference.

Running an interactive test

Get the feature view from the workspace ws that you defined in the setup.

fv = ws.get_feature_view("user_credit_card_issuer")

Call the run() method of the feature view to get feature data for the timestamp range of 2022-01-01 to 2022-04-01, and display the generated feature values.

(Here, the timestamp range is arbitrary. When testing your own Feature Views, set the range to cover the times you want to test.)

offline_features = fv.run(datetime(2022, 1, 1), datetime(2022, 4, 1)).to_spark().limit(10)
offline_features.show()

Sample output:

user_id              signup_timestamp       user_credit_card_issuer
user_460877961787    2022-03-09 03:33:09    Visa
user_504831693       2022-03-12 20:11:22    Visa
user_609904782486    2022-03-23 13:57:48    Visa
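To see how a timestamp range narrows the rows that come back, the sketch below filters the sample rows above in plain Python. It assumes a half-open interval (start inclusive, end exclusive); that boundary behaviour is an assumption for illustration and is not confirmed here as how run() treats its arguments. The in_range helper is hypothetical.

```python
from datetime import datetime

# The three rows from the sample output above.
rows = [
    ("user_460877961787", datetime(2022, 3, 9, 3, 33, 9), "Visa"),
    ("user_504831693", datetime(2022, 3, 12, 20, 11, 22), "Visa"),
    ("user_609904782486", datetime(2022, 3, 23, 13, 57, 48), "Visa"),
]


def in_range(rows, start, end):
    """Keep rows whose signup_timestamp falls in [start, end) (assumed semantics)."""
    return [row for row in rows if start <= row[1] < end]


# The range used in the interactive test keeps all three sample rows.
full = in_range(rows, datetime(2022, 1, 1), datetime(2022, 4, 1))
# A narrower range keeps only the row from 2022-03-12.
narrow = in_range(rows, datetime(2022, 3, 10), datetime(2022, 3, 20))
print(len(full), len(narrow))  # 3 1
```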
