FilterContext
Summary
FilterContext is passed as an argument to the Data Source Function when supports_time_filtering is set to True. Its start_time and end_time parameters enable optimized query patterns for improved performance:
- The method <data source>.get_dataframe() can be invoked with the arguments start_time or end_time.
- When defining a Feature View, a FilteredSource can be paired with a Data Source. The Feature View will then pass FilterContext into the Data Source Function.

Note that Data Source Functions are expected to implement their own filtering logic.
Example
from tecton import spark_batch_config
from pyspark.sql.functions import col


@spark_batch_config(supports_time_filtering=True)
def hive_data_source_function(spark, filter_context):
    spark.sql(f"USE {hive_db_name}")
    df = spark.table(user_hive_table)
    ts_column = "timestamp"
    # Data Source Function handles its own filtering logic here
    if filter_context:
        if filter_context.start_time:
            df = df.where(col(ts_column) >= filter_context.start_time)
        if filter_context.end_time:
            df = df.where(col(ts_column) < filter_context.end_time)
    return df
Methods
__init__(...)
Parameters
- start_time (Optional[datetime.datetime]) - If specified, the data source will only include items with timestamp column >= start_time
- end_time (Optional[datetime.datetime]) - If specified, the data source will only include items with timestamp column < end_time
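
The two parameters above describe a half-open window: start_time is inclusive and end_time is exclusive. The following is a minimal sketch of that behavior in plain Python, with no Tecton or Spark dependency. FilterContextSketch and filter_rows are hypothetical names invented for this illustration; they only mirror the two documented fields and are not part of the Tecton SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class FilterContextSketch:
    """Hypothetical stand-in with the two documented fields (not Tecton's class)."""
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def filter_rows(
    rows: List[Dict[str, Any]],
    filter_context: Optional[FilterContextSketch],
    ts_column: str = "timestamp",
) -> List[Dict[str, Any]]:
    """Keep rows whose timestamp falls in [start_time, end_time)."""
    if filter_context is None:
        # No filter context: return every row unchanged.
        return list(rows)
    kept = []
    for row in rows:
        ts = row[ts_column]
        # Lower bound is inclusive: drop rows strictly before start_time.
        if filter_context.start_time is not None and ts < filter_context.start_time:
            continue
        # Upper bound is exclusive: drop rows at or after end_time.
        if filter_context.end_time is not None and ts >= filter_context.end_time:
            continue
        kept.append(row)
    return kept


rows = [
    {"id": 1, "timestamp": datetime(2023, 1, 1)},
    {"id": 2, "timestamp": datetime(2023, 1, 2)},
    {"id": 3, "timestamp": datetime(2023, 1, 3)},
]
window = FilterContextSketch(
    start_time=datetime(2023, 1, 1),
    end_time=datetime(2023, 1, 3),
)
# The row exactly at start_time is kept; the row exactly at end_time is dropped.
selected_ids = [row["id"] for row in filter_rows(rows, window)]
```

Because the upper bound is exclusive, consecutive windows that share a boundary (one ending where the next begins) never select the same row twice.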