Version: 1.1

BigQueryConfig

Summary

Configuration used to reference a BigQuery table or query.

The BigQueryConfig class creates a reference to a BigQuery table. It can also reference a query on one or more tables, which is registered in Tecton much as a view is registered in other data systems.

An instance of this class is passed to the batch_config parameter of a BatchSource. Declaring this configuration class alone does not register a Data Source. Instead, declare it as part of a BatchSource that takes the configuration instance as a parameter.

Attributes

Name | Data Type | Description
data_delay | timedelta | How long jobs wait after the end of the batch schedule period before starting, typically to ensure that all data has landed.

Methods

Name | Description
__init__(...) | Instantiates a new BigQueryConfig. Exactly one of table and query must be specified when creating this instance.

__init__(...)

Instantiates a new BigQueryConfig. Exactly one of table and query must be specified when creating this instance.
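
The rule that one of table and query is given, and not both, can be illustrated with a small standalone check. This is a hypothetical helper written for illustration, not Tecton's validation code, and the error Tecton actually raises may differ.

```python
from typing import Optional


def check_table_or_query(table: Optional[str] = None, query: Optional[str] = None) -> str:
    """Hypothetical check: return which source kind is configured, or raise."""
    # Both set or both missing are the two invalid combinations.
    if (table is None) == (query is None):
        raise ValueError("Specify exactly one of `table` or `query`.")
    return "table" if table is not None else "query"


print(check_table_or_query(table="bikeshare_trips"))  # table
```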

Parameters

  • project_id (Optional[str], default None): The BigQuery project ID for this data source.
  • dataset (Optional[str], default None): The BigQuery dataset for this data source.
  • location (Optional[str], default None): Geographic location of the dataset, such as "US" or "EU". Setting it ensures that queries run in the same location as the data.
  • table (Optional[str], default None): The table for this data source. Exactly one of table and query must be specified.
  • query (Optional[str], default None): The query for this data source. Exactly one of table and query must be specified.
  • timestamp_field (Optional[str], default None): The timestamp column in this data source that should be used for time-based filtering. Required unless Feature Views use this source only with unfiltered().
  • data_delay (timedelta, default timedelta(0)): How long jobs wait after the end of the batch_schedule period before starting, typically to ensure that all data has landed. For example, if a feature view has a batch_schedule of 1 day and one of its data source inputs sets data_delay=timedelta(hours=1), then incremental materialization jobs run at 01:00 UTC.
  • credentials (Optional[Secret], default None): Service account credentials used to connect to BigQuery.
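
The data_delay example can be worked through with plain datetime arithmetic. The sketch below is not Tecton code: earliest_job_start is a hypothetical helper, and it assumes daily batch periods that end at midnight UTC. It only shows how a one-hour delay moves the start of a job to 01:00 UTC.

```python
from datetime import datetime, timedelta, timezone


def earliest_job_start(period_end: datetime, data_delay: timedelta) -> datetime:
    """Hypothetical helper: a job waits data_delay past the end of its batch period."""
    return period_end + data_delay


# Assumed: a daily batch period covering 2018-01-01, ending at midnight UTC.
period_end = datetime(2018, 1, 2, tzinfo=timezone.utc)
start = earliest_job_start(period_end, timedelta(hours=1))
print(start.isoformat())  # 2018-01-02T01:00:00+00:00
```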

Returns

A BigQueryConfig class instance.

Example

from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret


# Declare a BigQueryConfig instance that can be passed to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    table="bikeshare_trips",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)

# Read one day of data, filtered on timestamp_field
tecton_df = ds.get_dataframe(
    start_time=datetime(2018, 1, 1) - timedelta(days=1),
    end_time=datetime(2018, 1, 1),
)
tecton_df.to_pandas().head(10)
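
A query-based configuration follows the same pattern, with query in place of table. The sketch below is illustrative only: the SQL text and its column names other than start_date are assumptions about the public dataset, and the tecton import is guarded so that the snippet can be read and run without the SDK installed.

```python
from datetime import timedelta

# Assumed SQL; column names other than start_date are illustrative.
TRIPS_QUERY = """
SELECT trip_id, start_date, duration_sec
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE duration_sec > 0
"""

# Only parameters documented on this page are used.
# `table` is omitted because `query` is set.
query_config_kwargs = dict(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    query=TRIPS_QUERY,
    timestamp_field="start_date",
    data_delay=timedelta(hours=1),
)

try:
    from tecton import BigQueryConfig
except ImportError:  # SDK not installed; the arguments remain available for inspection
    BigQueryConfig = None

if BigQueryConfig is not None:
    bq_query_config = BigQueryConfig(**query_config_kwargs)
```

As with the table-based configuration, the resulting instance would be passed to a BatchSource through batch_config.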
