BigQueryConfig
Summary
Configuration used to reference a BigQuery table or query.
The BigQueryConfig class is used to create a reference to a BigQuery table. You can also create a reference to a query over one or more tables, which will be registered in Tecton similarly to how a view is registered in other data systems.
This class is used as the input to a BatchSource's `batch_config` parameter.
Declaring this configuration class alone will not register a Data Source.
Instead, declare it as part of a BatchSource, which takes an instance of this
configuration class as a parameter.
Attributes
| Name | Data Type | Description |
|---|---|---|
| data_delay | timedelta | How long jobs wait after the end of the batch schedule period before starting, typically to ensure that all data has landed. |
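As a sketch of the effect of `data_delay` (plain `datetime` arithmetic only, not Tecton code), a job for a batch period does not start at the period's end but at the end plus the configured delay:

```python
from datetime import datetime, timedelta

# End of a daily batch_schedule period (UTC).
period_end = datetime(2023, 5, 2, 0, 0)

# With data_delay=timedelta(hours=1), the materialization job for that
# period starts one hour after the period ends.
data_delay = timedelta(hours=1)
job_start = period_end + data_delay

print(job_start)  # 2023-05-02 01:00:00
```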
Methods
| Name | Description |
|---|---|
__init__(...) | Instantiates a new BigQueryConfig. One of `table` and `query` should be specified when instantiating this class. |
__init__(...)
Instantiates a new BigQueryConfig. One of `table` and `query` should be specified when instantiating this class.

Parameters
- `project_id: Optional[str] = None`
  The BigQuery project ID for this Data Source.
- `dataset: Optional[str] = None`
  The BigQuery dataset for this Data Source.
- `location: Optional[str] = None`
  Optional geographic location of the dataset, such as "US" or "EU". This ensures that queries are run in the same location as the data.
- `table: Optional[str] = None`
  The table for this Data Source. Only one of `table` and `query` must be specified.
- `query: Optional[str] = None`
  The query for this Data Source. Only one of `table` and `query` must be specified.
- `timestamp_field: Optional[str] = None`
  The timestamp column in this data source that should be used for time-based filtering. Required unless this source is used in Feature Views only with `unfiltered()`.
- `data_delay: timedelta = 0:00:00`
  Configures how long jobs wait after the end of the `batch_schedule` period before starting, typically to ensure that all data has landed. For example, if a Feature View has a `batch_schedule` of 1 day and one of its data source inputs has `data_delay=timedelta(hours=1)` set, then incremental materialization jobs will run at 01:00 UTC.
- `credentials: Optional[Secret] = None`
  Optional service account credentials used to connect to BigQuery.
Returns
A BigQueryConfig class instance.

Example
Table
```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    table="bikeshare_trips",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1) - timedelta(days=1), end_time=datetime(2018, 1, 1))
tecton_df.to_pandas().head(10)
```
Query

```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    # The query must return the column named by timestamp_field; the
    # hyphenated project ID requires backticks in BigQuery standard SQL.
    query="SELECT trip_id, duration_sec, start_station_name, start_date FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1), end_time=datetime(2018, 1, 1) + timedelta(days=1))
tecton_df.to_pandas().head(10)
```
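The "one of `table` and `query`" constraint can be sketched in plain Python (a hypothetical helper for illustration, not Tecton's actual validation code):

```python
from typing import Optional

def check_table_or_query(table: Optional[str], query: Optional[str]) -> None:
    # Hypothetical check mirroring the documented constraint:
    # exactly one of `table` and `query` must be specified.
    if (table is None) == (query is None):
        raise ValueError("Exactly one of `table` and `query` must be specified.")

check_table_or_query(table="bikeshare_trips", query=None)  # OK: table only
```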