BigQueryConfig
Summary
Configuration used to reference a BigQuery table or query.
The BigQueryConfig class is used to create a reference to a BigQuery table. You can also create a reference to a query over one or more tables; Tecton registers such a query in much the same way that other data systems register a view.
This class is used as an input to a BatchSource's batch_config parameter. Declaring this configuration class alone will not register a Data Source. Instead, declare it as part of a BatchSource that takes this configuration class instance as a parameter.
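As a minimal sketch of this pattern, using placeholder project, dataset, table, and column names (the complete table and query examples are shown in the Example section below):

```python
from tecton import BatchSource, BigQueryConfig

# Minimal sketch with placeholder names: a query over more than one table is
# registered much like a view. Declaring the config alone registers nothing;
# it only takes effect once it is passed to a BatchSource.
orders_config = BigQueryConfig(
    project_id="your-gcp-project",  # placeholder project
    query="""
        SELECT o.order_id, o.order_ts, c.segment
        FROM `your-gcp-project.your_dataset.orders` AS o
        JOIN `your-gcp-project.your_dataset.customers` AS c
          ON o.customer_id = c.customer_id
    """,
    timestamp_field="order_ts",
)

ds = BatchSource(name="orders_with_segment_ds", batch_config=orders_config)
```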
Attributes
Name | Data Type | Description |
---|---|---|
data_delay | timedelta | How long jobs wait after the end of the batch schedule period before starting, typically to ensure that all data has landed. |
Methods
Name | Description |
---|---|
__init__(...) | Instantiates a new BigQueryConfig. One of table or query must be specified when creating this configuration. |
__init__(...)
Instantiates a new BigQueryConfig. One of table or query must be specified when creating this configuration.

Parameters
- project_id (Optional[str]) - The BigQuery project ID for this Data Source. Default: None
- dataset (Optional[str]) - The BigQuery dataset for this Data Source. Default: None
- location (Optional[str]) - Optional geographic location of the dataset, such as "US" or "EU". This ensures that queries are run in the same location as the data. Default: None
- table (Optional[str]) - The table for this Data Source. Only one of table and query must be specified. Default: None
- query (Optional[str]) - The query for this Data Source. Only one of table and query must be specified. Default: None
- timestamp_field (Optional[str]) - The timestamp column in this Data Source that should be used by FilteredSource to filter data from this source, before any feature view transformations are applied. Only required if this source is used with FilteredSource. Default: None
- data_delay (timedelta) - This parameter configures how long jobs wait after the end of the batch_schedule period before starting, typically to ensure that all data has landed. For example, if a feature view has a batch_schedule of 1 day and one of the data source inputs has data_delay=timedelta(hours=1) set, then incremental materialization jobs will run at 01:00 UTC (see the sketch below). Default: 0:00:00
- credentials (Optional[Secret]) - Optional service account credentials used to connect to BigQuery. Default: None
Returns
A BigQueryConfig class instance.
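Below is a minimal sketch of how data_delay and timestamp_field might be set together; it is not part of the official reference, the project, dataset, table, and column names are placeholders, and it only uses parameters documented above.

```python
from datetime import timedelta
from tecton import BatchSource, BigQueryConfig

# Placeholder project/dataset/table/column names.
# With a feature view batch_schedule of 1 day, data_delay=timedelta(hours=1)
# makes incremental materialization jobs start at 01:00 UTC, after the
# previous day's data is expected to have landed.
delayed_config = BigQueryConfig(
    project_id="your-gcp-project",      # placeholder
    dataset="your_dataset",             # placeholder
    table="events",                     # placeholder
    timestamp_field="event_timestamp",  # column used by FilteredSource to filter this source
    data_delay=timedelta(hours=1),
)

ds = BatchSource(name="delayed_events_ds", batch_config=delayed_config)
```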
Example

Table
```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    table="bikeshare_trips",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)

# Read one day of data and preview it
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1) - timedelta(days=1), end_time=datetime(2018, 1, 1))
tecton_df.to_pandas().head(10)
```
Query

```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    query=(
        "SELECT trip_id, duration_sec, start_station_name, start_date "
        "FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`"
    ),
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)

# Read one day of data and preview it
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1), end_time=datetime(2018, 1, 1) + timedelta(days=1))
tecton_df.to_pandas().head(10)
```