BigQueryConfig
Summary
Configuration used to reference a BigQuery table or query.
The BigQueryConfig class is used to create a reference to a BigQuery table. You can also create a reference to a query over one or more tables, which will be registered in Tecton similarly to how a view is registered in other data systems.
This class is used as the input to a BatchSource's `batch_config` parameter.
Declaring this configuration class alone will not register a Data Source.
Instead, declare it as part of a BatchSource, which takes an instance of this
configuration class as a parameter.
Attributes
| Name | Data Type | Description |
|---|---|---|
| data_delay | timedelta | How long jobs wait after the end of the batch schedule period before starting, typically to ensure that all data has landed. |
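As a sketch of the effect of `data_delay` (plain `datetime` arithmetic only, not Tecton code), a job for a batch period does not start at the period's end but at the end plus the configured delay:

```python
from datetime import datetime, timedelta

# End of a daily batch_schedule period (UTC).
period_end = datetime(2023, 5, 2, 0, 0)

# With data_delay=timedelta(hours=1), the materialization job for that
# period starts one hour after the period ends.
data_delay = timedelta(hours=1)
job_start = period_end + data_delay

print(job_start)  # 2023-05-02 01:00:00
```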
Methods
| Name | Description |
|---|---|
__init__(...) | Instantiates a new BigQueryConfig. One of `table` and `query` should be specified when instantiating this class. |
__init__(...)
Instantiates a new BigQueryConfig. One of `table` and `query` should be specified when instantiating this class.

Parameters
- `project_id: Optional[str] = None`
  The BigQuery project ID for this Data Source.
- `dataset: Optional[str] = None`
  The BigQuery dataset for this Data Source.
- `location: Optional[str] = None`
  Optional geographic location of the dataset, such as "US" or "EU". This ensures that queries are run in the same location as the data.
- `table: Optional[str] = None`
  The table for this Data Source. Only one of `table` and `query` must be specified.
- `query: Optional[str] = None`
  The query for this Data Source. Only one of `table` and `query` must be specified.
- `timestamp_field: Optional[str] = None`
  The timestamp column in this data source that should be used for time-based filtering. Required unless this source is used in Feature Views only with `unfiltered()`.
- `data_delay: timedelta = 0:00:00`
  Configures how long jobs wait after the end of the `batch_schedule` period before starting, typically to ensure that all data has landed. For example, if a Feature View has a `batch_schedule` of 1 day and one of its data source inputs has `data_delay=timedelta(hours=1)` set, then incremental materialization jobs will run at 01:00 UTC.
- `credentials: Optional[Secret] = None`
  Optional service account credentials used to connect to BigQuery.
Returns
A BigQueryConfig class instance.

Example
Table
```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    table="bikeshare_trips",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1) - timedelta(days=1), end_time=datetime(2018, 1, 1))
tecton_df.to_pandas().head(10)
```
Query

```python
from datetime import datetime, timedelta
from tecton import BigQueryConfig, BatchSource, Secret

# Declare a BigQueryConfig instance that can be used as an argument to BatchSource
bq_config = BigQueryConfig(
    project_id="bigquery-public-data",
    dataset="san_francisco_bikeshare",
    location="US",
    # The query must return the column named by timestamp_field; the
    # hyphenated project ID requires backticks in BigQuery standard SQL.
    query="SELECT trip_id, duration_sec, start_station_name, start_date FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`",
    timestamp_field="start_date",
    credentials=Secret(scope="your-secrets-scope", key="your-bq-service-account-key"),  # Optional
)

# Use in the BatchSource
ds = BatchSource(name="sf_bike_trips_ds", batch_config=bq_config)
tecton_df = ds.get_dataframe(start_time=datetime(2018, 1, 1), end_time=datetime(2018, 1, 1) + timedelta(days=1))
tecton_df.to_pandas().head(10)
```
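The "one of `table` and `query`" constraint can be sketched in plain Python (a hypothetical helper for illustration, not Tecton's actual validation code):

```python
from typing import Optional

def check_table_or_query(table: Optional[str], query: Optional[str]) -> None:
    # Hypothetical check mirroring the documented constraint:
    # exactly one of `table` and `query` must be specified.
    if (table is None) == (query is None):
        raise ValueError("Exactly one of `table` and `query` must be specified.")

check_table_or_query(table="bikeshare_trips", query=None)  # OK: table only
```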