How to run a BigQuery query in Python

Date: 2017-07-10 04:34:39

Tags: python google-bigquery

This is the query I run in BigQuery, and I would like to run it from my Python script. How would I change it / what would I have to add to be able to run it in Python?

#standardSQL
SELECT
  Serial,
  MAX(createdAt) AS Latest_Use,
  SUM(ConnectionTime/3600) as Total_Hours,
  COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;

From what I have been researching, I gather that I cannot use Python to save this query as a permanent table. Is that true? And if it is true, is it still possible to export the temporary table?
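Whichever client library ends up running it, the first step is simply embedding the query above as a Python string. A minimal sketch, using only the table and column names from the question (no client library involved yet):

```python
# The question's query, stored as a multi-line Python string so that a
# BigQuery client library can execute it later.
QUERY = """
#standardSQL
SELECT
  Serial,
  MAX(createdAt) AS Latest_Use,
  SUM(ConnectionTime/3600) AS Total_Hours,
  COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000
"""

# Sanity-check that the string survived intact before sending it anywhere.
assert "GROUP BY Serial" in QUERY
```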

5 answers:

Answer 0 (score: 8):

You need to use the BigQuery Python client library; then something like this should get you up and running:

from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table  # write the results to a permanent table
job.write_disposition = 'WRITE_TRUNCATE'  # overwrite the table if it already exists
job.begin()

BigQuery Python client lib

See the current documentation: https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html

Answer 1 (score: 0):

This is a good usage guide: https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html

To simply run the query and write the results to a table:

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'

job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query

query_job.result()  # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))

Answer 2 (score: 0):

Personally, I prefer querying with pandas:

# BQ authentication
import pydata_google_auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)

import pandas as pd

query = "SELECT * FROM my_table"

data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
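Note that `pd.read_gbq` returns an ordinary pandas DataFrame, so the result can be inspected and aggregated locally like any other frame. A minimal sketch with made-up rows standing in for the query result (the column names mirror the aggregates in the question's query; the values are invented for illustration):

```python
import pandas as pd

# Stand-in for the DataFrame that pd.read_gbq would return; the columns
# mirror the aggregates in the question's query.
data = pd.DataFrame({
    "Serial": ["A001", "A002"],
    "Latest_Use": pd.to_datetime(["2017-07-01", "2017-07-05"]),
    "Total_Hours": [12.5, 3.0],
    "Devices_Connected": [4, 1],
})

# Typical follow-up: sort by connection hours and look at the heaviest user.
top = data.sort_values("Total_Hours", ascending=False)
print(top.iloc[0]["Serial"])  # serial with the most connection hours
```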

Answer 3 (score: 0):

The pythonbq package is very simple to use and a great place to start. It uses python-gbq under the hood.

To get started, you need to generate a BigQuery JSON key for external app access. You can generate the key here.

Your code should look something like this:

from pythonbq import pythonbq

myProject=pythonbq(
  bq_key_path='path/to/bq/key.json',
  project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
  Serial,
  MAX(createdAt) AS Latest_Use,
  SUM(ConnectionTime/3600) as Total_Hours,
  COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)

Answer 4 (score: 0):

Here is another way, using a JSON key file for the service account:

>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)