Question

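I am trying to run a SQL statement that calls a UDF against a BigQuery table. The statement is executed from a Python script and submitted through the BigQuery API.

A simple SQL statement without a UDF works fine. However, when I try to use a UDF script (whether stored locally or in a GCS bucket), I keep getting the same error. This is what I get in my local terminal (I run the script via Python Launcher):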

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/[projectId]/queries?alt=json returned "Required parameter is missing"

This is my Python script:

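from googleapiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery')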

aservice = build('bigquery','v2',credentials=credentials)
query_requestb = aservice.jobs()

query_data = {
    'configuration': {
        'query': {
            'userDefinedFunctionResources': [
                {
                   'resourceUri': 'gs://[bucketName]/[fileName].js'
                }
            ],
            'query': sql
        }
    },
    'timeoutMs': 100000
}

query_response = query_requestb.query(projectId=PROJECT_NUMBER,body=query_data).execute(num_retries=0)

Any idea which parameter is missing, or how I can get this to run?

Answer 1

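Instead of specifying userDefinedFunctionResources, use CREATE TEMP FUNCTION in the body of your 'query', with the library referenced as part of the OPTIONS clause. You will need to use standard SQL for this; see also the documentation on user-defined functions. Your query would look something like this: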

#standardSQL
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
  return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');

SELECT MyJsFunction(x)
FROM MyTable;
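
A minimal Python sketch of how that query could be sent through the same jobs().query() call used in the question. The helper name build_query_body and its default timeout are assumptions for illustration only. The points it shows are that 'query' sits at the top level of the request body rather than under 'configuration', and that 'useLegacySql' is set to False.

```python
def build_query_body(sql, timeout_ms=100000):
    """Build a request body for jobs().query() that runs standard SQL.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    return {
        'query': sql,            # top-level field, not nested under 'configuration'
        'useLegacySql': False,   # CREATE TEMP FUNCTION requires standard SQL
        'timeoutMs': timeout_ms,
    }


sql = '''
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
  return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');

SELECT MyJsFunction(x)
FROM MyTable;
'''

body = build_query_body(sql)

# With the `aservice` object from the question, the call would then be:
# response = aservice.jobs().query(projectId=PROJECT_NUMBER, body=body).execute()
```

The request itself is left commented out because it needs real credentials and a project; the bracketed bucket and file names are the placeholders from the answer, not real paths.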

Answer 2

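The query I wanted to run categorizes traffic and sales by marketing channel, for which I usually use a UDF. This is the query I ran using standard SQL. It is stored in a file that I read into the variable sql: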

CREATE TEMPORARY FUNCTION
  mktchannels(source STRING,
    medium STRING,
    campaign STRING)
  RETURNS STRING
  LANGUAGE js AS """
return channelGrouping(source,medium,campaign) // where channelGrouping is the function in my channelgrouping.js file which contains the attribution rules
  """ OPTIONS ( library=["gs://[bucket]/[path]/regex.js",
    "gs://[bucket]/[path]/channelgrouping.js"] );
WITH
  traffic AS ( -- select fields from the BigQuery table
  SELECT
    device.deviceCategory AS device,
    trafficSource.source AS source,
    trafficSource.medium AS medium,
    trafficSource.campaign AS campaign,
    SUM(totals.visits) AS sessions,
    SUM(totals.transactionRevenue)/1e6 as revenue,
    SUM(totals.transactions) as transactions
  FROM
    `[datasetId].[table]`
  GROUP BY
    device,
    source,
    medium,
    campaign)
SELECT
  mktchannels(source,
    medium,
    campaign) AS channel, -- call the temp function set above
  device,
  SUM(sessions) AS sessions,
  SUM(transactions) as transactions,
  ROUND(SUM(revenue),2) as revenue
FROM
  traffic
GROUP BY
  device,
  channel
ORDER BY
  channel,
  device;

Then, in the Python script:

# Read the query text; the with-block closes the file automatically
with open('myquery.sql', 'r') as fd:
    sql = fd.read()

query_data = {
    'query': sql,
    'maximumBillingTier': 10,
    'useLegacySql': False,
    'timeoutMs': 300000
}
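
The snippet above stops before the request is sent. As a rough sketch, the body can be passed to jobs().query() as in the question, and the rows in the response turned into dictionaries. The helper rows_to_dicts is hypothetical, and the sample response is made-up data that only mirrors the schema/rows layout of the BigQuery v2 REST API query response (each row has a list 'f' of cells, each cell a value 'v').

```python
def rows_to_dicts(response):
    """Convert a jobs().query() response into a list of {column: value} dicts.

    Hypothetical helper; values are left as the strings the API returns.
    """
    names = [field['name'] for field in response['schema']['fields']]
    return [
        dict(zip(names, (cell['v'] for cell in row['f'])))
        for row in response.get('rows', [])
    ]


# Made-up response, shaped like the v2 REST API's query response.
sample_response = {
    'jobComplete': True,
    'schema': {'fields': [{'name': 'channel'}, {'name': 'sessions'}]},
    'rows': [
        {'f': [{'v': 'Organic Search'}, {'v': '120'}]},
        {'f': [{'v': 'Email'}, {'v': '45'}]},
    ],
}

results = rows_to_dicts(sample_response)

# With a real service object, the response would come from:
# response = aservice.jobs().query(projectId=PROJECT_NUMBER, body=query_data).execute()
```

If 'jobComplete' comes back False because the timeout elapsed, the response carries no rows yet and the helper returns an empty list when a schema is present; handling that case is left out of this sketch.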

Hope this helps anyone in the future!

How to use the BigQuery API from a Python script that calls a UDF
