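JS UDF returning a STRUCT in Standard SQL / BigQuery and creating two columns

Time: 2019-04-03 17:39:15

Tags: javascript sql database google-bigquery standard-sql

I'm trying to write a user-defined function for BigQuery in JavaScript that returns a struct and produces two columns: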

CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> LANGUAGE js AS
  LANGUAGE js AS 
"""
    var exampleStruct = {1:100, 2:200}
    return exampleStruct;
""";

My query would look something like this:

SELECT
exampleCol,
exampleFunction(stringCol) -- use SELECT AS STRUCT somewhere here? with the aliases “First” and “Second”
FROM
[SOME DATABASE HERE]

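I want the output of exampleFunction(stringCol) to produce two columns (three in total if exampleCol is included). For example, if exampleCol gives us "SOMETHING", I'd like to return the columns "SOMETHING" (for exampleCol), 1 (for "First"), and a second value (for "Second"). Is this possible?

I'm having trouble returning a STRUCT from the JS function (I'm not sure whether my syntax is off) and getting the query right. For the query, I want to avoid running the JavaScript function twice. Thanks!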

2 answers:

Answer 0: (score: 1)

The example below is for BigQuery Standard SQL

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    return this;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol   

with the result

Row exampleCol  index   latency  
1   1           10      100  
2   2           20      200  
3   3           30      456   
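
One likely reason the question's version did not populate the columns: its object literal `{1:100, 2:200}` has the property names "1" and "2", while the declared STRUCT has fields named index and latency, and BigQuery matches a returned object's properties to STRUCT fields by name. A minimal plain-JavaScript sketch of the difference, runnable outside BigQuery (`fromQuestion` and `keyedByField` are illustrative names, not part of the original code):

```javascript
// Object from the question: its property names are the strings "1" and "2".
var fromQuestion = {1: 100, 2: 200};

// Object keyed by the field names declared in
// RETURNS STRUCT<index INT64, latency INT64>.
var keyedByField = {index: 1, latency: 100};

console.log(Object.keys(fromQuestion)); // [ '1', '2' ]
console.log(Object.keys(keyedByField)); // [ 'index', 'latency' ]
```

Only the second object carries properties that line up with the STRUCT definition.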

Note: if you want the columns to be named first and second, you can replace index and latency with first and second respectively, as in the example below

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<first INT64, second INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.first = arr[0];
    this.second = arr[1];
    return this;
""";
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`  

Or you can use the approach below

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    return this;
""";
SELECT exampleCol, index AS first, latency AS second   
FROM (
  SELECT exampleCol, exampleFunction(stringCol).*
  FROM `project.dataset.table`
)

Both cases give the following result

Row exampleCol  first   second   
1   1           10      100  
2   2           20      200  
3   3           30      456  

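Answer 1: (score: 0)

I'd like to add to Mikhail Berlyant's answer, which works well in this case, but I ran into a problem in a slightly different scenario.

Rather than using "this" in JavaScript, which retains data between rows, I suggest using a new object.

In my example, I want to return one more column whose value is based on the value of another existing column. I'll add a column named "latencyUnder150" whose value is "Y" if the latency field is less than 150; otherwise the field is left empty.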

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64, latencyUnder150 STRING> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    if (this.latency < 150) {
        this.latencyUnder150 = 'Y'
    }
    return this;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol   

Using the "this" variable in JS, you get this result.

| Row | exampleCol | index | latency | latencyUnder150 |
|-----|------------|-------|---------|-----------------|
| 1   | 1          | 10    | 100     | Y               |
| 2   | 2          | 20    | 200     | Y               |
| 3   | 3          | 30    | 456     | Y               |

As you can see, from the first record onward the latencyUnder150 field retains the value "Y".
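
The carry-over can be imitated in plain JavaScript, outside BigQuery. The sketch below assumes that every row's call sees the same "this" object; that is an assumption consistent with the result above, not a documented description of BigQuery's runtime. The names `udfBody`, `sharedThis` and `rows` are illustrative.

```javascript
// The UDF body from above, wrapped in an ordinary function for illustration.
function udfBody(exampleString) {
  var arr = exampleString.split(':');
  this.index = arr[0];
  this.latency = arr[1];
  if (this.latency < 150) {
    this.latencyUnder150 = 'Y';
  }
  return this;
}

// ASSUMPTION: one object plays the role of `this` for every row.
var sharedThis = {};

var rows = ['10:100', '20:200', '30:456'].map(function (s) {
  // Copy the returned object so each row can be inspected separately.
  return Object.assign({}, udfBody.call(sharedThis, s));
});

rows.forEach(function (r) {
  console.log(r.index, r.latency, r.latencyUnder150);
});
```

Under that assumption, the second and third rows still report "Y": the property set while processing '10:100' is never cleared, because later rows skip the if branch and nothing resets it.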

By changing the code slightly to use a new object, each row starts with no values carried over from the previous row.

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64, latencyUnder150 STRING> 
  LANGUAGE js AS 
"""
    var outObj = {}
    arr = exampleString.split(':');
    outObj.index = arr[0];
    outObj.latency = arr[1];
    if (outObj.latency < 150) {
        outObj.latencyUnder150 = 'Y'
    }
    return outObj;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol

| Row | exampleCol | index | latency | latencyUnder150 |
|-----|------------|-------|---------|-----------------|
| 1   | 1          | 10    | 100     | Y               |
| 2   | 2          | 20    | 200     | (null)          |
| 3   | 3          | 30    | 456     | (null)          |
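
For comparison, the same plain-JavaScript check can be run against the new-object version. This is a sketch outside BigQuery; `udfBodyFresh` and `freshRows` are illustrative names, and the explicit `Number(...)` conversion is an addition for clarity rather than part of the code above.

```javascript
// New-object version of the UDF body, wrapped in a function for illustration.
function udfBodyFresh(exampleString) {
  var outObj = {}; // a fresh object on every call, so nothing carries over
  var arr = exampleString.split(':');
  outObj.index = arr[0];
  outObj.latency = arr[1];
  // Explicit numeric conversion (added here; the original relies on coercion).
  if (Number(outObj.latency) < 150) {
    outObj.latencyUnder150 = 'Y';
  }
  return outObj;
}

var freshRows = ['10:100', '20:200', '30:456'].map(function (s) {
  return udfBodyFresh(s);
});

freshRows.forEach(function (r) {
  console.log(r.index, r.latency, r.latencyUnder150);
});
```

Here only the first row has latencyUnder150 set; for the other two the property is simply absent, which lines up with the (null) cells in the table above.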