Run SQL inside a BigQuery UDF, possibly recursively

Asked: 2017-01-11 10:48:22

Tags: google-bigquery

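I'm wondering whether a recursive UDF in BigQuery is the right solution for what I'm doing. But first, is it possible to run a query from inside a UDF?

I saw a similar question here: BigQuery : is it possible to execute another query inside an UDF? However, the solution there appears to be a workaround that executes straight SQL. In my case I may have to call the UDF repeatedly/recursively without knowing the number of steps in advance (say, 3 to 7 steps).

The use case is simple: build a relationship graph over username entries in a table, with X degrees of separation, where X is supplied by the end user as a parameter. My guess is that a recursive UDF would work well, but is it possible?

**EDIT: more details on the use case:**
Consider a table of transaction data in which each row contains the two counterparties plus some other information: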

Buyer, Seller
Bob->Alice
Bob->Carol
Bob->John
John->Peter
John->Sam
Bob->Mary

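Suppose I want to visualize Bob's relationships with his counterparties at 1 degree of separation (i.e. also showing the relationships of each counterparty one step removed from Bob). I'd like to use a force graph like this one: D3 Force-Collapsible Graph

This chart requires a .JSON file with the following structure: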

{
    "name": "Bob", "size":5000,
    "children":
        [
            {"name":"Alice","size":3000},
            {"name":"Carol","size":3000},
            {"name":"John","size":3000,
            "children":[
                         {"name":"Peter","size":3000},
                         {"name":"Sam","size":3000}
                        ]},
            {"name":"Mary","size":3000}
        ]
}
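
So with 1 degree of separation, Bob has 4 children, one of whom (John) has 2 children of his own. For X degrees of separation this can go deeper, ideally with a user-supplied X, but in practice it could also be hard-coded to 3 or 5 levels.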

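2 Answers:

Answer 0: (score: 2)

You can have a JavaScript UDF make recursive calls, but it cannot execute another SQL statement. If you know the number of recursions/iterations ahead of time, you can define a SQL function instead, for example: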

#standardSQL
CREATE TEMP FUNCTION SumToN(x INT64) AS (
  (SELECT SUM(v) FROM UNNEST(GENERATE_ARRAY(1, x)) AS v)
);

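With GENERATE_ARRAY you can create a for loop of the desired length. Here is another example that does not involve a UDF but uses GENERATE_ARRAY to concatenate a variable number of strings: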

#standardSQL
WITH T AS (
  SELECT 2 AS x, 'foo' AS y UNION ALL
  SELECT 4 AS x, 'bar' AS y)
SELECT
  y,
  (SELECT STRING_AGG(CONCAT(y, CAST(v AS STRING)))
   FROM UNNEST(GENERATE_ARRAY(1, x)) AS v) AS rep_y
FROM T;
+-----+---------------------+
| y   | rep_y               |
+-----+---------------------+
| foo | foo1,foo2           |
| bar | bar1,bar2,bar3,bar4 |
+-----+---------------------+
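
To illustrate the first point of this answer (a JavaScript UDF can recurse, but only over data it is handed), here is a minimal sketch in plain JavaScript. It assumes the edges have already been collected into an array of `{parent, child}` objects; the names `buildTree`, `expand`, and `maxDepth` are hypothetical and not part of BigQuery. How the array would be passed into a `LANGUAGE js` function is left out, and the `size` attribute from the question is omitted for brevity.

```javascript
// Hypothetical helper: edges is an array of {parent, child} objects,
// maxDepth is the number of levels to expand below the root.
function buildTree(edges, root, maxDepth) {
  // Index children by parent once so each lookup is cheap.
  const childrenOf = new Map();
  for (const { parent, child } of edges) {
    if (!childrenOf.has(parent)) childrenOf.set(parent, []);
    childrenOf.get(parent).push(child);
  }

  // path holds the names on the current branch so a cycle cannot recurse forever.
  function expand(name, depth, path) {
    const node = { name: name };
    const kids = childrenOf.get(name) || [];
    if (depth < maxDepth) {
      const next = kids.filter((k) => !path.includes(k));
      if (next.length > 0) {
        node.children = next.map((k) => expand(k, depth + 1, path.concat(k)));
      }
    }
    return node;
  }

  return expand(root, 0, [root]);
}

// The sample rows from the question.
const edges = [
  { parent: "Bob", child: "Alice" },
  { parent: "Bob", child: "Carol" },
  { parent: "Bob", child: "John" },
  { parent: "John", child: "Peter" },
  { parent: "John", child: "Sam" },
  { parent: "Bob", child: "Mary" },
];

// Two levels below Bob corresponds to the question's "1 degree of separation".
console.log(JSON.stringify(buildTree(edges, "Bob", 2)));
// {"name":"Bob","children":[{"name":"Alice"},{"name":"Carol"},{"name":"John","children":[{"name":"Peter"},{"name":"Sam"}]},{"name":"Mary"}]}
```

The depth limit plays the role of the user-supplied X, so the number of steps does not need to be hard-coded in the function itself.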

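Answer 1: (score: 2)

Try the query below.
It is generic enough, and follows a very simple pattern, in case you need to extend it to more degrees of separation. For the sake of the example I added logic for the size attribute: here it is literally the size of an item measured by the number of items inside it, including itself, so it is basically the number of nested children + 1. So, enjoy: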

#standardSQL
CREATE TEMP FUNCTION size(item STRING) AS (
  (SELECT CAST(IFNULL(1 + (LENGTH(item) - LENGTH(REPLACE(item, 'name', '')))/4, 1) AS STRING))
);
CREATE TEMP FUNCTION dress(parent STRING, children STRING) AS (
  (SELECT CONCAT('{"name":"', parent, '","size":', size(children), IFNULL(CONCAT(',"children":[', children, ']'), ''), '}'))
);
WITH items AS (
  SELECT 'Bob' AS parent, 'Alice' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Carol' AS child UNION ALL
  SELECT 'Bob' AS parent, 'John' AS child UNION ALL
  SELECT 'John' AS parent, 'Peter' AS child UNION ALL
  SELECT 'John' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Peter' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Mike' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Nick' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Mary' AS child 
), degree2 AS ( 
  SELECT d1.parent AS parent, d1.child AS child_1, d2.child AS child_2
  FROM items AS d1 LEFT JOIN items AS d2 ON d1.child = d2.parent
), degree3 AS (
  SELECT d1.*, d2.child AS child_3 
  FROM degree2 AS d1 LEFT JOIN items AS d2 ON d1.child_2 = d2.parent
), degree4 AS (
  SELECT d1.*, d2.child AS child_4 
  FROM degree3 AS d1 LEFT JOIN items AS d2 ON d1.child_3 = d2.parent
)
SELECT STRING_AGG(dress(parent, child_1), ',') AS parent FROM (
SELECT parent, STRING_AGG(dress(child_1, child_2), ',') AS child_1 FROM (
SELECT parent, child_1, STRING_AGG(dress(child_2, child_3), ',') AS child_2 FROM (
SELECT parent, child_1, child_2, STRING_AGG(dress(child_3, child_4), ',') AS child_3 FROM (
SELECT parent, child_1, child_2, child_3, STRING_AGG(dress(child_4, NULL), ',') AS child_4 FROM degree4
GROUP BY 1,2,3,4 ORDER BY 1,2,3,4 )
GROUP BY 1,2,3 ORDER BY 1,2,3 )
GROUP BY 1,2 ORDER BY 1,2 ) GROUP BY 1 ORDER BY 1 )  

It returns exactly what you need; see the "prettified" version below:

{"name": "Bob","size": 12,"children": [
    {"name": "Alice","size": 1},
    {"name": "Carol","size": 1},
    {"name": "John","size": 8,"children": [
        {"name": "Peter","size": 4,"children": [
            {"name": "Sam","size": 3,"children": [
                {"name": "Mike","size": 1},
                {"name": "Nick","size": 1} ]}
          ]},
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Mary","size": 1}
  ]},
{"name": "John","size": 8,"children": [
    {"name": "Peter","size": 4,"children": [
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Peter","size": 4,"children": [
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Sam","size": 3,"children": [
    {"name": "Mike","size": 1},
    {"name": "Nick","size": 1} ]}

Most likely the above can be generalized further, but I think this is good enough for you to give it a try :o)
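
To make the `size` logic in this answer easier to follow, here is a small sketch that mirrors the two SQL helpers in plain JavaScript. It is an illustration only, not part of the query: `null` stands in for SQL NULL, and the sketch assumes (as the SQL does) that no user name itself contains the text `name`.

```javascript
// Mirrors the SQL size(): count occurrences of the literal "name" in the
// already-serialized children string and add 1 for the item itself.
function size(children) {
  if (children === null) return "1"; // the IFNULL(..., 1) branch
  const withoutName = children.split("name").join("");
  return String(1 + (children.length - withoutName.length) / 4);
}

// Mirrors the SQL dress(): a NULL parent yields NULL (CONCAT propagates NULL),
// which STRING_AGG then skips, so leaves end up without a "children" key.
function dress(parent, children) {
  if (parent === null) return null;
  const tail = children === null ? "" : ',"children":[' + children + "]";
  return '{"name":"' + parent + '","size":' + size(children) + tail + "}";
}

// Build Sam's node bottom-up, the same order the nested GROUP BYs work in.
const leaves = [dress("Mike", null), dress("Nick", null)].join(",");
const sam = dress("Sam", leaves);
console.log(sam);
// {"name":"Sam","size":3,"children":[{"name":"Mike","size":1},{"name":"Nick","size":1}]}
```

Because the count is taken over the whole serialized string, an item's size includes every nested item under it, which is why Bob ends up with size 12 in the output above.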