Question

I am trying to update a column in all rows each time a UDF finishes processing one row.

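The example has 3 rows, each with 6 columns. Column "A" has the same value in all 3 rows. Columns "A" and "B" together identify each row. Column "C" is an array of any of the letters a, b, c, d, e. Column "D" is the target array to be filled. Column "E" is an integer. Column "abcde" is an array of 5 integers giving the count of each letter a, b, c, d, e.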

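Each row is passed to the UDF, which updates column "D" and column "abcde" based on column "C" and column "E". The rule is: select the number of items given by "E" from "C" and put them into "D". The selection is random. After the selection for a row is done, the column 'abcde' is updated in all rows.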

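For example, to process the first row, we randomly select one item ('a', 'b' or 'c') to put into "D". Say the system picks 'c' from column "C". The value of "D" in this row then becomes ['c'], and 'abcde' in all three rows is updated to [1,3,1,1,1] (it was [1,3,2,1,1] before, so the count for 'c', the third element, drops from 2 to 1).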
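The per-row rule can be sketched in JavaScript, the language the question's UDF uses. This is a minimal sketch. It assumes the five counts in `abcde` behave like one array shared by all rows with the same `A`, so decrementing it once is the same as updating the column in every row. `selectForRow` and the `pick` callback are hypothetical names, and `pick` stands in for the random choice the question says not to worry about.

```javascript
// Position of each letter inside the 5-element abcde array.
const LETTERS = ["a", "b", "c", "d", "e"];

// Hypothetical helper: process one row against the shared counts.
// row.C = candidate letters, row.E = how many to take.
// counts = the shared abcde array; it is updated in place.
// pick(candidates, n) returns the n chosen letters (random in practice).
function selectForRow(row, counts, pick) {
  const chosen = pick(row.C, row.E);
  for (const letter of chosen) {
    counts[LETTERS.indexOf(letter)] -= 1; // one fewer of this letter left for every row
  }
  return { ...row, D: chosen };
}

// Worked example from the text: row x1 picks 'c'.
const counts = [1, 3, 2, 1, 1];
const row1 = selectForRow(
  { A: "y1", B: "x1", C: ["a", "b", "c"], D: [], E: 1 },
  counts,
  () => ["c"] // stands in for the random choice
);
console.log(JSON.stringify(row1.D), JSON.stringify(counts)); // ["c"] [1,3,1,1,1]
```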

Sample data:

#StandardSQL in BigQuery
#code to generate the example table
with sample as (
  select 'y1' as A, 'x1' as B, ['a','b','c'] as C, [] as D, 1 as E, [1,3,2,1,1] as abcde union all
  select 'y1','x2',['a','b'],[],2,[1,3,2,1,1] union all
  select 'y1','x3',['c','d','e'],[],3,[1,3,2,1,1]
)
select * from sample order by B

After processing the first row:

with sample as (
  select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [1,3,1,1,1] as abcde union all
  select 'y1','x2',['a','b'],[],2,[1,3,1,1,1] union all
  select 'y1','x3',['c','d','e'],[],3,[1,3,1,1,1]
)
select * from sample order by B

After processing the second row:

with sample as (
  select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [0,2,1,1,1] as abcde union all
  select 'y1','x2',['a','b'],['a','b'],2,[0,2,1,1,1] union all
  select 'y1','x3',['c','d','e'],[],3,[0,2,1,1,1]
)
select * from sample order by B

After processing the third row:

with sample as (
  select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [0,2,0,0,0] as abcde union all
  select 'y1','x2',['a','b'],['a','b'],2,[0,2,0,0,0] union all
  select 'y1','x3',['c','d','e'],['c','d','e'],3,[0,2,0,0,0]
)
select * from sample order by B
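Read together, the three snapshots are a sequential fold over the rows. Each row sees the counts left by the rows before it, and after every step all rows show the latest counts. Below is a minimal JavaScript sketch of that fold. It assumes rows are processed in `B` order and that the selections are given up front (here, the ones used in the example) instead of being random. `processInOrder` and `choices` are hypothetical names.

```javascript
const LETTERS = ["a", "b", "c", "d", "e"];

// Hypothetical sketch: process rows one at a time in B order and return a copy of
// the whole table after each step, with every row's abcde set to the current counts.
function processInOrder(rows, initialCounts, choices) {
  const counts = [...initialCounts];              // shared state for one value of A
  const table = rows.map(r => ({ ...r, D: [] }));
  const snapshots = [];
  table.forEach((row, i) => {
    row.D = choices[i];                           // stands in for the random pick
    for (const letter of row.D) counts[LETTERS.indexOf(letter)] -= 1;
    // "update abcde in all rows": copy the current counts into every row
    snapshots.push(table.map(r => ({ ...r, D: [...r.D], abcde: [...counts] })));
  });
  return snapshots;
}

const rows = [
  { A: "y1", B: "x1", C: ["a", "b", "c"], E: 1 },
  { A: "y1", B: "x2", C: ["a", "b"], E: 2 },
  { A: "y1", B: "x3", C: ["c", "d", "e"], E: 3 },
];
const snaps = processInOrder(rows, [1, 3, 2, 1, 1], [["c"], ["a", "b"], ["c", "d", "e"]]);
snaps.forEach((s, i) => console.log(`after row ${i + 1}:`, JSON.stringify(s[0].abcde)));
// after row 1: [1,3,1,1,1]
// after row 2: [0,2,1,1,1]
// after row 3: [0,2,0,0,0]
```

The hard part in BigQuery is not the arithmetic. It is that this loop needs state carried from one row to the next.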

Don't worry about how the UDF will do the random selection. I just want to know whether it is possible in BigQuery to update the column 'abcde' the way I want.

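I have tried using a UDF, but I am struggling to make it work. My understanding is that a UDF can only take a single row as input and produce multiple rows from it, so it has no way to update other rows. Is this possible with SQL?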

Expected output:

After processing the first row:

After processing the third row:

Additional information:

create temporary function selection(A string, B string, C ARRAY<STRING>, D ARRAY<STRING>, E INT64, abcde ARRAY<INT64>)
returns STRUCT<A STRING, B string, C array<string>, D array<string>, E int64, abcde array<int64>>
language js AS """
  /*
  for the row i in the data:
    select the number i.E of items (randomly) from i.C where the number associated with the item in i.abcde is bigger than 0
    (i.e. only the items whose numbers in abcde are bigger than 0 can be candidates for the random selection);
    put the selected items in i.D and deduct the amount of selected items from the number for the corresponding item in the column 'abcde' FOR ALL ROWS;
    proceed to the next row i+1 until every row is processed;
  */
  return {A,B,C,D,E,abcde}
""";

with sample as (
  select 'y1' as A, 'x1' as B, ['a','b','c'] as C, CAST([] AS ARRAY<STRING>) as D, 1 as E, [1,3,2,1,1] as abcde union all
  select 'y1','x2',['a','b'],[],2,[1,3,2,1,1] union all
  select 'y1','x3',['c','d','e'],[],2,[1,3,2,1,1]
)
select selection(A,B,C,D,E,abcde) from sample order by B
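The comment inside the UDF describes a loop over rows. A scalar UDF that sees one row per call cannot run that loop by itself. Once all rows with the same `A` are available together, though, the loop is straightforward. Here is a sketch in plain JavaScript. It assumes the rows arrive as one array already sorted by `B`. `runSelection` and the `rng` parameter are hypothetical names. The random choice uses a partial Fisher-Yates shuffle, which is my choice and not something the question specifies. When there are fewer candidates than `E`, the sketch takes all of them, which is also an assumption.

```javascript
const LETTERS = ["a", "b", "c", "d", "e"];

// Hypothetical sketch of the loop described in the UDF comment.
// rows: all rows sharing one value of A, sorted by B.
// rng: function returning a float in [0, 1); defaults to Math.random.
function runSelection(rows, rng = Math.random) {
  const counts = [...rows[0].abcde];              // abcde is identical across the rows
  const out = rows.map(r => ({ ...r, D: [] }));
  for (const row of out) {
    // Only letters whose remaining count is > 0 are candidates.
    const candidates = row.C.filter(l => counts[LETTERS.indexOf(l)] > 0);
    // Assumption: if there are fewer candidates than E, take all of them.
    const n = Math.min(row.E, candidates.length);
    // Partial Fisher-Yates shuffle: move n random candidates to the front.
    for (let i = 0; i < n; i++) {
      const j = i + Math.floor(rng() * (candidates.length - i));
      [candidates[i], candidates[j]] = [candidates[j], candidates[i]];
    }
    row.D = candidates.slice(0, n);
    for (const l of row.D) counts[LETTERS.indexOf(l)] -= 1; // deduct for all rows
  }
  // Every row reports the counts left after the last row was processed.
  return out.map(r => ({ ...r, abcde: [...counts] }));
}

const result = runSelection([
  { A: "y1", B: "x1", C: ["a", "b", "c"], D: [], E: 1, abcde: [1, 3, 2, 1, 1] },
  { A: "y1", B: "x2", C: ["a", "b"], D: [], E: 2, abcde: [1, 3, 2, 1, 1] },
  { A: "y1", B: "x3", C: ["c", "d", "e"], D: [], E: 2, abcde: [1, 3, 2, 1, 1] },
]);
result.forEach(r => console.log(r.B, JSON.stringify(r.D), JSON.stringify(r.abcde)));
```

To run this inside BigQuery, one UDF call would have to receive all rows of a group at once, for example as an ARRAY of STRUCTs built with ARRAY_AGG(... ORDER BY B) and GROUP BY A. That wiring is an untested assumption and is not shown here.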

Answer 1

Below is for BigQuery Standard SQL

#StandardSQL
WITH sample AS (
  SELECT 'y1' AS A, 'x1' AS B, ['a','b','c'] AS C, ['c'] AS D, 1 AS E, [1,3,2,1,1] AS abcde UNION ALL
  SELECT 'y1','x2',['a','b'],['a','b'],2,[1,3,2,1,1] UNION ALL
  SELECT 'y1','x3',['c','d','e'],['c','d','e'],3,[1,3,2,1,1] UNION ALL

  SELECT 'y2' AS A, 'x1' AS B, ['a','b','c'] AS C, ['a','b'] AS D, 2 AS E, [1,3,2,1,1] AS abcde UNION ALL
  SELECT 'y2','x2',['a','b'],['b'],1,[1,3,2,1,1] UNION ALL
  SELECT 'y2','x3',['c','d','e'],['d','e'],2,[1,3,2,1,1]  
),
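counts AS (
  -- how many times each letter appears in D across all rows with the same A
  SELECT A AS AA, dd, COUNT(1) AS cnt
  FROM sample, UNNEST(D) AS dd
  GROUP BY AA, dd
),
processed AS (
  -- subtract those counts from abcde element by element (a -> position 0 ... e -> position 4)
  SELECT A, B, ARRAY_AGG(aa - IFNULL(cnt, 0) ORDER BY pos) AS abcde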
  FROM sample, UNNEST(abcde) AS aa WITH OFFSET AS pos
  LEFT JOIN counts ON A = counts.AA 
  AND CASE dd 
        WHEN 'a' THEN 0 
        WHEN 'b' THEN 1 
        WHEN 'c' THEN 2 
        WHEN 'd' THEN 3 
        WHEN 'e' THEN 4 
      END = pos
  GROUP BY A, B
)
SELECT s.A, s.B, s.C, s.D, s.E, p.abcde
FROM sample AS s
JOIN processed AS p
USING (A, B)
-- ORDER BY A, B

"Don't worry about how the UDF will do the random selection"

So, as you can see, I just put "random" values into the sample data to mimic what the UDF would put into D.
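The query does not simulate the row-by-row updates. It computes the end state directly. Each pick decrements a shared counter, so the final `abcde` for a group equals the starting counts minus the number of times each letter appears in `D` across all rows with that `A` (the `counts` CTE), mapped to positions 0 to 4 (the `CASE` expression). The same arithmetic is shown below as a small JavaScript sketch. `finalCounts` is a hypothetical name, and the sample values are the ones in the query.

```javascript
const LETTERS = ["a", "b", "c", "d", "e"]; // position 0..4, like the CASE expression

// Hypothetical equivalent of the counts + processed CTEs for one value of A.
// Assumes every row of the group starts with the same abcde, as in the sample.
function finalCounts(rows) {
  const used = [0, 0, 0, 0, 0];
  for (const row of rows) {
    for (const letter of row.D) used[LETTERS.indexOf(letter)] += 1; // counts CTE
  }
  return rows[0].abcde.map((n, pos) => n - used[pos]);            // aa - IFNULL(cnt, 0)
}

const start = [1, 3, 2, 1, 1];
const y1 = [{ D: ["c"], abcde: start }, { D: ["a", "b"], abcde: start }, { D: ["c", "d", "e"], abcde: start }];
const y2 = [{ D: ["a", "b"], abcde: start }, { D: ["b"], abcde: start }, { D: ["d", "e"], abcde: start }];
console.log(JSON.stringify(finalCounts(y1))); // [0,2,0,0,0]
console.log(JSON.stringify(finalCounts(y2))); // [0,1,2,0,0]
```

For `y1`, this matches the "after processing the third row" state in the question. The intermediate states are not produced, because only the final counts depend on what is in D.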

How to update a column in all rows after each time a UDF in BigQuery processes a row?
