Simplifying an array query by selecting ranges

Posted: 2018-02-26 10:43:06

Tags: sql arrays database google-bigquery

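I have a large BigQuery table in which 512 variables are stored as arrays with fairly long names (x__x_arrVal_arrSlices_0__arrValues to arrSlices_511). Each array contains 360 values. My BI tool cannot work with arrays in this form, which is why I want every value as a separate output.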

An excerpt of the query I am currently using:

SELECT
 timestamp, x_stArrayTag_sArrayName, x_stArrayTag_sComission,
 1 as row,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(1)] AS f001,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(10)] AS f010,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(20)] AS f020,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(30)] AS f030,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(40)] AS f040,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(50)] AS f050,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(60)] AS f060,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(70)] AS f070,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(80)] AS f080,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(90)] AS f090,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(100)] AS f100,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(110)] AS f110,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(120)] AS f120,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(130)] AS f130,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(140)] AS f140,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(150)] AS f150,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(160)] AS f160,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(170)] AS f170,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(180)] AS f180,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(190)] AS f190,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(200)] AS f200,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(210)] AS f210,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(220)] AS f220,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(230)] AS f230,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(240)] AS f240,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(250)] AS f250,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(260)] AS f260,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(270)] AS f270,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(280)] AS f280,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(290)] AS f290,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(300)] AS f300,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(310)] AS f310,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(320)] AS f320,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(330)] AS f330,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(340)] AS f340,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(350)] AS f350,
 x__x_arrVal_arrSlices_1__arrValues[OFFSET(359)] AS f359

 FROM
 `project.table` 

 WHERE
 _PARTITIONTIME >= "2017-01-01 00:00:00"
 AND _PARTITIONTIME < "2018-02-16 00:00:00"

UNION ALL
-- ... (the same SELECT repeated for the other slices)

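Unfortunately, the output I get is only a fraction of all values. Getting all 512 * 360 values with this query is not possible, because I hit BigQuery's limits when I use it for all slices.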

Is it possible to rename the long names and select a range?
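Since the column list above follows a fixed pattern, one way to avoid typing it by hand is to generate the SQL text in a script. The sketch below is only an illustration: the helper names (pick_offsets, slice_select, full_query) are hypothetical, and it assumes that "row" holds the slice number, as in the excerpt where slice 1 gets "1 as row". It only saves typing and gives the elements short aliases such as f010; the generated query is just as large, so it does not get around BigQuery's limits.

```python
# Hypothetical helpers that rebuild the per-slice SELECT from the question
# instead of writing 37 column expressions by hand.

def pick_offsets():
    """Offsets used in the question: 1, 10, 20, ..., 350 and 359."""
    return [1] + list(range(10, 360, 10)) + [359]


def slice_select(slice_no, offsets):
    """SELECT for one slice; each picked element gets a short alias like f010."""
    col = f"x__x_arrVal_arrSlices_{slice_no}__arrValues"
    items = [
        "timestamp",
        "x_stArrayTag_sArrayName",
        "x_stArrayTag_sComission",
        f"{slice_no} AS row",  # assumption: "row" holds the slice number
    ]
    items += [f"{col}[OFFSET({k})] AS f{k:03d}" for k in offsets]
    return (
        "SELECT\n  " + ",\n  ".join(items) + "\n"
        "FROM `project.table`\n"
        'WHERE _PARTITIONTIME >= "2017-01-01 00:00:00"\n'
        '  AND _PARTITIONTIME < "2018-02-16 00:00:00"'
    )


def full_query(slice_numbers, offsets):
    """Glue the per-slice SELECTs together with UNION ALL."""
    return "\n\nUNION ALL\n\n".join(slice_select(n, offsets) for n in slice_numbers)


if __name__ == "__main__":
    # Two slices only, to keep the printed statement short.
    print(full_query(range(2), pick_offsets()))
```

Calling full_query(range(512), pick_offsets()) would produce the complete statement for all slices, which is exactly the size of query that runs into the limits; the answers below restructure the data with UNNEST instead.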

Best regards, SCOTTI

2 answers:

Answer 0 (score: 1)

Using UNNEST ... WITH OFFSET, you can get 360 rows with 512 columns (one row per array position). Here is a small example:

WITH data AS (
  SELECT
    [1, 2, 3, 4] as a,
    [2, 3, 4, 5] as b,
    [3, 4, 5, 6] as c
)
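-- one output row per element of a; off is that element's 0-based position,
-- which is used to pick the element at the same position from b and c
SELECT v1, b[OFFSET(off)] AS v2, c[OFFSET(off)] AS v3
FROM data, UNNEST(a) AS v1 WITH OFFSET off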

Output:

v1  v2  v3   
1   2   3    
2   3   4    
3   4   5    
4   5   6   
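To make the row/offset mechanics concrete without a BigQuery project, here is a plain-Python sketch of the same query on the same dummy arrays. The helper name unnest_parallel is hypothetical, and the sketch assumes every other array has at least as many elements as a; like OFFSET in BigQuery, indexing past the end raises an error.

```python
# Plain-Python mimic of:
#   SELECT v1, b[OFFSET(off)] AS v2, c[OFFSET(off)] AS v3
#   FROM data, UNNEST(a) AS v1 WITH OFFSET off

def unnest_parallel(a, *others):
    rows = []
    for off, v1 in enumerate(a):  # UNNEST(a) AS v1 WITH OFFSET off
        # arr[OFFSET(off)]: a Python list raises IndexError when off is out of range
        rows.append((v1, *(arr[off] for arr in others)))
    return rows


data = {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "c": [3, 4, 5, 6]}
for row in unnest_parallel(data["a"], data["b"], data["c"]):
    print(*row)
```

With the question's table, a would be one slice's array and the other slices would be passed as the remaining arrays, giving 360 rows with one column per slice.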

Answer 1 (score: 1)

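Considering the messy table you are dealing with: when deciding on a restructuring, an important aspect is how practical the queries that implement that decision will be.

In your specific case, I would suggest flattening the data completely, as below. Each original row is turned into ~184,000 rows (512 arrays × 360 elements), each representing one element of one array in the original row: the slice field holds the array number and pos holds the element's position within that array. The query is generic enough to handle any number of slices and any array size, while the result is flexible and generic enough to be used for any algorithm you can think of. Note that the dummy columns below omit the __arrValues suffix of your real column names; for the real table, both regular expressions need __arrValues inserted right before the ": in the pattern.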

#standardSQL
WITH `project.dataset.messytable` AS (
  SELECT 1 id, 
    [ 1,  2,  3,  4,  5] x__x_arrVal_arrSlices_0, 
    [11, 12, 13, 14, 15] x__x_arrVal_arrSlices_1,
    [21, 22, 23, 24, 25] x__x_arrVal_arrSlices_2 UNION ALL
  SELECT 2 id, 
    [ 6,  7,  8,  9, 10] x__x_arrVal_arrSlices_0, 
    [16, 17, 18, 19, 20] x__x_arrVal_arrSlices_1,
    [26, 27, 28, 29, 30] x__x_arrVal_arrSlices_2 
)
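SELECT 
  id, 
  slice,
  pos,
  value
FROM `project.dataset.messytable` t,
-- slice numbers, captured from the column names in the row's JSON
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"x__x_arrVal_arrSlices_(\d+)":\[.*?\]')) slice WITH OFFSET x
-- array contents as "1,2,3,..." strings, in the same column order
JOIN UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"x__x_arrVal_arrSlices_\d+":\[(.*?)\]')) arr WITH OFFSET y
ON x = y,  -- pair each slice number with its own array
-- one row per array element; value is a STRING (CAST it if needed)
UNNEST(SPLIT(arr)) value WITH OFFSET pos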

You can test / play with it using the dummy data in the WITH clause above. The result is:

Row  id  slice  pos  value
1    1   0      0    1
2    1   0      1    2
3    1   0      2    3
4    1   0      3    4
5    1   0      4    5
6    1   1      0    11
7    1   1      1    12
8    1   1      2    13
9    1   1      3    14
10   1   1      4    15
11   1   2      0    21
12   1   2      1    22
13   1   2      2    23
14   1   2      3    24
15   1   2      4    25
16   2   0      0    6
17   2   0      1    7
18   2   0      2    8
19   2   0      3    9
20   2   0      4    10
21   2   1      0    16
22   2   1      1    17
23   2   1      2    18
24   2   1      3    19
25   2   1      4    20
26   2   2      0    26
27   2   2      1    27
28   2   2      2    28
29   2   2      3    29
30   2   2      4    30
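To check what the two regular expressions capture before running the query on BigQuery, here is a plain-Python sketch that mimics it on the same dummy rows. It assumes that json.dumps with compact separators produces the same text as TO_JSON_STRING for these simple integer arrays, and flatten is a hypothetical helper name; zip plays the role of the JOIN on the two offsets.

```python
import json
import re

# Same patterns as in the query. They match the dummy column names; for the
# real x__x_arrVal_arrSlices_N__arrValues columns, "__arrValues" must be
# added before the ":" in both patterns.
SLICE_RE = re.compile(r'"x__x_arrVal_arrSlices_(\d+)":\[.*?\]')
ARRAY_RE = re.compile(r'"x__x_arrVal_arrSlices_\d+":\[(.*?)\]')


def flatten(row):
    # Compact JSON (no spaces), standing in for TO_JSON_STRING(t)
    js = json.dumps(row, separators=(",", ":"))
    slices = SLICE_RE.findall(js)  # for the dummy rows: ['0', '1', '2']
    arrays = ARRAY_RE.findall(js)  # e.g. ['1,2,3,4,5', '11,12,13,14,15', ...]
    out = []
    # Pair slice numbers and arrays by position, like JOIN ... ON x = y
    for slice_no, arr in zip(slices, arrays):
        # UNNEST(SPLIT(arr)) value WITH OFFSET pos; values stay strings
        for pos, value in enumerate(arr.split(",")):
            out.append((row["id"], slice_no, pos, value))
    return out


messytable = [
    {"id": 1,
     "x__x_arrVal_arrSlices_0": [1, 2, 3, 4, 5],
     "x__x_arrVal_arrSlices_1": [11, 12, 13, 14, 15],
     "x__x_arrVal_arrSlices_2": [21, 22, 23, 24, 25]},
    {"id": 2,
     "x__x_arrVal_arrSlices_0": [6, 7, 8, 9, 10],
     "x__x_arrVal_arrSlices_1": [16, 17, 18, 19, 20],
     "x__x_arrVal_arrSlices_2": [26, 27, 28, 29, 30]},
]

result = [r for row in messytable for r in flatten(row)]
for r in result:
    print(*r)
```

As in BigQuery, slice and value come out as strings, so numeric work on the flattened result needs a cast.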


$ids = [1, 2, 3, 4, 5, 6];