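How do I concatenate two arrays of Structs in BigQuery?

Time: 2017-04-22 06:26:38

Tags: google-bigquery

I am trying to concatenate two arrays of structs in my query and keep getting a signature error. The two structs are identical (their fields match in type and number).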

select order_id, case when h.filled is not null and rf.new is not null then array_concat( h.filled, rf.new)  else null end filled_and_new  from....

It gives this error:

Error: No matching signature for function ARRAY_CONCAT for argument types: ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>. Supported signature: ARRAY_CONCAT(ARRAY, [ARRAY, ...]) at [10:18]

Does this mean array_concat cannot combine two arrays of Structs (with the exact same layout)?

Thanks

Here are the definitions of the two arrays:

reservations_filled RECORD  REPEATED    
reservations_filled.reservation_id  STRING  NULLABLE    
reservations_filled.s1_order_id STRING  NULLABLE    
reservations_filled.s2_order_id STRING  NULLABLE    
reservations_filled.amount  INTEGER NULLABLE    
reservations_filled.created_time    TIMESTAMP   NULLABLE    
reservations_filled.updated_time    TIMESTAMP   NULLABLE    
reservations_filled.state   STRING  NULLABLE    
reservations_filled.rate    FLOAT   NULLABLE    
reservations_filled.u_amount    INTEGER NULLABLE    
reservations_filled.u_fees  INTEGER NULLABLE    

And the array in the joined table:

rsrvtn_array    RECORD  REPEATED    
rsrvtn_array.reservation_id STRING  NULLABLE    
rsrvtn_array.s1_order_id    STRING  NULLABLE    
rsrvtn_array.s2_order_id    STRING  NULLABLE    
rsrvtn_array.amount INTEGER NULLABLE    
rsrvtn_array.created    TIMESTAMP   NULLABLE    
rsrvtn_array.updated    TIMESTAMP   NULLABLE    
rsrvtn_array.state  STRING  NULLABLE    
rsrvtn_array.rate   FLOAT   NULLABLE    
rsrvtn_array.u_amount   INTEGER NULLABLE    
rsrvtn_array.u_fees INTEGER NULLABLE

And the query is:

 select t1.rsrvtn_array a, t2.reservations_filled b , array_concat(t1.rsrvtn_array, t2.reservations_filled) c from temp.new_orders t1 join temp.order_history t2 using(order_id)

1 answer:

Answer 0 (score: 2)

  

Does this mean array_concat cannot combine two arrays of Structs (with the exact same layout)?

  

ARRAY_CONCAT does combine two arrays of STRUCTs, provided they have the same schema! See the example/proof below:

#standardSQL
with data AS (
SELECT  
  ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r1', 's1', 'b1')] AS x1,
  ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r2', 's2', 'b2'), ('r3', 's3', 'b3')] AS x2
UNION ALL
SELECT  
  ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING>>[('r5', 's5', 'b5')] AS x1,
  NULL AS x2
)  
SELECT ARRAY_CONCAT(x1, x2) AS y
FROM data

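So, most likely, the schemas of the two arrays are in fact different. In that case the error message is exactly what you are seeing. See the example below: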

#standardSQL
WITH data1 AS (
  SELECT 1 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, c_id STRING>>
    [('r1', 's1', 'b1', 'c1')] AS x1
  UNION ALL
  SELECT 2 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, c_id STRING>>
    [('r5', 's5', 'b5', 'c5')] AS x1
),  
data2 AS (
  SELECT 1 AS id, ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, cc_id STRING>>
    [('r2', 's2', 'b2', 'c2'), ('r3', 's3', 'b3', 'c3')] AS x2
  UNION ALL
  SELECT 2 AS id, NULL AS x2
)  
SELECT data1.id,  ARRAY_CONCAT(data1.x1, data2.x2) AS y
FROM data1 
JOIN data2 
ON data1.id = data2.id

The error here is exactly the same as the one in your example:

Error: No matching signature for function ARRAY_CONCAT for argument types: 
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>,
ARRAY<STRUCT<r_id STRING, s_id STRING, b_id STRING, ...>>. 
Supported signature: ARRAY_CONCAT(ARRAY, [ARRAY, ...]) at [15:23]

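The error message is truncated, so the fields that remain visible are indeed identical. In reality, the last field (c_id versus cc_id, hidden behind the "...") differs between the two arrays.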
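Because the error message cuts the type off, it can help to list the full field names of both arrays side by side. The query below is only a sketch: it assumes the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view is available for the temp dataset, and it takes the table and column names from the question.

```sql
#standardSQL
-- Sketch: list every nested field of the two array columns so that
-- name or type differences become visible. Assumes the
-- INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view is available for dataset temp.
SELECT table_name, field_path, data_type
FROM temp.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE (table_name = 'new_orders' AND column_name = 'rsrvtn_array')
   OR (table_name = 'order_history' AND column_name = 'reservations_filled')
ORDER BY table_name, field_path
```

The row for the top-level column itself should carry the complete ARRAY<STRUCT<...>> type in data_type, without the "..." truncation.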

Hope this helps!

  

Update

Look at the following fragments of the two schemas:

reservations_filled.created_time    TIMESTAMP   NULLABLE    
reservations_filled.updated_time    TIMESTAMP   NULLABLE     

rsrvtn_array.created    TIMESTAMP   NULLABLE    
rsrvtn_array.updated    TIMESTAMP   NULLABLE  

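Clearly, this is the case I predicted in the example above: the field names differ (created_time and updated_time versus created and updated).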

  

Solution

So, the query below fails as expected:

#standardSQL
WITH t1 AS (
SELECT 1 AS id,  
  ARRAY<STRUCT<a STRING, b STRING, cc STRING>>[('a1', 'b1', 'c1')] AS x
),
t2 AS (
SELECT 1 AS id,  
  ARRAY<STRUCT<a STRING, b STRING, c STRING>>[('a2', 'b2', 'c2')] AS y
)
SELECT x, y, ARRAY_CONCAT(x, y) AS z
FROM t1 JOIN t2 USING(id) 

because (a, b, c) and (a, b, cc) have one element whose name differs.

And the following will work:

#standardSQL
WITH t1 AS (
SELECT 1 AS id,  
  ARRAY<STRUCT<a STRING, b STRING, cc STRING>>[('a1', 'b1', 'c1')] AS x
),
t2 AS (
SELECT 1 AS id,  
  ARRAY<STRUCT<a STRING, b STRING, c STRING>>[('a2', 'b2', 'c2')] AS y
)
SELECT x, y, 
  ARRAY_CONCAT(ARRAY(SELECT AS STRUCT a, b, cc AS c FROM UNNEST(x)), y) AS z
FROM t1 JOIN t2 USING(id) 

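because cc is aliased to c "on the fly", which makes the schemas not just similar in layout but identical.

Hope this helps now.

If you have trouble applying the above solution to your example, see the following :o)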

SELECT
  t1.rsrvtn_array a,
  t2.reservations_filled b,
  ARRAY_CONCAT(
    ARRAY(
      SELECT AS STRUCT 
        reservation_id, 
        s1_order_id, 
        s2_order_id, 
        amount, 
        created AS created_time, 
        updated AS updated_time, 
        state, 
        rate, 
        u_amount, 
        u_fees 
      FROM UNNEST(t1.rsrvtn_array)
    ) , t2.reservations_filled) AS c
FROM temp.new_orders t1
JOIN temp.order_history t2
USING(order_id)
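To try the approach without access to the real tables, here is a minimal self-contained sketch. The two CTEs stand in for temp.new_orders and temp.order_history, the structs are cut down to three fields, and all values are made up purely for illustration.

```sql
#standardSQL
-- Hypothetical stand-ins for the real tables; values are invented.
WITH new_orders AS (
  SELECT 'o1' AS order_id,
    ARRAY<STRUCT<reservation_id STRING, created TIMESTAMP, updated TIMESTAMP>>[
      ('r1', TIMESTAMP '2017-04-01 00:00:00', TIMESTAMP '2017-04-02 00:00:00')
    ] AS rsrvtn_array
),
order_history AS (
  SELECT 'o1' AS order_id,
    ARRAY<STRUCT<reservation_id STRING, created_time TIMESTAMP, updated_time TIMESTAMP>>[
      ('r2', TIMESTAMP '2017-04-03 00:00:00', TIMESTAMP '2017-04-04 00:00:00')
    ] AS reservations_filled
)
SELECT
  order_id,
  ARRAY_CONCAT(
    -- Rebuild the first array with field names matching the second one.
    ARRAY(
      SELECT AS STRUCT
        reservation_id,
        created AS created_time,
        updated AS updated_time
      FROM UNNEST(t1.rsrvtn_array)
    ),
    t2.reservations_filled
  ) AS c
FROM new_orders t1
JOIN order_history t2
USING(order_id)
```

The fields in SELECT AS STRUCT are listed in the same order as in the target struct, since both names and positions have to line up.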