How to add the column name to the value in BigQuery

Time: 2017-11-14 19:02:26

Tags: google-bigquery

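I have a BigQuery table with three "code" fields. Some of these fields are used to look up values in a code table. Suppose the table looks like this: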

data table:
    id    code1    code2     code3     data1
    1       Y        3         A        IA
    2       Y        2         B        IB
    3       N        5         C        IC

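To perform the lookup, I have to concatenate the field name to the value, separated by a colon. I can't hard-code the column names. In BigQuery, is there a way to use the table object to infer the column names in a SELECT statement?

For example: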

SELECT * FROM code_table AS code JOIN data_table ON code1 = code.code_values
The value of code1 going into the join needs to be 'code1:Y', not 'Y'.

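I'd like to know whether there is a way to dynamically inject the column name into the code1 value as it is sent to code_table.

Update 1:

Here is a sample of the data_table output, as it would be used to join against code_table: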

1, code1:Y, code2:3, code3:A, IA
2, code1:Y, code2:2, code3:B, IB
3, code1:N, code2:5, code3:C, IC

Thanks

2 answers:

Answer 0: (score: 2)

Does the TO_JSON_STRING function give you the output you need? Here is an example using your data:

WITH `project.dataset.table` AS (
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT TO_JSON_STRING(t) AS json
FROM `project.dataset.table` AS t;
+---------------------------------------------------------+
| json                                                    |
+---------------------------------------------------------+
| {"id":1,"code1":"Y","code2":3,"code3":"A","data1":"IA"} |
| {"id":2,"code1":"Y","code2":2,"code3":"B","data1":"IB"} |
| {"id":3,"code1":"N","code2":5,"code3":"C","data1":"IC"} |
+---------------------------------------------------------+

If you want to remove the quotes, you can do that as well:

WITH `project.dataset.table` AS (
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT REPLACE(TO_JSON_STRING(t), '"', '') AS json
FROM `project.dataset.table` AS t;
+-----------------------------------------+
| json                                    |
+-----------------------------------------+
| {id:1,code1:Y,code2:3,code3:A,data1:IA} |
| {id:2,code1:Y,code2:2,code3:B,data1:IB} |
| {id:3,code1:N,code2:5,code3:C,data1:IC} |
+-----------------------------------------+

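Edit: this gives exactly the desired output. I'm assuming you can reference id and data1 by name, since it sounds like you don't want them formatted the same way as the code columns. Note that the final REGEXP_REPLACE rewrites every space and comma, so it assumes the values themselves contain neither.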

WITH `project.dataset.table` AS (
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT
  REGEXP_REPLACE(
    FORMAT(
      '%d %s %s',
      id,
      REGEXP_REPLACE(
        TO_JSON_STRING(
          (SELECT AS STRUCT t.* EXCEPT (id, data1))
        ),
        '["{}]', ''),
      data1
    ),
    r'[ ,]', ', '
  ) AS output 
FROM `project.dataset.table` AS t;
+----------------------------------+
| output                           |
+----------------------------------+
| 1, code1:Y, code2:3, code3:A, IA |
| 2, code1:Y, code2:2, code3:B, IB |
| 3, code1:N, code2:5, code3:C, IC |
+----------------------------------+
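
To tie this back to the lookup itself, here is a minimal sketch that joins the generated name:value pairs against a code table without naming the code columns. The contents of code_table and its description column are made up for illustration; only the code_values column name comes from the question. It also assumes that no value contains a comma, quote, or brace, because the pairs are produced by splitting the JSON string.

```sql
#standardSQL
WITH data_table AS (
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'IC'
),
-- Hypothetical lookup table: keys are stored as 'column_name:value'.
code_table AS (
  SELECT 'code1:Y' AS code_values, 'Yes' AS description UNION ALL
  SELECT 'code1:N', 'No' UNION ALL
  SELECT 'code3:A', 'Category A'
)
SELECT
  d.id,
  kv AS code_value,
  c.description
FROM data_table AS d
-- Turn every column except id and data1 into one 'name:value' string per row.
CROSS JOIN UNNEST(
  SPLIT(
    REGEXP_REPLACE(
      TO_JSON_STRING((SELECT AS STRUCT d.* EXCEPT (id, data1))),
      r'["{}]', ''
    )
  )
) AS kv
-- Inner join: only pairs that exist in the lookup table are returned.
JOIN code_table AS c
  ON c.code_values = kv
ORDER BY id, code_value;
```

Because the pairs are unnested into rows instead of being pivoted back into columns, this form should not need to change when code columns are added, as long as id and data1 remain the only columns excluded.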

Answer 1: (score: 1)

   
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT 
  id, 
  MAX(IF(col = 1, val, NULL)) AS col1,
  MAX(IF(col = 2, val, NULL)) AS col2,
  MAX(IF(col = 3, val, NULL)) AS col3,
  data1
FROM `project.dataset.table` AS t, UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'"|{|}', ''))) AS val WITH OFFSET col
WHERE col BETWEEN 1 AND 3
GROUP BY id, data1
ORDER BY id   

The output is as follows:

id  col1        col2        col3        data1    
1   code1:Y     code2:3     code3:A     IA   
2   code1:Y     code2:2     code3:B     IB   
3   code1:N     code2:5     code3:C     IC    

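With the above query, you only need to know the number of code columns. If there are 5, for example, you need to add two more lines to the SELECT and change BETWEEN 1 AND 3 to BETWEEN 1 AND 5.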
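
As a sketch of that adjustment, here is the same approach with five code columns. The columns code4 and code5 and their values are hypothetical, added only to illustrate the change. Like the original query, it relies on id being the first column, the code columns following it in order, and no value containing a comma.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- code4 and code5 are hypothetical columns added for illustration.
  SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'P' AS code4, 7 AS code5, 'IA' AS data1 UNION ALL
  SELECT 2, 'Y', 2, 'B', 'Q', 8, 'IB' UNION ALL
  SELECT 3, 'N', 5, 'C', 'R', 9, 'IC'
)
SELECT
  id,
  MAX(IF(col = 1, val, NULL)) AS col1,
  MAX(IF(col = 2, val, NULL)) AS col2,
  MAX(IF(col = 3, val, NULL)) AS col3,
  MAX(IF(col = 4, val, NULL)) AS col4,  -- added
  MAX(IF(col = 5, val, NULL)) AS col5,  -- added
  data1
FROM `project.dataset.table` AS t,
  -- Offset 0 is 'id:...', offsets 1 to 5 are the code columns, offset 6 is 'data1:...'.
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'["{}]', ''))) AS val WITH OFFSET col
WHERE col BETWEEN 1 AND 5
GROUP BY id, data1
ORDER BY id
```

The offsets come from the position of each field in the JSON string, so inserting a non-code column between id and the code columns would shift the range used in the WHERE clause.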