I have two different Google Sheets spreadsheets.
The first has 4 columns:
+------+------+------+------+
| Col1 | Col2 | Col5 | Col6 |
+------+------+------+------+
| ID1 | A | B | C |
| ID2 | D | E | F |
+------+------+------+------+
The second has the same 4 columns as the first, plus two more:
+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID3 | G | H | J | K | L |
| ID4 | M | N | O | P | Q |
+------+------+------+------+------+------+
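I configured both as federated sources in Google BigQuery, and now I need to create a view that combines the data from the two tables.
Both tables have a Col1 column containing an ID. The ID is unique across both tables, so there are no duplicated rows.
The result table I am looking for is: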
+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID1 | A | NULL | NULL | B | C |
| ID2 | D | NULL | NULL | E | F |
| ID3 | G | H | J | K | L |
| ID4 | M | N | O | P | Q |
+------+------+------+------+------+------+
For the columns that the first file does not have, I expect NULL values.
I am using standard SQL. The following statement generates sample data:
#standardSQL
WITH table1 AS (
SELECT "A" as Col1, "B" as Col2, "C" AS Col3
UNION ALL
SELECT "D" as Col1, "E" as Col2, "F" AS Col3
),
table2 AS (
SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
UNION ALL
SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)
A simple UNION ALL does not work, because the tables have a different number of columns:
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
Error: Queries in UNION ALL have mismatched column count; query 1 has 3 columns, query 2 has 5 columns at [17:1]
The wildcard operator is not an option either, because federated sources do not support it:
SELECT * FROM `table*`
Error: External tables cannot be queried through prefix
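Of course this is sample data with only 3 to 5 columns; the real tables have 20 to 40 columns. A solution where I have to SELECT every field explicitly is therefore not really practical.
Is there a working way to join these two tables?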
Answer 0 (score: 3)
"Is there a working way to join these two tables?"
#standardSQL
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
You can check this using your example:
#standardSQL
WITH table1 AS (
SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col3, "C" AS Col4
UNION ALL
SELECT "ID2", "D", "E", "F"
),
table2 AS (
SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
UNION ALL
SELECT "ID4", "M", "N", "O", "P", "Q"
)
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
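One caveat worth spelling out: UNION ALL matches columns by position, not by name, and the output column names come from the first SELECT. Appending the NULL columns at the end therefore only lines up when the missing columns are the last ones. In the spreadsheets from the question the missing columns (Col3 and Col4) sit in the middle, so a minimal sketch that reproduces the requested layout needs an explicit column list. The sample CTEs below are an assumption that mirrors the spreadsheets rather than the question's generated data, and CAST(NULL AS STRING) is used only to make the intended type explicit:

```sql
#standardSQL
WITH table1 AS (
  -- Assumed sample: the 4-column sheet (Col1, Col2, Col5, Col6)
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col5, "C" AS Col6
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  -- Assumed sample: the 6-column sheet
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q"
)
-- Put typed NULL placeholders in the positions table1 lacks
SELECT Col1, Col2, CAST(NULL AS STRING) AS Col3, CAST(NULL AS STRING) AS Col4, Col5, Col6
FROM table1
UNION ALL
SELECT Col1, Col2, Col3, Col4, Col5, Col6
FROM table2
```

This does require listing the fields, which the question wanted to avoid for 20 to 40 columns, but the list is written once in the view definition and makes the positional alignment explicit.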
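Answer 1 (score: 3)
You can pass rows through a UDF to handle the case where column names are not aligned by position, or where the tables have different numbers of columns. Here is an example: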
CREATE TEMP FUNCTION CoerceRow(json_row STRING)
RETURNS STRUCT<Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING, Col5 STRING>
LANGUAGE js AS """
return JSON.parse(json_row);
""";
WITH table1 AS (
SELECT "A" as Col5, "B" as Col3, "C" AS Col2
UNION ALL
SELECT "D" as Col5, "E" as Col3, "F" AS Col2
),
table2 AS (
SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
UNION ALL
SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)
SELECT CoerceRow(json_row).*
FROM (
SELECT TO_JSON_STRING(t1) AS json_row
FROM table1 AS t1
UNION ALL
SELECT TO_JSON_STRING(t2) AS json_row
FROM table2 AS t2
);
+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 |
+------+------+------+------+------+
| NULL | C | B | NULL | A |
| NULL | F | E | NULL | D |
| G | H | J | K | L |
| M | N | O | P | Q |
+------+------+------+------+------+
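Note that the CoerceRow function has to declare the explicit row type you want in the output. Apart from that, the columns of the tables being unioned are simply matched by name.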
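Since the question asks for a view, and as far as I know a view definition cannot include a temporary function, the same match-by-name idea can also be sketched without a JavaScript UDF. This variant is an assumption, not part of the original answer: it extracts each field from the JSON string with the built-in JSON_EXTRACT_SCALAR, which returns NULL when a key is absent.

```sql
#standardSQL
WITH table1 AS (
  SELECT "A" AS Col5, "B" AS Col3, "C" AS Col2
  UNION ALL
  SELECT "D", "E", "F"
),
table2 AS (
  SELECT "G" AS Col1, "H" AS Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
  UNION ALL
  SELECT "M", "N", "O", "P", "Q"
)
SELECT
  -- Each output column is looked up by name; missing keys yield NULL
  JSON_EXTRACT_SCALAR(json_row, '$.Col1') AS Col1,
  JSON_EXTRACT_SCALAR(json_row, '$.Col2') AS Col2,
  JSON_EXTRACT_SCALAR(json_row, '$.Col3') AS Col3,
  JSON_EXTRACT_SCALAR(json_row, '$.Col4') AS Col4,
  JSON_EXTRACT_SCALAR(json_row, '$.Col5') AS Col5
FROM (
  -- Serialize every row so both tables share one STRING column
  SELECT TO_JSON_STRING(t1) AS json_row FROM table1 AS t1
  UNION ALL
  SELECT TO_JSON_STRING(t2) AS json_row FROM table2 AS t2
);
```

As with the declared STRUCT type in CoerceRow, the output columns are still listed once, and every value comes back as a STRING, so numeric or date columns would need an explicit CAST afterwards.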