Unfortunately, reshaping data in BigQuery is not as easy as it is in R, and I can't export the data for this project.
Here is the input:
date country A B C D
20170928 CH 3000.3 121 13 3200
20170929 CH 2800.31 137 23 1614.31
Expected output:
date country Metric Value
20170928 CH A 3000.3
20170928 CH B 121
20170928 CH C 13
20170928 CH D 3200
20170929 CH A 2800.31
20170929 CH B 137
20170929 CH C 23
20170929 CH D 1614.31
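My real table has many more columns and rows (and I assume this would take a lot of manual work).
Answer 0 (score: 4)
The query below is for BigQuery Standard SQL and does not need one repeated SELECT per column. It picks up however many columns the table has and turns them into metric/value pairs. It works by serializing each row with to_json_string and splitting the text on commas and colons, so it assumes the values themselves contain neither character.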
#standardSQL
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
You can test and play with the query using the dummy data from your question:
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT '20170928' DATE, 'CH' country, 3000.3 A, 121 B, 13 C, 3200 D UNION ALL
SELECT '20170929', 'CH', 2800.31, 137, 23, 1614.31
)
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
The result is as expected:
DATE country metric value
20170928 CH A 3000.3
20170928 CH B 121.0
20170928 CH C 13.0
20170928 CH D 3200.0
20170929 CH A 2800.31
20170929 CH B 137.0
20170929 CH C 23.0
20170929 CH D 1614.31
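To make the mechanics of that query easier to follow, here is a minimal Python sketch of the same steps: serialize a row to JSON, strip the braces, split into pairs, drop the id columns, and cast the rest to float. It is only an illustration of the logic, not BigQuery code; the function name `melt_via_json` and the `id_cols` parameter are hypothetical.

```python
import json
import re

def melt_via_json(row, id_cols=("date", "country")):
    """Mimic the query: JSON-serialize a row, split it into key:value
    pairs, and keep every pair whose key is not an id column."""
    text = json.dumps(row, separators=(",", ":"))  # plays the role of to_json_string(t)
    text = re.sub(r"[{}]", "", text)               # strip the surrounding braces
    out = []
    for pair in text.split(","):                   # one '"key":value' chunk per column
        parts = pair.split(":")
        key = re.sub(r'^"|"$', "", parts[0])       # drop the surrounding quotes
        value = re.sub(r'^"|"$', "", parts[1])
        if key.lower() in id_cols:
            continue                               # same filter as the WHERE clause
        try:
            number = float(value)                  # stands in for SAFE_CAST(value AS FLOAT64)
        except ValueError:
            number = None
        out.append(tuple(row[c] for c in id_cols) + (key, number))
    return out

rows = [
    {"date": "20170928", "country": "CH", "A": 3000.3, "B": 121, "C": 13, "D": 3200},
    {"date": "20170929", "country": "CH", "A": 2800.31, "B": 137, "C": 23, "D": 1614.31},
]
melted = [r for row in rows for r in melt_via_json(row)]
for r in melted:
    print(r)
```

As in the query, a string value that contains a comma or a colon would be split in the wrong place, so this approach fits tables whose non-id columns are plain numbers.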
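Answer 1 (score: 2)
You need a UNION, which BigQuery legacy SQL writes as a comma between the subqueries. (In Standard SQL a comma means a cross join, so there you would write UNION ALL between the SELECT statements instead.)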
SELECT date, country, Metric, Value
FROM (
SELECT date, country, 'A' as Metric, A as Value FROM your_table
), (
SELECT date, country, 'B' as Metric, B as Value FROM your_table
), (
SELECT date, country, 'C' as Metric, C as Value FROM your_table
) , (
SELECT date, country, 'D' as Metric, D as Value FROM your_table
)
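Writing one SELECT per column by hand gets tedious for wide tables. As a rough sketch, the query text can be generated from a list of column names. The helper below, `build_union_all`, is hypothetical; it emits the Standard SQL form with UNION ALL and assumes the table and column names are trusted identifiers, since it does no quoting or escaping.

```python
def build_union_all(table, id_cols, value_cols):
    """Return one SELECT per value column, joined with UNION ALL."""
    ids = ", ".join(id_cols)
    selects = [
        f"SELECT {ids}, '{col}' AS Metric, {col} AS Value FROM {table}"
        for col in value_cols
    ]
    return "\nUNION ALL\n".join(selects)

query = build_union_all("your_table", ["date", "country"], ["A", "B", "C", "D"])
print(query)
```

The printed text can then be pasted into the query editor or submitted through whichever client you already use.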
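Answer 2 (score: 1)
Most of the answers I managed to find require naming every column to be melted. That is hard to manage when a table has hundreds or thousands of columns. Here is an answer that works for arbitrarily wide tables.
It uses dynamic SQL: it extracts the column names from the table schema automatically, assembles a command string, and then evaluates that string. It is meant to mimic the behavior of Python's pandas.melt() and R's reshape2::melt().
I deliberately did not wrap this in a user-defined function because of some undesirable properties of UDFs. Depending on how you use it, you may or may not want to do so.
Input: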
id0 id1 _2020_05_27 _2020_05_28
1 1 11 12
1 2 13 14
2 1 15 16
2 2 17 18
Output:
id0 id1 date value
1 2 _2020_05_27 13
1 2 _2020_05_28 14
2 2 _2020_05_27 17
2 2 _2020_05_28 18
1 1 _2020_05_27 11
1 1 _2020_05_28 12
2 1 _2020_05_27 15
2 1 _2020_05_28 16
#standardSQL
-- PANDAS MELT FUNCTION IN GOOGLE BIGQUERY
-- author: Luna Huang
-- email: lunahuang@google.com
-- run this script with Google BigQuery Web UI in the Cloud Console
-- this piece of code functions like the pandas melt function
-- pandas.melt(id_vars, value_vars, var_name, value_name, col_level=None)
-- without utilizing user defined functions (UDFs)
-- see below for where to input corresponding arguments
DECLARE cmd STRING;
DECLARE subcmd STRING;
SET cmd = ("""
WITH original AS (
-- query to retrieve the original table
%s
),
nested AS (
SELECT
[
-- sub command to be automatically generated
%s
] as s,
-- equivalent to id_vars in pandas.melt()
%s,
FROM original
)
SELECT
-- equivalent to id_vars in pandas.melt()
%s,
-- equivalent to var_name in pandas.melt()
s.key AS %s,
-- equivalent to value_name in pandas.melt()
s.value AS %s,
FROM nested
CROSS JOIN UNNEST(nested.s) AS s
""");
SET subcmd = ("""
WITH
columns AS (
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
%s
),
scs AS (
SELECT FORMAT("STRUCT('%%s' as key, %%s as value)", column_name, column_name) AS sc
FROM columns
)
SELECT ARRAY_TO_STRING(ARRAY (SELECT sc FROM scs), ",\\n")
""");
-- -- -- EXAMPLE BELOW -- -- --
-- SET UP AN EXAMPLE TABLE --
CREATE OR REPLACE TABLE `tmp.example`
(
id0 INT64,
id1 INT64,
_2020_05_27 INT64,
_2020_05_28 INT64,
);
INSERT INTO `tmp.example` VALUES (1, 1, 11, 12);
INSERT INTO `tmp.example` VALUES (1, 2, 13, 14);
INSERT INTO `tmp.example` VALUES (2, 1, 15, 16);
INSERT INTO `tmp.example` VALUES (2, 2, 17, 18);
-- MELTING STARTS --
-- execute these two commands to melt the table
-- the first generates the STRUCT commands
-- and saves a string in subcmd
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
subcmd,
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
"""
SELECT column_name
FROM `tmp.INFORMATION_SCHEMA.COLUMNS`
WHERE (table_name = "example") AND (column_name NOT IN ("id0", "id1"))
"""
) INTO subcmd;
-- the second implements the melting
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
cmd,
-- query to retrieve the original table
"""
SELECT *
FROM `tmp.example`
""",
-- please do not change this argument
subcmd,
-- equivalent to id_vars in pandas.melt()
-- !!please type these twice!!
"id0, id1", "id0, id1",
-- equivalent to var_name in pandas.melt()
"date",
-- equivalent to value_name in pandas.melt()
"value"
);
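The script above builds its final query in two stages: first a comma-separated list of STRUCT expressions, one per melted column, and then the full query with that list substituted in. The Python sketch below reproduces only the string assembly so the generated text is easy to inspect. The function names `struct_list` and `build_melt_query` are hypothetical, and the sketch does not talk to BigQuery.

```python
def struct_list(value_cols):
    """Build the same text the scs CTE produces with FORMAT(...)."""
    return ",\n".join(f"STRUCT('{c}' as key, {c} as value)" for c in value_cols)

def build_melt_query(source_query, id_vars, value_cols, var_name, value_name):
    """Fill the melt template: id_vars, var_name and value_name play the
    same roles as the pandas.melt() arguments of the same names."""
    ids = ", ".join(id_vars)
    return f"""
WITH original AS (
{source_query}
),
nested AS (
  SELECT [
{struct_list(value_cols)}
  ] AS s, {ids}
  FROM original
)
SELECT {ids}, s.key AS {var_name}, s.value AS {value_name}
FROM nested
CROSS JOIN UNNEST(nested.s) AS s
"""

query = build_melt_query(
    "SELECT * FROM `tmp.example`",
    ["id0", "id1"],
    ["_2020_05_27", "_2020_05_28"],
    "date",
    "value",
)
print(query)
```

Each row ends up carrying an array with one (key, value) struct per melted column, and the final CROSS JOIN UNNEST turns that array into one output row per struct.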