I have a table with the following structure (header):
ProjID,Cost2001,Cost2002,Cost2003
So, for example, the first two rows look like this:
projectA,10,32,30
projectB,42,22,122
I want to convert this table to the following structure (header):
ProjID,CostYear,Value
So, going back to the sample data, after the transformation it would look like this:
ProjectA,Cost2001,10
ProjectA,Cost2002,32
ProjectA,Cost2003,30
ProjectB,Cost2001,42
ProjectB,Cost2002,22
ProjectB,Cost2003,122
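How can I do this? I am using Google BigQuery, which supports Standard SQL. I only need to do this once to fix the table, so I don't mind importing the data into another RDBMS in order to use its pivot functionality.
Answer 0: (score: 3)
IMHO, the simplest way is to use UNION ALL.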
CREATE TABLE projects (ProjID VARCHAR(20), Cost2001 int, Cost2002 int, Cost2003 int);
INSERT INTO projects VALUES
('projectA', 10, 32, 30),
('projectB', 42, 22, 122);
CREATE TABLE new_projects (ProjID VARCHAR(20), ProjYear INT, Cost int);
GO
2 rows affected
INSERT INTO new_projects
SELECT ProjID, 2001, Cost2001 FROM projects
UNION ALL
SELECT ProjID, 2002, Cost2002 FROM projects
UNION ALL
SELECT ProjID, 2003, Cost2003 FROM projects;
SELECT * FROM new_projects;
GO
ProjID   | ProjYear | Cost
:------- | -------: | ---:
projectA |     2001 |   10
projectB |     2001 |   42
projectA |     2002 |   32
projectB |     2002 |   22
projectA |     2003 |   30
projectB |     2003 |  122
dbfiddle here
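The query above stores the year as an integer, while the question asks for the original column names (Cost2001 and so on) in a CostYear column. The same UNION ALL pattern can emit those labels as string literals. A minimal sketch follows, assuming Python's built-in sqlite3 module as a scratch engine purely for illustration (the SELECT itself is plain SQL); the function name unpivot_union_all is hypothetical.

```python
import sqlite3

def unpivot_union_all():
    """Unpivot the sample table with UNION ALL, keeping column names as labels."""
    # Assumption: an in-memory SQLite database stands in for the real RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (ProjID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT)"
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?, ?)",
        [("projectA", 10, 32, 30), ("projectB", 42, 22, 122)],
    )
    # One SELECT per cost column; the string literal carries the column name.
    rows = conn.execute(
        """
        SELECT ProjID, 'Cost2001' AS CostYear, Cost2001 AS Value FROM projects
        UNION ALL
        SELECT ProjID, 'Cost2002', Cost2002 FROM projects
        UNION ALL
        SELECT ProjID, 'Cost2003', Cost2003 FROM projects
        ORDER BY ProjID, CostYear
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in unpivot_union_all():
        print(row)
```

The ORDER BY is only there to make the output deterministic; it can be dropped when the rows are inserted into a new table.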
Answer 1: (score: 3)
Below is the real BigQuery style :o)
Two versions for BigQuery Standard SQL
#standardSQL
SELECT
  projID,
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year,
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001, Cost2002, Cost2003]) cost WITH OFFSET pos
You can test and play with the query above using the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122
)
SELECT
  projID,
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year,
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001, Cost2002, Cost2003]) cost WITH OFFSET pos
The result is
Row projID year cost
1 projectA 2001 10
2 projectA 2002 32
3 projectA 2003 30
4 projectB 2001 42
5 projectB 2002 22
6 projectB 2003 122
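As you can see in the query above, you have to preset the values for the respective years in this line:
([2001, 2002, 2003])[SAFE_OFFSET(pos)] year
If for some reason you want to be more generic and derive these values from the names of the original columns, you can use the generic approach below: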
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122
)
SELECT
  projID,
  SPLIT(x, ':')[SAFE_OFFSET(0)] year,
  SPLIT(x, ':')[SAFE_OFFSET(1)] cost
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) x
WHERE SPLIT(x, ':')[OFFSET(0)] != 'projID'
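The result has the same shape, except that year now holds the original column name and both year and cost come back as strings: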
Row projID year cost
1 projectA Cost2001 10
2 projectA Cost2002 32
3 projectA Cost2003 30
4 projectB Cost2001 42
5 projectB Cost2002 22
6 projectB Cost2003 122
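If the data is moved to another RDBMS, a comparable generic approach is to read the column names from the query metadata and generate the UNION ALL statement from them, so that no year or column name is hardcoded. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module as the scratch engine, the helper names unpivot_all_columns and demo are hypothetical, and the table and column names are assumed to be trusted because they are interpolated into the SQL text.

```python
import sqlite3

def unpivot_all_columns(conn, table, key):
    """Unpivot every column of `table` except `key` into (key, CostYear, Value) rows."""
    # cursor.description lists the column names even when no rows are returned.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [d[0] for d in cursor.description]
    value_columns = [c for c in columns if c != key]
    # Build one SELECT per value column. Names are assumed trusted (no user input).
    parts = [
        f'SELECT "{key}", \'{c}\' AS CostYear, "{c}" AS Value FROM "{table}"'
        for c in value_columns
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY 1, 2"
    return conn.execute(sql).fetchall()

def demo():
    """Run the helper against the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (ProjID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT)"
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?, ?)",
        [("projectA", 10, 32, 30), ("projectB", 42, 22, 122)],
    )
    rows = unpivot_all_columns(conn, "projects", "ProjID")
    conn.close()
    return rows

if __name__ == "__main__":
    for row in demo():
        print(row)
```

Unlike the TO_JSON_STRING version, the values keep their original integer type here, because each branch selects the column directly.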
Answer 2: (score: 1)
As everyone has mentioned, you need UNPIVOT, which would look like this:
DECLARE @projects TABLE (projid nvarchar(max), cost2001 int, cost2002 int, cost2003 int);
INSERT @projects VALUES ('projectA', 10, 32, 30)
, ('projectB', 42, 22, 122);
SELECT PROJID, PROJECT_ATTRIBUTE, PROJECT_COST
FROM @projects
UNPIVOT (PROJECT_COST FOR PROJECT_ATTRIBUTE in (cost2001, cost2002, cost2003) ) AS UNPVT
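For performance reasons I wouldn't use the UNION ALL version. Essentially, you would scan the table 3 times, or as many times as you have "CostYear" columns, and you would also have to add a whole new query for each additional column.
With UNPIVOT, by contrast, the table is scanned once.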
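For engines that have no UNPIVOT, one way to avoid writing a separate SELECT per column is to cross join the table against a short list of column labels and pick the matching value with CASE. This is a sketch under stated assumptions: Python's built-in sqlite3 module serves only as a scratch engine, the function name unpivot_cross_join is hypothetical, and whether the base table is actually read only once depends on the engine's query planner.

```python
import sqlite3

def unpivot_cross_join():
    """Unpivot the sample table with a CROSS JOIN against a label list plus CASE."""
    # Assumption: an in-memory SQLite database stands in for the real RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (ProjID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT)"
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?, ?)",
        [("projectA", 10, 32, 30), ("projectB", 42, 22, 122)],
    )
    # Each project row is paired with every label; CASE selects the matching column.
    rows = conn.execute(
        """
        SELECT p.ProjID,
               y.CostYear,
               CASE y.CostYear
                   WHEN 'Cost2001' THEN p.Cost2001
                   WHEN 'Cost2002' THEN p.Cost2002
                   WHEN 'Cost2003' THEN p.Cost2003
               END AS Value
        FROM projects AS p
        CROSS JOIN (
            SELECT 'Cost2001' AS CostYear
            UNION ALL SELECT 'Cost2002'
            UNION ALL SELECT 'Cost2003'
        ) AS y
        ORDER BY p.ProjID, y.CostYear
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in unpivot_cross_join():
        print(row)
```

Adding another cost column means one more label in the list and one more WHEN branch, rather than a whole new query.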
Answer 3: (score: 0)
Unpivot the data
DECLARE @ProjectTbl TABLE (ProjectID VARCHAR(225),Cost2001 INT, Cost2002 INT,Cost2003 INT)
INSERT INTO @ProjectTbl VALUES
('projectA',10,32,30),
('projectB',42,22,122);
;WITH Unpivots
AS
(SELECT
*
FROM @ProjectTbl
UNPIVOT
(
Value FOR CostYear IN (Cost2001, Cost2002, Cost2003)
) AS up
)
SELECT
ProjectID,
CostYear,
Value
FROM Unpivots
Output
ProjectID CostYear Value
projectA Cost2001 10
projectA Cost2002 32
projectA Cost2003 30
projectB Cost2001 42
projectB Cost2002 22
projectB Cost2003 122