SQL: transpose columns into rows and add a column

Posted: 2018-03-05 15:30:00

Tags: sql google-bigquery

I have a table with the following structure (headers):

ProjID,Cost2001,Cost2002,Cost2003

So, for example, the first two rows look like this:

projectA,10,32,30
projectB,42,22,122

I want to transform this table into the following structure (headers):

ProjID,CostYear,Value

So, going back to the sample data, after the transformation it would look like this:

projectA,Cost2001,10
projectA,Cost2002,32
projectA,Cost2003,30
projectB,Cost2001,42
projectB,Cost2002,22
projectB,Cost2003,122

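How can I do this? I'm using Google BigQuery, which supports Standard SQL. I only need to do this once to fix the table, so I don't mind importing the data into another RDBMS to be able to use pivot functionality.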
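To make the target shape concrete, here is a minimal Python sketch of the same wide-to-long reshape applied to the sample rows, outside any database. The `unpivot` helper and its `id_col` parameter are hypothetical, for illustration only: every column except the id column becomes its own output row.

```python
# Sample data from the question: ProjID plus one column per cost year
HEADER = ["ProjID", "Cost2001", "Cost2002", "Cost2003"]
ROWS = [
    ("projectA", 10, 32, 30),
    ("projectB", 42, 22, 122),
]


def unpivot(header, rows, id_col="ProjID"):
    """Hypothetical helper: emit one (id, column name, value) tuple per non-id cell."""
    id_idx = header.index(id_col)
    value_cols = [(i, name) for i, name in enumerate(header) if i != id_idx]
    result = []
    for row in rows:
        for i, name in value_cols:
            # Output columns correspond to ProjID, CostYear, Value
            result.append((row[id_idx], name, row[i]))
    return result


long_rows = unpivot(HEADER, ROWS)
for r in long_rows:
    print(r)
```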

4 answers:

Answer 0 (score: 3)

IMHO, the simplest way is to use UNION ALL.

CREATE TABLE projects(ProjID VARCHAR(20), Cost2001 int, Cost2002 int, Cost2003 int);
INSERT INTO projects VALUES
('projectA', 10, 32, 30),
('projectB', 42, 22, 122);

CREATE TABLE new_projects (ProjID VARCHAR(20), ProjYear INT, Cost int);
GO
INSERT INTO new_projects
SELECT ProjID, 2001, Cost2001
FROM   projects
UNION ALL
SELECT ProjID, 2002, Cost2002
FROM   projects
UNION ALL
SELECT ProjID, 2003, Cost2003
FROM   projects;

SELECT * FROM new_projects;
GO
ProjID   | ProjYear | Cost
:------- | -------: | ---:
projectA |     2001 |   10
projectB |     2001 |   42
projectA |     2002 |   32
projectB |     2002 |   22
projectA |     2003 |   30
projectB |     2003 |  122

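If you want to try the UNION ALL approach without a SQL Server instance, the sketch below runs the same statements against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is a stand-in chosen for illustration, not what the answer used, so the `GO` batch separators are dropped, and an ORDER BY is added because UNION ALL on its own does not guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (ProjID VARCHAR(20), Cost2001 INT, Cost2002 INT, Cost2003 INT);
    INSERT INTO projects VALUES
        ('projectA', 10, 32, 30),
        ('projectB', 42, 22, 122);

    CREATE TABLE new_projects (ProjID VARCHAR(20), ProjYear INT, Cost INT);

    -- One SELECT per cost column, glued together with UNION ALL
    INSERT INTO new_projects
    SELECT ProjID, 2001, Cost2001 FROM projects
    UNION ALL
    SELECT ProjID, 2002, Cost2002 FROM projects
    UNION ALL
    SELECT ProjID, 2003, Cost2003 FROM projects;
    """
)

rows = conn.execute(
    "SELECT ProjID, ProjYear, Cost FROM new_projects ORDER BY ProjID, ProjYear"
).fetchall()
for row in rows:
    print(row)
```

Each added cost column means one more SELECT branch, which is the maintenance cost the later answers point out.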

Answer 1 (score: 3)

Below is a truly BigQuery-style approach :o)

Two versions, both for BigQuery Standard SQL:

  
##standardSQL
SELECT 
  projID,  
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year, 
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001,Cost2002,Cost2003]) cost WITH OFFSET pos

You can test / play with the above using the dummy data from your question, as below:

##standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122 
)
SELECT 
  projID,  
  ([2001, 2002, 2003])[SAFE_OFFSET(pos)] year, 
  cost
FROM `project.dataset.table`,
UNNEST([Cost2001,Cost2002,Cost2003]) cost WITH OFFSET pos   

Result:

Row projID      year    cost     
1   projectA    2001    10   
2   projectA    2002    32   
3   projectA    2003    30   
4   projectB    2001    42   
5   projectB    2002    22   
6   projectB    2003    122    
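UNNEST ... WITH OFFSET pairs each array element with its zero-based position, and that position is then used to look up the year. In a database without UNNEST, a rough equivalent is to cross join the table with a small list of positions and pick the matching column with CASE. The sketch below does this in SQLite through Python's sqlite3; the `offsets` CTE and its columns are illustrative names, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (projID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT);
    INSERT INTO projects VALUES ('projectA', 10, 32, 30), ('projectB', 42, 22, 122);
    """
)

query = """
WITH offsets(pos, year) AS (
    -- Plays the role of WITH OFFSET plus the [2001, 2002, 2003] lookup array
    VALUES (0, 2001), (1, 2002), (2, 2003)
)
SELECT
    p.projID,
    o.year,
    CASE o.pos                -- pick the "array element" at position pos
        WHEN 0 THEN p.Cost2001
        WHEN 1 THEN p.Cost2002
        WHEN 2 THEN p.Cost2003
    END AS cost
FROM projects AS p
CROSS JOIN offsets AS o
ORDER BY p.projID, o.year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```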

As you can see in the query above, you have to preset the year values that correspond to each column in the following line:
([2001, 2002, 2003])[SAFE_OFFSET(pos)] year   

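If, for some reason, you want to be more generic and derive those values from the names of the original columns, you can use the following generic approach: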

##standardSQL
WITH `project.dataset.table` AS (
  SELECT 'projectA' projID, 10 Cost2001, 32 Cost2002, 30 Cost2003 UNION ALL
  SELECT 'projectB', 42, 22, 122 
)
SELECT 
  projID,  
  SPLIT(x,':')[SAFE_OFFSET(0)] year,
  SPLIT(x,':')[SAFE_OFFSET(1)] cost 
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) x
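WHERE SPLIT(x,':')[OFFSET(0)] != 'projID'

Obviously, the result is the same, except that year now holds the original column name (for example Cost2001) instead of a number, and both year and cost come back as strings, because SPLIT returns an array of strings: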

Row projID      year        cost     
1   projectA    Cost2001    10   
2   projectA    Cost2002    32   
3   projectA    Cost2003    30   
4   projectB    Cost2001    42   
5   projectB    Cost2002    22   
6   projectB    Cost2003    122  
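Outside BigQuery, a comparable way to avoid hardcoding the cost columns is to read the column names from the table's metadata and generate the UNION ALL statement from them. Below is a minimal sketch using SQLite's `PRAGMA table_info` through Python's sqlite3. The `build_unpivot_sql` helper is hypothetical, and it assumes that every column except projID holds a cost and that column names contain no quote characters.

```python
import sqlite3


def build_unpivot_sql(conn, table, id_col):
    """Hypothetical helper: build a UNION ALL unpivot over every column except id_col."""
    # PRAGMA table_info yields one row per column; field 1 is the column name
    columns = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
    selects = [
        # The column name becomes a string literal, i.e. the "CostYear" label
        f"SELECT \"{id_col}\" AS ProjID, '{col}' AS CostYear, \"{col}\" AS Value "
        f"FROM \"{table}\""
        for col in columns
        if col != id_col
    ]
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (projID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT);
    INSERT INTO projects VALUES ('projectA', 10, 32, 30), ('projectB', 42, 22, 122);
    """
)

sql = build_unpivot_sql(conn, "projects", "projID")
rows = conn.execute(f"SELECT * FROM ({sql}) ORDER BY ProjID, CostYear").fetchall()
for row in rows:
    print(row)
```

Unlike the TO_JSON_STRING trick, the values keep their original numeric type here, because each branch selects the column directly.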

Answer 2 (score: 1)

As others have mentioned, you need UNPIVOT. In SQL Server (T-SQL) it would look something like this:

DECLARE @projects TABLE (projid nvarchar(max), cost2001 int, cost2002 int, cost2003 int);

INSERT @projects VALUES ('projectA', 10, 32, 30)
                    , ('projectB', 42, 22, 122);


SELECT PROJID, PROJECT_ATTRIBUTE, PROJECT_COST
FROM @projects
UNPIVOT (PROJECT_COST FOR PROJECT_ATTRIBUTE in (cost2001, cost2002, cost2003) ) AS UNPVT

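For performance reasons I wouldn't use the UNION ALL version. Essentially you would scan the table 3 times, or once for every "CostYear" column you have, and you would also have to add a whole new SELECT for each additional column.

With UNPIVOT, by contrast, the table is scanned only once.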
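One behavioral difference is worth knowing when choosing between the two: SQL Server's UNPIVOT does not return rows whose unpivoted value is NULL, whereas UNION ALL keeps them. If you use UNION ALL and want UNPIVOT-like output, filter the NULLs out explicitly. The sketch below shows this in SQLite through Python's sqlite3 (SQLite has no UNPIVOT); the projectC row with a NULL Cost2002 is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (ProjID TEXT, Cost2001 INT, Cost2002 INT, Cost2003 INT);
    INSERT INTO projects VALUES
        ('projectA', 10, 32, 30),
        ('projectC', 5, NULL, 7);   -- made-up row with a missing value
    """
)

union_all = """
SELECT ProjID, 'Cost2001' AS CostYear, Cost2001 AS Value FROM projects
UNION ALL
SELECT ProjID, 'Cost2002', Cost2002 FROM projects
UNION ALL
SELECT ProjID, 'Cost2003', Cost2003 FROM projects
"""

# Plain UNION ALL keeps the row whose Value is NULL
with_nulls = conn.execute(
    f"SELECT * FROM ({union_all}) ORDER BY ProjID, CostYear"
).fetchall()

# Filtering out NULLs mimics the rows T-SQL UNPIVOT would return
unpivot_like = conn.execute(
    f"SELECT * FROM ({union_all}) WHERE Value IS NOT NULL ORDER BY ProjID, CostYear"
).fetchall()

print(len(with_nulls), len(unpivot_like))  # prints: 6 5
```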

Answer 3 (score: 0)

Unpivot the data:

DECLARE @ProjectTbl TABLE (ProjectID VARCHAR(225),Cost2001 INT, Cost2002 INT,Cost2003  INT)
INSERT INTO @ProjectTbl VALUES
('projectA',10,32,30),
('projectB',42,22,122);

;WITH Unpivots 
AS
(SELECT 
*
FROM @ProjectTbl
UNPIVOT 
(
  Value FOR CostYear IN (Cost2001, Cost2002, Cost2003)
) AS up 
)
SELECT
ProjectID,
CostYear,
Value
FROM Unpivots

Output

ProjectID   CostYear    Value
projectA    Cost2001    10
projectA    Cost2002    32
projectA    Cost2003    30
projectB    Cost2001    42
projectB    Cost2002    22
projectB    Cost2003    122