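Columns to rows in a big data set (PostgreSQL): transpose?

Time: 2015-10-12 13:00:58

Tags: postgresql data-structures bigdata crosstab

I'm trying to restructure my big data set so that I can work with the data more easily. I have about 20 tables with the same structure as the input table shown below, one for each year from 1996 to 2015.

Here is one of my input tables (mytable2015):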

cell   day1      day2      day3      day4    ......   day365
1      3,7167    0         0         0,1487  ......   0,3256
2      0         0         0,2331    0,1461  ......   1,8765
3      1,431     0,4121    0         1,4321  ......   0
...
...
...
64800

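I would like to have all the data in one big data set and, if possible, replace day1, day2, ... with the actual date value (e.g. 01.01.2015 or 20150101). So my result should look like this: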

cell   date      value
1      20150101  3,7167
1      20150102  0
1      20150103  0
1      20150104  0,1487
...    ........  ......
...    ........  ......
...    ........  ......
2      20150101  0
2      20150102  0,4321
...    ........  ......
...    ........  ......
...    ........  ......
64800  20150101  0,1035

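Cell represents geographic information. The cells form a grid laid over the whole world; each cell is exactly one degree high and one degree wide.

I have two main questions:

  1. Is it possible to convert day1, day2, ... into a date format?

  2. How can I transform my table into this new structure?

Any help is much appreciated, thanks in advance!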

2 Answers:

Answer 0 (score: 3)

Query

Example data:

create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1,      3.7167,    0,         0,         0.1487),  
(2,      0,         0,         0.2331,    0.1461),  
(3,      1.431,     0.4121,    0,         1.4321);  

How to build the query, step by step.

Step 1. Use json_each_text(row_to_json(t)) to unpivot the columns of each row into (key, value) pairs:

select cell, json_each_text(row_to_json(t)) val
from example2015 t

 cell |      val      
------+---------------
    1 | (cell,1)
    1 | (day1,3.7167)
    1 | (day2,0)
    1 | (day3,0)
    1 | (day4,0.1487)
    2 | (cell,2)
    2 | (day1,0)
    2 | (day2,0)
    2 | (day3,0.2331)
    2 | (day4,0.1461)
    3 | (cell,3)
    3 | (day1,1.431)
    3 | (day2,0.4121)
    3 | (day3,0)
    3 | (day4,1.4321)
(15 rows)   

Step 2. Skip the cell pairs, convert dayn to the integer n, and add it to a base date (here 2014-12-31):

select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from example2015 t
    ) sub
where (val).key <> 'cell'

 cell |    date    | value  
------+------------+--------
    1 | 2015-01-01 | 3.7167
    1 | 2015-01-02 |      0
    1 | 2015-01-03 |      0
    1 | 2015-01-04 | 0.1487
    2 | 2015-01-01 |      0
    2 | 2015-01-02 |      0
    2 | 2015-01-03 | 0.2331
    2 | 2015-01-04 | 0.1461
    3 | 2015-01-01 |  1.431
    3 | 2015-01-02 | 0.4121
    3 | 2015-01-03 |      0
    3 | 2015-01-04 | 1.4321
(12 rows)
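
The base date is the last day of the previous year, so that day1 lands on January 1. An equivalent form adds n - 1 to January 1 of the table's year. The sketch below compares the two; the day numbers in the VALUES list are arbitrary picks for illustration and do not come from the question's data.

```sql
-- In PostgreSQL, date + integer adds that many days and yields a date.
SELECT n                          AS daynum,
       date '2014-12-31' + n      AS from_prev_dec31,
       date '2015-01-01' + n - 1  AS from_jan1
FROM (VALUES (1), (59), (60), (365)) AS d(n);
```

Both date columns should agree on every row. Because the arithmetic follows the calendar, a leap-year table with a day366 column (an assumption, the question only shows day365) would map that column to December 31 without further changes.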

Conversion

You can use the query from step 2 to insert the values from mytable2015 into result_table:

create table result_table (
    "cell" integer,
    "date" date,
    "value" real
);

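You will generate a table with 23,652,000 rows (64,800 cells × 365 days). Converting everything in one go may exhaust memory and may take longer than you are willing to accept. I suggest splitting the operation into stages of, say, at most 10,000 source rows (3,650,000 new rows) at a time.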

insert into result_table
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
    select cell, json_each_text(row_to_json(t)) val
    from mytable2015 t
    ) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000

Repeat the insert for cell > 10000 and cell <= 20000, and so on.
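
Once all batches have run, comparing row counts is a cheap way to check that no range was skipped or inserted twice. This is only a sketch and assumes that every source row carries exactly 365 day columns, as in mytable2015, and that result_table may also hold other years.

```sql
-- Expected: one result row per source row and day column.
SELECT (SELECT count(*) FROM mytable2015) * 365 AS expected_rows,
       (SELECT count(*)
          FROM result_table
         WHERE "date" >= date '2015-01-01'
           AND "date" <  date '2016-01-01')     AS actual_rows;
```

With 64,800 cells, both numbers should come out as 23,652,000.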

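Answer 1 (score: 2)

If the table and column names are consistent, date arithmetic gives you the date for each final row; all you need is one date literal per table, e.g. '2011-01-01' for table mytable2011.

Most of the "unpivot" is done with JSON: each source row is first turned into JSON, and new rows are then created from it, as the stages below show.

SQL Fiddle

PostgreSQL 9.3 Schema Setup (note that day3 and day365 are declared int here, so fractional values inserted into them are rounded in the results below):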

CREATE TABLE MyTable2011
    ("cell" int, "day1" numeric, "day2" numeric, "day3" int, "day4" numeric, "day365" int)
//

INSERT INTO MyTable2011
    ("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
    (1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
    (2, 0, 0, 0.2331, 0.1461, 1.8765),
    (3, 1.431, 0.4121, 0, 1.4321, 0.00)
//

Query 1:

SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011

Results:

|                                                                 jstring |
|-------------------------------------------------------------------------|
|  {"cell":1,"day1":3.7167,"day2":0.00,"day3":0,"day4":0.1487,"day365":0} |
|          {"cell":2,"day1":0,"day2":0,"day3":0,"day4":0.1461,"day365":2} |
| {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0} |

Query 2:

SELECT
      jstring->>'cell' as cell
    , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows

Results:

| cell |         pairs |
|------|---------------|
|    1 |      (cell,1) |
|    1 | (day1,3.7167) |
|    1 |   (day2,0.00) |
|    1 |      (day3,0) |
|    1 | (day4,0.1487) |
|    1 |    (day365,0) |
|    2 |      (cell,2) |
|    2 |      (day1,0) |
|    2 |      (day2,0) |
|    2 |      (day3,0) |
|    2 | (day4,0.1461) |
|    2 |    (day365,2) |
|    3 |      (cell,3) |
|    3 |  (day1,1.431) |
|    3 | (day2,0.4121) |
|    3 |      (day3,0) |
|    3 | (day4,1.4321) |
|    3 |    (day365,0) |

Query 3:

SELECT
      date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) -1 as thedate
    , CAST(REPLACE((pairs).key,'day','') as integer) as daynum
    , cell
    , (pairs).value as thevalue 
FROM (
      SELECT
            jstring->>'cell' as cell
          , json_each_text(jstring) as pairs
     FROM (
           SELECT
                row_to_json(MyTable2011) as jstring 
           FROM MyTable2011
          ) as jrows
     ) as unpiv
WHERE (pairs).key <> 'cell'

Results:

|                    thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
|  January, 01 2011 00:00:00 |      1 |    1 |   3.7167 |
|  January, 02 2011 00:00:00 |      2 |    1 |     0.00 |
|  January, 03 2011 00:00:00 |      3 |    1 |        0 |
|  January, 04 2011 00:00:00 |      4 |    1 |   0.1487 |
| December, 31 2011 00:00:00 |    365 |    1 |        0 |
|  January, 01 2011 00:00:00 |      1 |    2 |        0 |
|  January, 02 2011 00:00:00 |      2 |    2 |        0 |
|  January, 03 2011 00:00:00 |      3 |    2 |        0 |
|  January, 04 2011 00:00:00 |      4 |    2 |   0.1461 |
| December, 31 2011 00:00:00 |    365 |    2 |        2 |
|  January, 01 2011 00:00:00 |      1 |    3 |    1.431 |
|  January, 02 2011 00:00:00 |      2 |    3 |   0.4121 |
|  January, 03 2011 00:00:00 |      3 |    3 |        0 |
|  January, 04 2011 00:00:00 |      4 |    3 |   1.4321 |
| December, 31 2011 00:00:00 |    365 |    3 |        0 |
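
Since only the table name and the date literal change from year to year, the per-table insert could be generated in a loop. The following is a minimal sketch, not a tested migration script. It assumes the tables are named mytable1996 through mytable2015 (the question only shows mytable2015) and that a target table result_table(cell integer, "date" date, value real) already exists.

```sql
DO $$
DECLARE
    yr integer;
BEGIN
    FOR yr IN 1996..2015 LOOP
        -- %L inserts the date literal safely quoted, %I the table name
        -- as an identifier; doubled quotes are literal quotes in the query.
        EXECUTE format(
            'INSERT INTO result_table
             SELECT cell,
                    %L::date + ltrim((val).key, ''day'')::int - 1,
                    (val).value::real
             FROM (
                 SELECT cell, json_each_text(row_to_json(t)) AS val
                 FROM %I t
             ) sub
             WHERE (val).key <> ''cell''',
            yr || '-01-01',     -- January 1 of the table's year
            'mytable' || yr     -- assumed table naming pattern
        );
    END LOOP;
END
$$;
```

Keep in mind that a DO block runs as a single transaction by default, so this does not split the work into smaller commits. If the full load is too heavy, restricting each INSERT to a range of cell values and running the ranges as separate statements is one way to keep each step small.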