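I am trying to restructure my large dataset so that I can work with the data more easily. I have about 20 tables with the same structure as the input table shown below, one for each year from 1996 to 2015.
Here is one of my input tables (mytable2015):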
cell day1 day2 day3 day4 ...... day365
1 3,7167 0 0 0,1487 ...... 0,3256
2 0 0 0,2331 0,1461 ...... 1,8765
3 1,431 0,4121 0 1,4321 ...... 0
...
...
...
64800
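I would like to put all the data into one big dataset and, if possible, replace day1, day2, ... with actual date values (for example 01.01.2015 or 20150101). So my result should look like this: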
cell date value
1 20150101 3,7167
1 20150102 0
1 20150103 0
1 20150104 0,1487
... ........ ......
... ........ ......
... ........ ......
2 20150101 0
2 20150102 0,4321
... ........ ......
... ........ ......
... ........ ......
64800 20150101 0,1035
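Cell holds the geographic information. The cells represent a grid laid over the whole world, each cell exactly one degree high and one degree wide.
I have two main questions:
Is it possible to convert day1, day2, ... into a date format?
How can I transform my tables into this new structure?
Any help is much appreciated, thanks in advance!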
Answer 0 (score: 3)
Sample data:
create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1, 3.7167, 0, 0, 0.1487),
(2, 0, 0, 0.2331, 0.1461),
(3, 1.431, 0.4121, 0, 1.4321);
How to build the query, step by step.
Step 1. Use json_each_text(row_to_json(t)) to turn each row into key/value pairs, which unpivots the columns:
select cell, json_each_text(row_to_json(t)) val
from example2015 t
cell | val
------+---------------
1 | (cell,1)
1 | (day1,3.7167)
1 | (day2,0)
1 | (day3,0)
1 | (day4,0.1487)
2 | (cell,2)
2 | (day1,0)
2 | (day2,0)
2 | (day3,0.2331)
2 | (day4,0.1461)
3 | (cell,3)
3 | (day1,1.431)
3 | (day2,0.4121)
3 | (day3,0)
3 | (day4,1.4321)
(15 rows)
Step 2. Skip the cell pairs, convert dayn to the integer n, and add it to a base date (here 2014-12-31):
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from example2015 t
) sub
where (val).key <> 'cell'
cell | date | value
------+------------+--------
1 | 2015-01-01 | 3.7167
1 | 2015-01-02 | 0
1 | 2015-01-03 | 0
1 | 2015-01-04 | 0.1487
2 | 2015-01-01 | 0
2 | 2015-01-02 | 0
2 | 2015-01-03 | 0.2331
2 | 2015-01-04 | 0.1461
3 | 2015-01-01 | 1.431
3 | 2015-01-02 | 0.4121
3 | 2015-01-03 | 0
3 | 2015-01-04 | 1.4321
(12 rows)
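Since the question asks for dates such as 20150101 or 01.01.2015, the date column produced in step 2 can be rendered in either form with to_char. A minimal sketch, using a literal date instead of the real table:

```sql
-- Format a date as text in the two layouts mentioned in the question.
select to_char(date '2015-01-01', 'YYYYMMDD')   as compact,
       to_char(date '2015-01-01', 'DD.MM.YYYY') as dotted;
```

Keeping the column as a real date and formatting only on output is probably the more flexible choice, because date arithmetic and range filters keep working.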
You can use the query from step 2 to insert the values from mytable2015 into result_table:
create table result_table (
"cell" integer,
"date" date,
"value" real
);
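You will generate a table with 23,652,000 rows (64,800 cells × 365 days). Converting everything in one go may exhaust memory and may take longer than you are willing to accept. I suggest splitting the operation into several stages, say up to 10,000 source rows (3,650,000 new rows) at a time.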
insert into result_table
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from mytable2015 t
) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000
Repeat the insert for cell > 10000 and cell <= 20000, and so on.
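With about 20 yearly tables, the same insert could be generated per year instead of being typed out by hand. The following is only a sketch under assumptions: it supposes the tables are named mytable1996 through mytable2015 (only mytable2015 is named in the question), that result_table already exists as defined above, and it processes one whole table per iteration rather than in 10,000-cell batches.

```sql
-- Hypothetical sketch: table names mytable1996 .. mytable2015 are assumed.
do $$
declare
    yr int;
begin
    for yr in 1996..2015 loop
        -- %L injects the year's first day as a quoted literal,
        -- %I injects the table name as a safely quoted identifier.
        execute format(
            'insert into result_table
             select cell,
                    %L::date + ltrim((val).key, ''day'')::int - 1,
                    (val).value::real
             from (
                 select cell, json_each_text(row_to_json(t)) val
                 from %I t
             ) sub
             where (val).key <> ''cell''',
            yr || '-01-01',
            'mytable' || yr);
    end loop;
end
$$;
```

Here the base date is January 1 of each year and 1 is subtracted, which is equivalent to adding n to December 31 of the previous year as in step 2. If memory is a concern, a cell range condition can be appended to the generated query in the same way as in the batched insert above.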
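Answer 1 (score: 2)
If the table and column names are consistent, you should be able to work out the date of each final row with date arithmetic, needing only one date literal per table, e.g. '2011-01-01' for table mytable2011.
Most of the "unpivot" is done with JSON: each source row is first turned into JSON, and rows are then created from it, as shown in the stages below.
PostgreSQL 9.3 schema setup: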
CREATE TABLE MyTable2011
("cell" int, "day1" numeric, "day2" numeric, "day3" int, "day4" numeric, "day365" int)
//
INSERT INTO MyTable2011
("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
(1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
(2, 0, 0, 0.2331, 0.1461, 1.8765),
(3, 1.431, 0.4121, 0, 1.4321, 0.00)
//
Query 1:
SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011
Results:
| jstring |
|-------------------------------------------------------------------------|
| {"cell":1,"day1":3.7167,"day2":0.00,"day3":0,"day4":0.1487,"day365":0} |
| {"cell":2,"day1":0,"day2":0,"day3":0,"day4":0.1461,"day365":2} |
| {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0} |
Query 2:
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
Results:
| cell | pairs |
|------|---------------|
| 1 | (cell,1) |
| 1 | (day1,3.7167) |
| 1 | (day2,0.00) |
| 1 | (day3,0) |
| 1 | (day4,0.1487) |
| 1 | (day365,0) |
| 2 | (cell,2) |
| 2 | (day1,0) |
| 2 | (day2,0) |
| 2 | (day3,0) |
| 2 | (day4,0.1461) |
| 2 | (day365,2) |
| 3 | (cell,3) |
| 3 | (day1,1.431) |
| 3 | (day2,0.4121) |
| 3 | (day3,0) |
| 3 | (day4,1.4321) |
| 3 | (day365,0) |
Query 3:
SELECT
date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) -1 as thedate
, CAST(REPLACE((pairs).key,'day','') as integer) as daynum
, cell
, (pairs).value as thevalue
FROM (
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
) as unpiv
WHERE (pairs).key <> 'cell'
Results:
| thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
| January, 01 2011 00:00:00 | 1 | 1 | 3.7167 |
| January, 02 2011 00:00:00 | 2 | 1 | 0.00 |
| January, 03 2011 00:00:00 | 3 | 1 | 0 |
| January, 04 2011 00:00:00 | 4 | 1 | 0.1487 |
| December, 31 2011 00:00:00 | 365 | 1 | 0 |
| January, 01 2011 00:00:00 | 1 | 2 | 0 |
| January, 02 2011 00:00:00 | 2 | 2 | 0 |
| January, 03 2011 00:00:00 | 3 | 2 | 0 |
| January, 04 2011 00:00:00 | 4 | 2 | 0.1461 |
| December, 31 2011 00:00:00 | 365 | 2 | 2 |
| January, 01 2011 00:00:00 | 1 | 3 | 1.431 |
| January, 02 2011 00:00:00 | 2 | 3 | 0.4121 |
| January, 03 2011 00:00:00 | 3 | 3 | 0 |
| January, 04 2011 00:00:00 | 4 | 3 | 1.4321 |
| December, 31 2011 00:00:00 | 365 | 3 | 0 |
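Two details of Query 3 may be worth checking before loading real data. First, json_each_text returns every value as text, so thevalue needs a cast if a numeric column is wanted. Second, the "+ daynum - 1" arithmetic needs no special handling for leap years, provided the leap-year tables carry a day366 column (an assumption; the question only shows day365). A small sketch with literal values:

```sql
-- Assumption: leap-year tables such as mytable2012 have a day366 column.
select date '2012-01-01' + 366 - 1 as last_day_2012,     -- 2012-12-31
       date '2011-01-01' + 365 - 1 as last_day_2011,     -- 2011-12-31
       cast('3.7167' as numeric)   as thevalue_numeric;  -- text cast to numeric
```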