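I'm having trouble solving this with a SQL function in PostgreSQL. I have managed to do it in Python, but it takes far too long on tables with millions of records.
What I have is an 'example_table' structured as follows, with data like the sample below: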
id | version | valid_from | valid_to | time_valid
1 | 1 | 2010-03-21 19:00:00 | 2010-03-21 19:00:00 | NULL
1 | 2 | 2011-02-02 09:00:00 | 2011-02-02 09:00:00 | NULL
1 | 3 | 2012-04-20 15:00:00 | 2012-04-20 15:00:00 | NULL
2 | 1 | 2012-07-02 04:00:00 | 2012-07-02 04:00:00 | NULL
3 | 1 | 2011-05-05 05:00:00 | 2011-05-05 05:00:00 | NULL
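As you can see, there are 3 records with ID "1", each with its own version (1 to 3 in this case).
I want to update versions 1 and 2 by setting 'valid_to' equal to the 'valid_from' value of the next version: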
id | version | valid_from | valid_to | time_valid
1 | 1 | 2010-03-21 19:00:00 | **2011-02-02 09:00:00** | **Some Time**
1 | 2 | **2011-02-02 09:00:00** | **2012-04-20 15:00:00** | **Some Time**
1 | 3 | **2012-04-20 15:00:00** | 2012-04-20 15:00:00 | NULL
2 | 1 | 2012-07-02 04:00:00 | 2012-07-02 04:00:00 | NULL
3 | 1 | 2011-05-05 05:00:00 | 2011-05-05 05:00:00 | NULL
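Some records will have many versions, and some may have only one. It would also be handy to compute the time_valid field at the same time, which I assume is done by subtracting the valid_from timestamp from valid_to. Again, I have millions of records and several tables, so the faster the better.
Any working code examples are much appreciated!
As requested, here is the Python code I currently have. I have tried many variations using limits, executemany, fetchmany and iterators... but in every case it either eats all local memory and crashes, or is extremely slow: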
# cur and cur1 are dict-style cursors (rows are indexed by column name)
cur.execute('''SELECT id, valid_from, valid_to, version FROM hist_line WHERE valid_to = valid_from LIMIT 10000;''')
for rec in cur.fetchall():
    # Fetch the next version of the same id; let the driver bind the values
    cur1.execute('SELECT id, valid_from, valid_to, version FROM hist_line WHERE id = %s AND version = %s;',
                 (rec['id'], rec['version'] + 1))
    r = cur1.fetchone()
    if r:
        # time_valid is computed here but not yet written back by the UPDATE below
        out = {'id': rec['id'], 'valid_from': rec['valid_from'], 'valid_to': r['valid_from'],
               'version': rec['version'], 'time_valid': r['valid_from'] - rec['valid_from']}
        cur1.execute('''UPDATE hist_line SET valid_to = %(valid_to)s
                        WHERE id = %(id)s AND version = %(version)s AND valid_from = %(valid_from)s AND valid_from = valid_to''', out)
Answer 0 (score: 1)
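lead() and lag() are two of the window functions. They let you (in this case) access the "next" or "previous" record, given a certain ordering, which you must specify in the OVER (...) clause or in a named WINDOW clause.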
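To make the semantics concrete, here is a minimal pure-Python sketch of what lead(valid_from) OVER (PARTITION BY id ORDER BY version) computes on the sample data. The helper names `ts` and `with_lead` and the key `lead_valid_from` are made up for illustration; the database does this work itself, so this is only meant to show the idea.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def ts(text):
    """Parse a timestamp written like the ones in the sample table."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


rows = [
    {"id": 1, "version": 1, "valid_from": ts("2010-03-21 19:00:00")},
    {"id": 1, "version": 2, "valid_from": ts("2011-02-02 09:00:00")},
    {"id": 1, "version": 3, "valid_from": ts("2012-04-20 15:00:00")},
    {"id": 2, "version": 1, "valid_from": ts("2012-07-02 04:00:00")},
    {"id": 3, "version": 1, "valid_from": ts("2011-05-05 05:00:00")},
]


def with_lead(rows):
    """Pair every row with the valid_from of the next version of the same id."""
    # ORDER BY version within each id
    ordered = sorted(rows, key=itemgetter("id", "version"))
    out = []
    # PARTITION BY id
    for _, group in groupby(ordered, key=itemgetter("id")):
        group = list(group)
        # The last row of a partition has no successor, so its lead is None (NULL).
        next_starts = [r["valid_from"] for r in group[1:]] + [None]
        for row, nxt in zip(group, next_starts):
            out.append({**row, "lead_valid_from": nxt})
    return out


result = with_lead(rows)
for row in result:
    print(row["id"], row["version"], row["lead_valid_from"])
```

Rows whose lead is None are the latest version of their id, which is why the update filters them out with IS NOT NULL.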
-- the data
CREATE TABLE ExampleTable
(id INTEGER NOT NULL
, version INTEGER NOT NULL
, valid_from TIMESTAMP NOT NULL
, valid_to TIMESTAMP NOT NULL
, time_valid text
);
INSERT INTO ExampleTable(id, version, valid_from, valid_to, time_valid) VALUES
(1 , 1 , '2010-03-21 19:00:00' , '2010-03-21 19:00:00' ,NULL)
,(1 , 2 , '2011-02-02 09:00:00' , '2011-02-02 09:00:00' ,NULL)
,(1 , 3 , '2012-04-20 15:00:00' , '2012-04-20 15:00:00' ,NULL)
,(2 , 1 , '2012-07-02 04:00:00' , '2012-07-02 04:00:00' ,NULL)
,(3 , 1 , '2011-05-05 05:00:00' , '2011-05-05 05:00:00' ,NULL)
;
-- Check what the update will do
SELECT dst.* , src.lll AS newvalue
FROM ExampleTable dst
JOIN (
SELECT id,version
, lead(valid_from) OVER (partition by id ORDER BY version) lll
FROM ExampleTable
) src
ON src.id = dst.id AND src.version = dst.version
WHERE src.lll IS NOT NULL
ORDER BY id,version
;
-- Do the update (remove the explain if it looks okay)
EXPLAIN
UPDATE ExampleTable dst
SET valid_to = src.lll
FROM (
SELECT id,version
, lead(valid_from) OVER (partition by id ORDER BY version) lll
FROM ExampleTable
) src
WHERE src.id = dst.id
AND src.version = dst.version
AND src.lll IS NOT NULL
;
SELECT * FROM ExampleTable
ORDER BY id,version
;
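If the update has to be started from Python, one option is to send the whole set-based statement in a single call instead of looping over rows. The sketch below rests on several assumptions: `conn` is an already open DB-API connection (for example from psycopg2), the names `build_update`, `close_versions` and `next_from` are invented here, and time_valid is a column that accepts the interval produced by subtracting two timestamps. It also fills time_valid in the same pass, which the statement above does not do.

```python
import re

# Same idea as the UPDATE above, plus time_valid in the same pass.
UPDATE_TEMPLATE = """
UPDATE {table} AS dst
SET valid_to   = src.next_from,
    time_valid = src.next_from - dst.valid_from
FROM (
    SELECT id, version,
           lead(valid_from) OVER (PARTITION BY id ORDER BY version) AS next_from
    FROM {table}
) AS src
WHERE src.id = dst.id
  AND src.version = dst.version
  AND src.next_from IS NOT NULL
"""


def build_update(table):
    """Return the UPDATE statement for one table."""
    # Table names cannot be bound as query parameters, so only accept
    # plain identifiers before formatting them into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name: %r" % table)
    return UPDATE_TEMPLATE.format(table=table)


def close_versions(conn, table):
    """Run the set-based update once and return the number of rows changed."""
    cur = conn.cursor()
    try:
        cur.execute(build_update(table))
        changed = cur.rowcount
    finally:
        cur.close()
    conn.commit()
    return changed
```

Calling `close_versions(conn, "hist_line")` once per table would replace the row-by-row loop from the question; it is worth checking the plan with EXPLAIN first, as shown above.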