Dynamically setting a field on records with multiple versions in an SQL UPDATE loop

Time: 2015-04-20 16:26:15

Tags: python sql postgresql datetime

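I am having trouble solving this with an SQL function in PostgreSQL. I have managed to do it in Python, but on tables with millions of records it takes a very long time.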

What I have is an 'example_table' structured as below, with data similar to this sample:

Example table

id | version | valid_from          | valid_to            | time_valid
1  | 1       | 2010-03-21 19:00:00 | 2010-03-21 19:00:00 | NULL
1  | 2       | 2011-02-02 09:00:00 | 2011-02-02 09:00:00 | NULL
1  | 3       | 2012-04-20 15:00:00 | 2012-04-20 15:00:00 | NULL
2  | 1       | 2012-07-02 04:00:00 | 2012-07-02 04:00:00 | NULL
3  | 1       | 2011-05-05 05:00:00 | 2011-05-05 05:00:00 | NULL

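As you can see, I have 3 records with id "1", each with its corresponding version (1 to 3 in this case).

I want to update versions 1 and 2 by setting 'valid_to' equal to the 'valid_from' value of the next version.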

Updated table

id | version | valid_from              | valid_to                | time_valid
1  | 1       | 2010-03-21 19:00:00     | **2011-02-02 09:00:00** | **Some Time**
1  | 2       | **2011-02-02 09:00:00** | **2012-04-20 15:00:00** | **Some Time**
1  | 3       | **2012-04-20 15:00:00** | 2012-04-20 15:00:00     | NULL
2  | 1       | 2012-07-02 04:00:00     | 2012-07-02 04:00:00     | NULL
3  | 1       | 2011-05-05 05:00:00     | 2011-05-05 05:00:00     | NULL

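Some records will have many versions, others may have none (just one). It would also be handy to compute the time_valid field at the same time, which I assume is done by subtracting the valid_from timestamp from valid_to. Again, I have millions of records and several tables, so the faster the better.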
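Subtracting the two timestamps is indeed enough. A minimal Python illustration using the first two versions of id 1 from the example table; in PostgreSQL, `timestamp - timestamp` likewise yields an `interval`, so the same subtraction can be done inside the UPDATE.

```python
from datetime import datetime, timedelta

# valid_from of version 1 and version 2 for id 1, taken from the example table
v1_from = datetime(2010, 3, 21, 19, 0)
v2_from = datetime(2011, 2, 2, 9, 0)

# Subtracting two datetimes yields a timedelta (the Python counterpart of an SQL interval)
time_valid = v2_from - v1_from
print(time_valid)  # 317 days, 14:00:00
```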

Any working code examples would be greatly appreciated!

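As requested, here is the Python code I currently have. I have tried many variations of it using LIMIT, executemany, fetchmany and iterators... but in every case it either uses up all local memory and crashes, or is extremely slow: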

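import psycopg2.extras

# Assumes `conn` is an open psycopg2 connection; DictCursor allows access by column name
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur1 = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute('''SELECT id, valid_from, valid_to, version FROM hist_line WHERE valid_to = valid_from LIMIT 10000;''')
for rec in cur.fetchall():
    # One extra query per row: fetch the next version of the same id (parameters passed separately, not %-formatted)
    cur1.execute('SELECT valid_from FROM hist_line WHERE id = %s AND version = %s;', (rec['id'], rec['version'] + 1))
    r = cur1.fetchone()
    if r:
        out = {'id': rec['id'], 'valid_from': rec['valid_from'], 'valid_to': r['valid_from'],
               'version': rec['version'], 'time_valid': r['valid_from'] - rec['valid_from']}
        cur1.execute('''UPDATE hist_line SET valid_to = %(valid_to)s, time_valid = %(time_valid)s
            WHERE id = %(id)s AND version = %(version)s AND valid_from = %(valid_from)s AND valid_from = valid_to''', out)
conn.commit()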
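A likely cost in the loop above is the round trip of one SELECT and one UPDATE per row. A minimal sketch of the set-based alternative follows, using Python's built-in sqlite3 in-memory database purely as a stand-in for PostgreSQL (an assumption made so the example is self-contained). It issues a single UPDATE with a correlated subquery rather than lead(), and like the loop it assumes version numbers are consecutive within each id.

```python
import sqlite3

# In-memory SQLite table standing in for the PostgreSQL hist_line table (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hist_line (
    id INTEGER NOT NULL, version INTEGER NOT NULL,
    valid_from TEXT NOT NULL, valid_to TEXT NOT NULL)""")
rows = [
    (1, 1, "2010-03-21 19:00:00"),
    (1, 2, "2011-02-02 09:00:00"),
    (1, 3, "2012-04-20 15:00:00"),
    (2, 1, "2012-07-02 04:00:00"),
    (3, 1, "2011-05-05 05:00:00"),
]
# Initially valid_to equals valid_from, as in the question
conn.executemany("INSERT INTO hist_line VALUES (?, ?, ?, ?)",
                 [(i, v, ts, ts) for i, v, ts in rows])

# One statement for the whole table: each row takes the valid_from of the row
# with the same id and version + 1; rows without a next version are left alone
conn.execute("""
UPDATE hist_line
SET valid_to = (SELECT nxt.valid_from FROM hist_line AS nxt
                WHERE nxt.id = hist_line.id AND nxt.version = hist_line.version + 1)
WHERE EXISTS (SELECT 1 FROM hist_line AS nxt
              WHERE nxt.id = hist_line.id AND nxt.version = hist_line.version + 1)
""")
conn.commit()

result = conn.execute("SELECT id, version, valid_to FROM hist_line ORDER BY id, version").fetchall()
for row in result:
    print(row)
```

The lead() version in the answer does not need the consecutive-version assumption, since it simply looks at the next row in version order.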

1 answer:

Answer 0 (score: 1)

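lead() and lag() are two of the window functions. They let you access the "next" or "previous" record (here, the next version) according to an ordering, which you must specify in the OVER (...) or WINDOW (...) clause.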
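To make the semantics concrete, here is a small pure-Python emulation of `lead(valid_from)` and `lag(valid_from)` with `PARTITION BY id ORDER BY version`, run over the example rows. It only illustrates what the window functions return, not how PostgreSQL evaluates them; the function name `lead_lag` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (id, version, valid_from) rows from the example table, timestamps kept as strings
rows = [
    (1, 1, "2010-03-21 19:00:00"),
    (1, 2, "2011-02-02 09:00:00"),
    (1, 3, "2012-04-20 15:00:00"),
    (2, 1, "2012-07-02 04:00:00"),
    (3, 1, "2011-05-05 05:00:00"),
]

def lead_lag(rows):
    """Hypothetical helper: emulate lead()/lag() of valid_from OVER (PARTITION BY id ORDER BY version)."""
    out = []
    # Sort by (id, version) so each partition is contiguous and ordered by version
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, part in groupby(ordered, key=itemgetter(0)):
        part = list(part)
        for i, (rid, version, valid_from) in enumerate(part):
            nxt = part[i + 1][2] if i + 1 < len(part) else None  # lead(): next row, or NULL
            prev = part[i - 1][2] if i > 0 else None             # lag(): previous row, or NULL
            out.append((rid, version, valid_from, nxt, prev))
    return out

result = lead_lag(rows)
for row in result:
    print(row)
```

Rows whose lead() value is None (NULL) are exactly the latest versions, which is why the answer's UPDATE filters on `src.lll IS NOT NULL`.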

-- the data
CREATE TABLE ExampleTable
        (id INTEGER NOT NULL
        , version INTEGER NOT NULL
        , valid_from          TIMESTAMP NOT NULL
        , valid_to              TIMESTAMP NOT NULL
        , time_valid    text
        );

INSERT INTO ExampleTable(id, version, valid_from, valid_to, time_valid) VALUES
 (1  , 1        , '2010-03-21 19:00:00' , '2010-03-21 19:00:00'  ,NULL)
,(1   , 2        , '2011-02-02 09:00:00' , '2011-02-02 09:00:00' ,NULL)
,(1   , 3        , '2012-04-20 15:00:00' , '2012-04-20 15:00:00' ,NULL)
,(2   , 1        , '2012-07-02 04:00:00' , '2012-07-02 04:00:00' ,NULL)
,(3   , 1        , '2011-05-05 05:00:00' , '2011-05-05 05:00:00' ,NULL)
        ;

-- Check what the update will do
SELECT dst.* , src.lll AS newvalue
FROM ExampleTable dst
JOIN    (
        SELECT id,version
                , lead(valid_from) OVER (partition by id ORDER BY version) lll
        FROM ExampleTable
        ) src
ON src.id = dst.id AND src.version = dst.version
WHERE src.lll IS NOT NULL
ORDER BY id,version
        ;


-- Do the update (remove the explain if it looks okay)
EXPLAIN
UPDATE ExampleTable dst
        SET valid_to = src.lll
FROM    (
        SELECT id,version
                , lead(valid_from) OVER (partition by id ORDER BY version) lll
        FROM ExampleTable
        ) src
WHERE src.id = dst.id
AND src.version = dst.version
AND src.lll IS NOT NULL
        ;


SELECT * FROM ExampleTable
ORDER BY id,version
        ;