PostgreSQL - querying a column that is indexed by proxy through another column

Date: 2014-07-11 19:42:37

Tags: sql performance postgresql processing-efficiency

I have the following query:

SELECT SUM(data), foreign_key
FROM (SELECT *
    FROM really_big_table
    ORDER BY auto_incremented_id DESC
    LIMIT reasonable_number) AS recent_rows  -- a subquery in FROM needs an alias
WHERE inserted_timestamp > now() - INTERVAL '1 hour'
GROUP BY foreign_key

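This query successfully avoids a sequential scan on inserted_timestamp, but it breaks down completely when I need to retrieve more rows than reasonable_number. Since inserted_timestamp is not indexed but follows the same order as auto_incremented_id, I feel I should be able to make this query more efficient without incurring an hour of downtime to create a new index.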

I would like to do something like this:

-- Wished-for syntax: STOP WHEN is not valid PostgreSQL
SELECT SUM(data), foreign_key
FROM really_big_table
ORDER BY id DESC
STOP WHEN inserted_timestamp < now() - INTERVAL '1 hour'
GROUP BY foreign_key

In other words, I want syntax that makes my query run an index scan on my table and stop as soon as the data gets too old.

1 Answer:

Answer 0: (score: 1)

One possibility for speeding this up is table partitioning, if you are not already using it.
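As a rough sketch of what partitioning could look like, the following assumes PostgreSQL 11 or later (declarative partitioning with run-time partition pruning). The table name really_big_table_partitioned, the partition name, and all column types are assumptions made for illustration; only the column names come from the question.

```sql
-- Hypothetical partitioned copy of the table; column types are guesses.
CREATE TABLE really_big_table_partitioned (
    id                 bigint      NOT NULL,
    foreign_key        integer     NOT NULL,
    data               numeric     NOT NULL,
    inserted_timestamp timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (inserted_timestamp);

-- One partition per day; the upper bound is exclusive.
CREATE TABLE really_big_table_2014_07_11
    PARTITION OF really_big_table_partitioned
    FOR VALUES FROM ('2014-07-11') TO ('2014-07-12');

-- The filter on the partition key lets the planner or executor skip
-- partitions that cannot contain rows from the last hour.
SELECT SUM(data), foreign_key
FROM really_big_table_partitioned
WHERE inserted_timestamp > now() - INTERVAL '1 hour'
GROUP BY foreign_key;
```

Without an index on inserted_timestamp, the partitions that remain are still scanned in full, so the partition size bounds the work. Existing rows would also have to be moved into the partitioned table, which this sketch does not cover.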

Here is another idea:

BEGIN;
DECLARE my_cursor NO SCROLL CURSOR FOR
    SELECT data, foreign_key, inserted_timestamp
    FROM really_big_table
    ORDER BY id DESC;
FETCH FORWARD 5 FROM my_cursor;
-- Repeat as many times as you want
CLOSE my_cursor;
ROLLBACK; -- Or COMMIT

Compute the sum in your application or, if you want to do it in the database:

CREATE FUNCTION my_fetch() RETURNS SETOF really_big_table AS $$
DECLARE
    -- You could also select only the relevant columns here and change
    -- the function's return type.
    curs CURSOR FOR
        SELECT * FROM really_big_table ORDER BY id DESC;
BEGIN
    FOR current_row IN curs LOOP
        IF current_row.inserted_timestamp > CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN
            RETURN NEXT current_row;
        ELSE
            RETURN;
        END IF;
    END LOOP;
    RETURN;
END
$$ STABLE LANGUAGE plpgsql;

Then you can do this:

SELECT SUM(data), foreign_key FROM my_fetch() GROUP BY foreign_key;
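The comment inside my_fetch() mentions selecting only the relevant columns and changing the return type. A minimal sketch of that variant might look like the following. The function name my_fetch_narrow, the output column names row_data and row_foreign_key, and the column types numeric and integer are all assumptions; adjust them to the real table definition. As in the function above, id is taken to be the indexed auto-incremented column.

```sql
CREATE FUNCTION my_fetch_narrow()
RETURNS TABLE (row_data numeric, row_foreign_key integer) AS $$
DECLARE
    -- Read only the columns the aggregate needs, newest rows first.
    curs CURSOR FOR
        SELECT t.data, t.foreign_key, t.inserted_timestamp
        FROM really_big_table AS t
        ORDER BY t.id DESC;
BEGIN
    FOR current_row IN curs LOOP
        -- Stop at the first row that falls outside the one-hour window.
        EXIT WHEN current_row.inserted_timestamp
                  <= CURRENT_TIMESTAMP - INTERVAL '1 hour';
        row_data := current_row.data;
        row_foreign_key := current_row.foreign_key;
        RETURN NEXT;  -- emits the current values of the output columns
    END LOOP;
END
$$ STABLE LANGUAGE plpgsql;

SELECT SUM(row_data), row_foreign_key
FROM my_fetch_narrow()
GROUP BY row_foreign_key;
```

The output columns are deliberately named differently from the table columns so they cannot be confused with them inside the function body. Like my_fetch(), this relies on inserted_timestamp decreasing whenever id decreases; a single out-of-order row would end the loop early.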