I have a table named feeds_up in a PostgreSQL database. It looks like this:
| feed_url | isup | hasproblems | observed timestamp with tz | id (pk)|
|----------|------|-------------|-------------------------------|--------|
| http://b.| t | f | 2013-02-27 16:34:46.327401+11 | 15235 |
| http://f.| f | t | 2013-02-27 16:31:25.415126+11 | 15236 |
It has 300k rows and grows by about 20 rows every five minutes. I have a query that runs frequently (on every page load):
select distinct on (feed_url) feed_url, isUp, hasProblems
from feeds_up
where observed <= '2013-02-27T05:38:00.000Z'
order by feed_url, observed desc;
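I've put an example timestamp in there; in practice that time is parameterized. The EXPLAIN ANALYZE output is on explain.depesz.com. It takes 8s. Crazy!
feed_url has only about 20 unique values, so this seems very inefficient. I thought I'd be dumb and try a FOR loop in a function: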
CREATE OR REPLACE FUNCTION feedStatusAtDate(theTime timestamp with time zone) RETURNS SETOF feeds_up AS
$BODY$
DECLARE
url feeds_list%rowtype;
BEGIN
FOR url IN SELECT * FROM feeds_list
LOOP
RETURN QUERY SELECT * FROM feeds_up
WHERE observed <= theTime
AND feed_url = url.feed_url
ORDER BY observed DESC LIMIT 1;
END LOOP;
END;
$BODY$ language plpgsql;
select * from feedStatusAtDate('2013-02-27T05:38:00.000Z');
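It takes only 307ms!
Using FOR loops in SQL feels like using it the wrong way. How can I write a good query, like the first one, that is also efficient? Is that possible? Or is this one of those cases where a FOR loop really is best?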
ETA:
Postgres version: PostgreSQL 9.1.5 on i686-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 32-bit
Indexes on feeds_up:
CREATE INDEX feeds_up_url
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default");
CREATE INDEX feeds_up_url_observed
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default", observed DESC);
CREATE INDEX feeds_up_observed
ON public.feeds_up
USING btree
(observed DESC);
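Answer 0 (score: 1)
Assuming "id" is a serial column and always increases in insertion order, you can simplify this by finding MAX(id) for each feed_url in a subquery and then pulling the rest of the data as follows: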
SELECT fu.feed_url, fu.isup, fu.hasproblems, fu.observed
FROM feeds_up fu
JOIN
(
SELECT feed_url, max(id) AS id FROM feeds_up
WHERE observed <= '2013-03-27T05:38:00.000Z'
GROUP BY feed_url
) AS q USING (id)
ORDER BY fu.feed_url, fu.observed desc;
I ran a quick test, and this makes very efficient use of the index on "observed".
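To check the index-usage claim against your own data, you can prefix the query with EXPLAIN ANALYZE. This is only a sketch that reuses the query above; the plan you get depends on your data distribution and statistics, so it may not match what I saw.

```sql
-- Run the MAX(id) variant and print the actual plan with timings.
-- An "Index Scan" or "Bitmap Index Scan" node on feeds_up_observed
-- in the output indicates the planner chose the index on "observed".
EXPLAIN ANALYZE
SELECT fu.feed_url, fu.isup, fu.hasproblems, fu.observed
FROM feeds_up fu
JOIN (
    SELECT feed_url, max(id) AS id
    FROM feeds_up
    WHERE observed <= '2013-03-27T05:38:00.000Z'
    GROUP BY feed_url
) AS q USING (id)
ORDER BY fu.feed_url;
```

If the plan shows a sequential scan instead, running ANALYZE feeds_up first and retrying is a reasonable next step.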
Update:
To use "observed" instead of "id" (since records might not be inserted in order), you can modify the query above as follows:
SELECT DISTINCT ON (fu.feed_url) fu.feed_url, fu.isup, fu.hasproblems, fu.observed
FROM feeds_up fu
JOIN
(
SELECT feed_url, max(observed) as observed FROM feeds_up
WHERE observed <= '2013-03-27T05:38:00.000Z'
GROUP BY feed_url
) AS q USING (feed_url, observed)
ORDER BY fu.feed_url, fu.observed desc;
On my system this runs in about the same time, using the single index on "observed". YMMV.
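Another option that stays in plain SQL is to mirror the FOR loop from the question with a correlated subquery, so each feed gets its own "latest row at or before the timestamp" lookup. This is a sketch I have not run against the original data. It assumes feeds_list holds one row per feed_url (as the function in the question implies) and that "id" is the primary key; the alias names fl, fu and i are mine.

```sql
-- For each feed in feeds_list, fetch the id of its most recent
-- feeds_up row at or before the given time, then join back for the
-- remaining columns. The inner lookup matches the shape of the
-- existing index on (feed_url, observed DESC).
SELECT fu.feed_url, fu.isup, fu.hasproblems, fu.observed
FROM feeds_list fl
JOIN feeds_up fu
  ON fu.id = (
      SELECT i.id
      FROM feeds_up i
      WHERE i.feed_url = fl.feed_url
        AND i.observed <= '2013-02-27T05:38:00.000Z'
      ORDER BY i.observed DESC
      LIMIT 1
  )
ORDER BY fu.feed_url;
```

Feeds with no row before the timestamp simply drop out of the result, which matches what the PL/pgSQL loop returns. It avoids LATERAL, which is not available in 9.1.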
Answer 1 (score: 0)