select sid, type, status, timestamp from contact_history limit 10;
   sid   | type | status |           timestamp
---------+------+--------+-------------------------------
 6291179 |    0 |   1025 | 2015-08-24 13:05:22.501025+02
 68737   |    0 |      5 | 2015-08-24 13:05:32.500005+02
 4987391 |    0 |     65 | 2015-08-24 13:05:35.500065+02
 1189551 |    1 |     65 | 2015-08-24 13:06:05.510065+02
 3374714 |    1 |      5 | 2015-07-27 13:25:25.510005+02
 2297221 |    0 |      5 | 2015-07-27 13:25:48.500005+02
 5503230 |    2 |     65 | 2015-07-27 13:25:50.520065+02
 596992  |    1 |     65 | 2015-07-27 13:26:51.510065+02
 5215455 |    0 |   1025 | 2015-07-27 13:27:21.501025+02
 3011248 |    0 |      5 | 2015-07-27 13:27:46.500005+02
(10 rows)
\d contact_history
                                  Table "contact_history"
    Column     |           Type           |                        Modifiers
---------------+--------------------------+----------------------------------------------------------
 sid           | character varying(32)    | not null
 type          | integer                  | not null
 status        | integer                  | not null
 timestamp     | timestamp with time zone | not null
 id            | bigint                   | not null default nextval('contact_history_id_seq'::regcla
Indexes:
    "contact_history_pk" PRIMARY KEY, btree (id)
    "contact_history_sid_timestamp_idx" btree (sid, "timestamp")
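Each row records that a given sid had a certain type and status at a specific timestamp. There are no unique rows. Every sid can have any type and status at any time. The table has 20 million rows. The PostgreSQL version is 9.3.13.
Now I want to know how many sid have (type='0' or type='1') and status='5' at their max(timestamp). In other words, for each sid find the last timestamp and the corresponding type and status, then count those that satisfy the condition (type='0' or type='1') and status='5'. So I expect a single number as output. Other, more efficient ways to get the same result are also welcome. Thanks.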
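To make the requirement concrete, here is a minimal sketch with made-up rows. The table name contact_history_demo and every value in it are hypothetical and only illustrate which sids should be counted; they are not taken from the real data.

```sql
-- Hypothetical demo table mirroring the relevant columns of contact_history.
CREATE TEMP TABLE contact_history_demo (
    sid         varchar(32) NOT NULL,
    type        integer     NOT NULL,
    status      integer     NOT NULL,
    "timestamp" timestamptz NOT NULL
);

INSERT INTO contact_history_demo (sid, type, status, "timestamp") VALUES
    ('a', 0,  5, '2015-08-24 13:00:00+02'),
    ('a', 1, 65, '2015-08-24 13:05:00+02'),  -- latest row for a: status 65, not counted
    ('b', 2,  5, '2015-08-24 13:00:00+02'),
    ('b', 1,  5, '2015-08-24 13:05:00+02'),  -- latest row for b: type 1, status 5, counted
    ('c', 0,  5, '2015-08-24 13:00:00+02'),  -- only row for c: type 0, status 5, counted
    ('d', 0,  5, '2015-08-24 13:00:00+02'),
    ('d', 2,  5, '2015-08-24 13:05:00+02');  -- latest row for d: type 2, not counted

-- The number I am after for this demo data is 2 (sids b and c).
```

Note that a and d each have an older row matching the condition, but only the row at max(timestamp) decides whether a sid is counted.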
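Answer 0 (score: 0)
Thanks to a_horse_with_no_name I followed the greatest-n-per-group path. Unfortunately, my case is a bit different. I did some monkey engineering, and so far I have ended up with the following queries, each with a different cost: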
EXPLAIN SELECT count(*) FROM contact_history t1 LEFT OUTER JOIN contact_history t2 ON (t1.sid = t2.sid AND t1.timestamp < t2.timestamp) WHERE t2.sid IS NULL and (t1.type=0 OR t1.type=1) and t1.status = '5';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=158816.96..158816.97 rows=1 width=0)
   ->  Hash Anti Join  (cost=66228.91..158003.37 rows=325435 width=0)
         Hash Cond: ((t1.sid)::text = (t2.sid)::text)
         Join Filter: (t1."timestamp" < t2."timestamp")
         ->  Seq Scan on contact_history t1  (cost=0.00..50771.93 rows=488152 width=15)
               Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
         ->  Hash  (cost=39041.96..39041.96 rows=1563996 width=15)
               ->  Seq Scan on contact_history t2  (cost=0.00..39041.96 rows=1563996 width=15)
(8 rows)
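The plan above shows the LEFT OUTER JOIN ... IS NULL pattern being executed as an anti join. The same intent can be written with NOT EXISTS, which may read more directly. This is only a sketch of an equivalent formulation; I have not compared its plan or runtime against the query above.

```sql
-- Count rows matching the condition for which no later row exists for the same sid.
SELECT count(*)
FROM contact_history t1
WHERE t1.type IN (0, 1)
  AND t1.status = 5
  AND NOT EXISTS (
      SELECT 1
      FROM contact_history t2
      WHERE t2.sid = t1.sid
        AND t2."timestamp" > t1."timestamp"
  );
```

As with the join version, if a sid has several rows sharing the same latest timestamp, each of them that matches the condition is counted.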
EXPLAIN SELECT count(*) from contact_history as ch, (select sid, max(timestamp) as max_t from contact_history group by sid) as sub where ch.sid=sub.sid and ch.timestamp=sub.max_t and (type='0' or type='1') and status = '5';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=393277.11..393277.12 rows=1 width=0)
   ->  Merge Join  (cost=366994.07..393277.10 rows=2 width=0)
         Merge Cond: ((contact_history.sid)::text = (ch.sid)::text)
         Join Filter: (ch."timestamp" = (max(contact_history."timestamp")))
         ->  GroupAggregate  (cost=253411.17..267270.04 rows=212890 width=15)
               ->  Sort  (cost=253411.17..257321.16 rows=1563996 width=15)
                     Sort Key: contact_history.sid
                     ->  Seq Scan on contact_history  (cost=0.00..39041.96 rows=1563996 width=15)
         ->  Materialize  (cost=113582.90..116023.66 rows=488152 width=15)
               ->  Sort  (cost=113582.90..114803.28 rows=488152 width=15)
                     Sort Key: ch.sid
                     ->  Seq Scan on contact_history ch  (cost=0.00..50771.93 rows=488152 width=15)
                           Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
(13 rows)
EXPLAIN SELECT count(*) FROM contact_history as ch1 WHERE timestamp = (SELECT MAX(timestamp) FROM contact_history AS ch2 WHERE ch1.sid = ch2.sid) and (ch1.type='0' or ch1.type='1') and ch1.status = '5';
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7919844.02..7919844.03 rows=1 width=0)
   ->  Seq Scan on contact_history ch1  (cost=0.00..7919837.92 rows=2441 width=0)
         Filter: ((status = 5) AND ((type = 0) OR (type = 1)) AND ("timestamp" = (SubPlan 2)))
         SubPlan 2
           ->  Result  (cost=5.02..5.03 rows=1 width=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.43..5.02 rows=1 width=8)
                         ->  Index Only Scan Backward using contact_history_sid_timestamp_idx on contact_history ch2  (cost=0.43..32.57 rows=7 width=8)
                               Index Cond: ((sid = (ch1.sid)::text) AND ("timestamp" IS NOT NULL))
(9 rows)
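One more variant that might be worth comparing is PostgreSQL's DISTINCT ON, which is available in 9.3. The sketch below is an assumption on my side, not something I have benchmarked on this table: it first picks one latest row per sid and then applies the filter to that row.

```sql
-- Pick the latest row per sid, then count those matching the condition.
SELECT count(*)
FROM (
    SELECT DISTINCT ON (sid) sid, type, status
    FROM contact_history
    ORDER BY sid, "timestamp" DESC   -- first row per sid is the one with max(timestamp)
) AS latest
WHERE latest.type IN (0, 1)
  AND latest.status = 5;
```

One difference from the queries above: when a sid has several rows with the same latest timestamp, DISTINCT ON keeps only one of them (which one is not determined by this ORDER BY), so each sid contributes at most 1 to the count. The ordering on (sid, "timestamp") matches the columns of contact_history_sid_timestamp_idx, though whether the planner actually uses that index here is something EXPLAIN would have to show.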
Any improvements, additions, comments, or explanations are more than welcome. Thanks.