I have two tables, as follows (simplified from the real ones):
mysql> desc small_table;
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| event_time | datetime | NO   |     | NULL    |       |
| user_id    | char(15) | NO   |     | NULL    |       |
| other_data | int(11)  | NO   | MUL | NULL    |       |
+------------+----------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc large_table;
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| event_time | datetime | NO   |     | NULL    |       |
| user_id    | char(15) | NO   |     | NULL    |       |
| other_data | int(11)  | NO   |     | NULL    |       |
+------------+----------+------+-----+---------+-------+
3 rows in set (0.00 sec)
Now, small_table is small: for each user_id there is usually only one row (though sometimes more). In large_table, on the other hand, each user_id appears many times.
mysql> select count(1) from small_table\G
*************************** 1. row ***************************
count(1): 20182
1 row in set (0.00 sec)

mysql> select count(1) from large_table\G
*************************** 1. row ***************************
count(1): 2870522
1 row in set (0.00 sec)
However, and this is important, for every row in small_table there is at least one row in large_table with the same user_id, the same other_data, and a similar event_time (within a few minutes, say).
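To pin down what "corresponding row" means here, the following is a minimal Python sketch of the matching rule just described. The five-minute tolerance, the function name corresponds, and the sample rows are all assumptions made for illustration; the question only says "within a few minutes".

```python
from datetime import datetime, timedelta

# Hypothetical tolerance: the question only says "within a few minutes".
TOLERANCE = timedelta(minutes=5)


def corresponds(small_row, large_row, tolerance=TOLERANCE):
    """Return True when large_row plausibly corresponds to small_row.

    Rows are (event_time, user_id, other_data) tuples, mirroring the
    column order of the two tables.
    """
    s_time, s_user, s_other = small_row
    l_time, l_user, l_other = large_row
    return (
        s_user == l_user
        and s_other == l_other
        and abs(s_time - l_time) <= tolerance
    )


# Invented sample rows, not real data.
small = (datetime(2012, 1, 1, 10, 0), "u1", 7)
near = (datetime(2012, 1, 1, 10, 3), "u1", 7)   # 3 minutes later
far = (datetime(2012, 1, 1, 12, 0), "u1", 7)    # 2 hours later

print(corresponds(small, near), corresponds(small, far))
```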
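I want to know whether small_table has a row corresponding to the first, the second, or whichever-th distinct row in large_table with the same user_id and a similar event_time. That is, I would like:
1. for each user_id, a count of the distinct rows of large_table, in order by event_time, but only over a limited range of event_time, say three hours; that is, I only want to count rows whose event_time values are within, say, three hours of each other; and
2. for each such set of distinct rows, an identification of which row in the list (in order by event_time) has a corresponding row in small_table.
I can't seem to write even a query that performs the first step, let alone one that performs the second, and would appreciate any direction.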
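To make the two steps concrete, here is a rough Python sketch of the intended result, not a SQL solution. It assumes that "within three hours of each other" means that consecutive rows of one user are at most three hours apart, and that "corresponding" means same user_id, same other_data, and event_time within five minutes. The names clusters, positions, and at, the tolerance, and the sample rows are all hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby

WINDOW = timedelta(hours=3)       # "say, three hours" from the question
TOLERANCE = timedelta(minutes=5)  # assumed meaning of "a few minutes"


def clusters(large_rows, window=WINDOW):
    """Step 1: yield (user_id, run) pairs, where each run is a list of
    distinct rows of one user, ordered by event_time, in which consecutive
    rows are at most `window` apart.
    Rows are (event_time, user_id, other_data) tuples."""
    distinct = sorted(set(large_rows), key=lambda r: (r[1], r[0], r[2]))
    for user_id, rows in groupby(distinct, key=lambda r: r[1]):
        run = []
        for row in rows:
            if run and row[0] - run[-1][0] > window:
                yield user_id, run   # gap too large: close the current run
                run = []
            run.append(row)
        yield user_id, run


def positions(run, small_rows, tolerance=TOLERANCE):
    """Step 2: 1-based positions in `run` whose row has a counterpart
    in small_rows."""
    return [
        i
        for i, (t, user, other) in enumerate(run, start=1)
        if any(
            user == s_user and other == s_other and abs(t - s_t) <= tolerance
            for s_t, s_user, s_other in small_rows
        )
    ]


def at(hour, minute=0):
    """Helper for the invented sample data below."""
    return datetime(2012, 1, 1, hour, minute)


large = [
    (at(1), "u1", 5), (at(2), "u1", 6), (at(4), "u1", 7),
    (at(9), "u1", 8), (at(10), "u1", 9),
]
small = [(at(4, 2), "u1", 7)]

for user_id, run in clusters(large):
    print(user_id, len(run), positions(run, small))
```

With this sample data the events at hours 1, 2, and 4 form one run and those at 9 and 10 another, and the small_table row lines up with the third row of the first run.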
Answer 0: (score: 0)
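-- 'StartDate' and 'EndDate' are placeholders for real datetime values
select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN (select distinct user_id from large_table where event_time between 'StartDate' and 'EndDate')
group by s.event_time, s.other_data
order by s.event_time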
I'm not sure I understand the requirement about the small time window that you mentioned.
Also:
select * from large_table t1, large_table t2
where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour)
So, try:
select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN ( select t1.user_id from large_table t1, large_table t2 -- IN needs a single column
where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour))
group by s.event_time, s.other_data
order by s.event_time
Answer 1: (score: 0)
This should have been a comment on Jonathan Leffler's detailed and helpful answer, but (a) it is too long and (b) it does help answer my question, so I am posting it as an answer.
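The code headed "Multiple Event Ranges" in Jonathan Leffler's answer finds ranges in which the second instance is shortly after the first, the penultimate instance is shortly before the last, and no large break occurs, but it rejects any large gap between interior instances even when there are other instances between them. So, for example, with a limit of 3 hours, instances at 1, 2, 4, 6 and 7 would be disallowed because of the gap between 2 and 6. I think the correct code would be (building directly on Jonathan Leffler's):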
-- 3 UNITS HOUR is Informix interval syntax; MySQL writes the interval as INTERVAL 3 HOUR.
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt1.user_id = lt5.user_id
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt5.event_time >= lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           AND NOT EXISTS ( -- ...with no event of the same user in between
               SELECT * FROM Large_Table AS lt9
                WHERE lt9.user_id = lt5.user_id -- only this user's events can bridge the gap
                  AND lt9.event_time > lt5.event_time
                  AND lt6.event_time > lt9.event_time
           )
       )
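This avoids the need for the last two and exists clauses in the code headed "Multiple Event Ranges" in Jonathan Leffler's answer and, in fact, for the "Singleton range" and "Doubleton range" code in his answer.
Unless I'm missing something.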
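As a rough sanity check of the 1, 2, 4, 6, 7 example, the sketch below runs a simplified version of the query in SQLite through Python's sqlite3 module. Several things here are assumptions made for illustration: event_time is stored as a plain integer hour so that the 3 UNITS HOUR interval becomes the integer parameter :gap, the innermost subquery is restricted to the same user_id, and the two users "a" and "b" and their events are invented. It is not a test against the real MySQL tables.

```python
import sqlite3

LIMIT = 3  # hours; event_time is an integer hour in this simplified sketch

QUERY = """
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM large_table AS lt1
  JOIN large_table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS (SELECT * FROM large_table AS lt3
                    WHERE lt1.user_id = lt3.user_id
                      AND lt3.event_time > lt1.event_time - :gap
                      AND lt3.event_time < lt1.event_time)
   AND NOT EXISTS (SELECT * FROM large_table AS lt4
                    WHERE lt1.user_id = lt4.user_id
                      AND lt4.event_time > lt2.event_time
                      AND lt4.event_time < lt2.event_time + :gap)
   AND NOT EXISTS (SELECT * FROM large_table AS lt5
                     JOIN large_table AS lt6
                       ON lt5.user_id = lt6.user_id
                      AND lt1.user_id = lt5.user_id
                      AND lt5.event_time < lt6.event_time
                      AND lt6.event_time <= lt2.event_time
                      AND lt5.event_time >= lt1.event_time
                      AND (lt6.event_time - lt5.event_time) > :gap
                      AND NOT EXISTS (SELECT * FROM large_table AS lt9
                                       WHERE lt9.user_id = lt5.user_id
                                         AND lt9.event_time > lt5.event_time
                                         AND lt6.event_time > lt9.event_time))
 ORDER BY lt1.user_id, min_time
"""


def find_ranges(events, gap=LIMIT):
    """Load (event_time, user_id) pairs into an in-memory table and
    return the (user_id, min_time, max_time) ranges found by QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE large_table (event_time INTEGER, user_id TEXT)")
    conn.executemany("INSERT INTO large_table VALUES (?, ?)", events)
    rows = conn.execute(QUERY, {"gap": gap}).fetchall()
    conn.close()
    return rows


# Invented data: user "a" has no consecutive gap over 3 hours,
# user "b" has a 5-hour gap between hours 4 and 9.
events = [(h, "a") for h in (1, 2, 4, 6, 7)]
events += [(h, "b") for h in (1, 2, 4, 9, 10)]

ranges = find_ranges(events)
print(ranges)
```

In this run user "a" comes out as a single range from 1 to 7, while user "b" is split at the gap between 4 and 9. The user_id restriction on lt9 matters for "b": without it, the event of "a" at hour 6 would count as lying between 4 and 9.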