Question

I have two tables, as follows (simplified from the real ones):

mysql> desc small_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   | MUL | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc large_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   |     | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Now, small_table is small: for each user_id there is usually only one row (though sometimes more). In large_table, on the other hand, each user_id appears many times.

mysql> select count(1) from small_table\G
*************************** 1. row ***************************
count(1): 20182
1 row in set (0.00 sec)


mysql> select count(1) from large_table\G
*************************** 1. row ***************************
count(1): 2870522
1 row in set (0.00 sec)

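However, and this is important, for every row in small_table there is at least one row in large_table with the same user_id, the same other_data, and a similar event_time (within a few minutes, say).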

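I want to know whether small_table has a row corresponding to the first, or second, or which-th distinct row in large_table for the same user_id and a similar event_time. That is, I'd like:

1. for each user_id, a count of the distinct rows of large_table in order by event_time, but only for event_time within, say, three hours; that is, I want only the count of rows whose event_time values are within, say, three hours of each other; and
2. for each such collection of distinct rows, an identification of which row in that list (in order by event_time) has a corresponding row in small_table.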

I can't seem to write a query that does even the first step, let alone one that does the second, and I would appreciate any direction.

Answer 1

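-- 'StartDate' and 'EndDate' are placeholders for real datetime literals
select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN (select distinct user_id from large_table where event_time between 'StartDate' and 'EndDate')
group by s.event_time, s.other_data
order by s.event_time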

I'm not sure about the small time window requirement you mention.

Also:

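-- pairs of rows for the same user where t1 is at least 3 hours before t2
select * from large_table t1, large_table t2
where t1.user_id = t2.user_id
  and t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour)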

So, try:

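  select count(s.user_id), s.event_time, s.other_data from small_table s
    -- the IN subquery must return a single column, so select t1.user_id rather than *
    where s.user_id IN ( select t1.user_id from large_table t1, large_table t2
    where t1.user_id = t2.user_id
      and t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour))
group by s.event_time, s.other_data
order by s.event_time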
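
If the "similar event_time" requirement in the question means "within a few minutes", the match between the two tables could be sketched as follows. This is only an illustration: the 5-minute window is an assumption, not a figure from the question, and the aliases `s` and `l` are arbitrary.

```sql
-- Rows of small_table that have at least one matching row in large_table:
-- same user_id, same other_data, and event_time within 5 minutes either way.
-- Assumption: 5 minutes stands in for "within a few minutes".
SELECT s.user_id, s.event_time, s.other_data
  FROM small_table AS s
 WHERE EXISTS
       (SELECT *
          FROM large_table AS l
         WHERE l.user_id    = s.user_id
           AND l.other_data = s.other_data
           AND l.event_time BETWEEN s.event_time - INTERVAL 5 MINUTE
                                AND s.event_time + INTERVAL 5 MINUTE
       )
 ORDER BY s.event_time;
```

Writing the window as a `BETWEEN` on `l.event_time` rather than wrapping the column in a function leaves room for an index on `(user_id, event_time)` to be used, should one exist.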

Answer 2

This should be a comment on Jonathan Leffler's detailed and helpful answer, but (a) it is too long and (b) it does help answer my question, so I am posting it as an answer.

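The code headed "Multiple Event Ranges" in Jonathan Leffler's answer finds ranges where the second instance is shortly after the first, the penultimate instance is shortly before the last, and no big breaks occur. However, it prohibits any large gap between interior instances, even if there are other instances between them. So, for example, with a limit of 3 hours, instances at 1, 2, 4, 6, and 7 would be prohibited because of the gap between 2 and 6, even though 4 lies between them. I think the correct code would be (building directly on Jonathan Leffler's):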

SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM Large_Table AS lt1
  JOIN Large_Table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM Large_Table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - 3 UNITS HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM Large_Table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + 3 UNITS HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM Large_Table AS lt5 -- E5 before E6
          JOIN Large_Table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt1.user_id = lt5.user_id
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt5.event_time >= lt1.event_time
           AND (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
           AND NOT EXISTS -- no event of the same user between E5 and E6
               (SELECT *
                  FROM Large_Table AS lt9
                 WHERE lt9.user_id = lt5.user_id
                   AND lt9.event_time > lt5.event_time
                   AND lt9.event_time < lt6.event_time
               )
       )
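
The `3 UNITS HOUR` notation above is, as far as I know, Informix interval syntax and is not accepted by MySQL, which is what the question uses. A minimal sketch of how the first `NOT EXISTS` test might be spelled in MySQL follows. It assumes the table is named `large_table` in lower case, as in the question, since table names can be case-sensitive in MySQL depending on the platform.

```sql
-- Events that start a range: no earlier event of the same user within 3 hours.
-- Assumption: MySQL, where the interval is written INTERVAL 3 HOUR.
SELECT lt1.user_id, lt1.event_time AS range_start
  FROM large_table AS lt1
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM large_table AS lt3
         WHERE lt3.user_id    = lt1.user_id
           AND lt3.event_time > lt1.event_time - INTERVAL 3 HOUR
           AND lt3.event_time < lt1.event_time
       );

-- By the same pattern, the gap test
--   (lt6.event_time - lt5.event_time) > 3 UNITS HOUR
-- would become
--   lt6.event_time > lt5.event_time + INTERVAL 3 HOUR
```

Comparing against `event_time + INTERVAL 3 HOUR` avoids subtracting two DATETIME values directly, which in MySQL does not produce an interval.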

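My version of the query obviates the need for the last two `and exists` clauses in the code headed "Multiple Event Ranges" in Jonathan Leffler's answer and, indeed, for the "Singleton range" and "Doubleton range" code in his answer.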

Unless I'm missing something.
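
For what it's worth, on MySQL 8.0 or later the two steps in the question could also be sketched with window functions. This is a different technique from the nested `NOT EXISTS` approach, not a restatement of it. The CTE names (`flagged`, `ranged`, `positioned`), the output column names, and the 5-minute matching window are all assumptions made for illustration. A gap of exactly 3 hours is treated here as staying in the same range.

```sql
-- Assumes MySQL 8.0+ (CTEs and window functions).
WITH flagged AS (
  SELECT user_id, event_time, other_data,
         -- 1 when this user's previous event is missing or more than 3 hours earlier
         CASE WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
                   >= event_time - INTERVAL 3 HOUR
              THEN 0 ELSE 1 END AS starts_range
    FROM (SELECT DISTINCT user_id, event_time, other_data FROM large_table) AS d
),
ranged AS (
  SELECT user_id, event_time, other_data,
         -- running total of range starts gives a range number per user
         SUM(starts_range) OVER (PARTITION BY user_id ORDER BY event_time) AS range_no
    FROM flagged
),
positioned AS (
  SELECT user_id, event_time, other_data, range_no,
         -- step 1: how many distinct rows the range holds
         COUNT(*)     OVER (PARTITION BY user_id, range_no) AS rows_in_range,
         -- position of this row within its range, in event_time order
         ROW_NUMBER() OVER (PARTITION BY user_id, range_no
                            ORDER BY event_time) AS pos_in_range
    FROM ranged
)
-- step 2: which position in each range has a corresponding small_table row
SELECT s.user_id,
       s.event_time AS small_time,
       p.event_time AS large_time,
       p.range_no, p.pos_in_range, p.rows_in_range
  FROM small_table AS s
  JOIN positioned AS p
    ON p.user_id    = s.user_id
   AND p.other_data = s.other_data
   AND p.event_time BETWEEN s.event_time - INTERVAL 5 MINUTE
                        AND s.event_time + INTERVAL 5 MINUTE
 ORDER BY s.user_id, s.event_time;
```

With the example from earlier (instances at 1, 2, 4, 6, and 7 and a 3-hour limit), every consecutive gap is at most 2 hours, so this sketch should place all five instances in a single range.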

Which row in a table (in order by some column) corresponds to a row in another table?
