Which row in a table (in order by some column) corresponds to a row in another table?

Time: 2012-01-13 17:42:53

Tags: mysql sql-order-by mysql-5.1

I have two tables, as follows (simplified from the actual ones):

mysql> desc small_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   | MUL | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc large_table;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| event_time      | datetime      | NO   |     | NULL    |       |
| user_id         | char(15)      | NO   |     | NULL    |       |
| other_data      | int(11)       | NO   |     | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Now, small_table is small: for each user_id there is usually only one row (though sometimes more). In large_table, on the other hand, each user_id appears many times.

mysql> select count(1) from small_table\G
*************************** 1. row ***************************
count(1): 20182
1 row in set (0.00 sec)


mysql> select count(1) from large_table\G
*************************** 1. row ***************************
count(1): 2870522
1 row in set (0.00 sec)

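However, and this is important, for each row in small_table there is at least one row in large_table with the same user_id, the same other_data, and a similar event_time (the same to within a few minutes, say).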
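The correspondence described above can be written as a join condition. The following is a minimal sketch, run against an in-memory SQLite database from Python so that it is self-contained; the sample rows and the five-minute tolerance are assumptions chosen for illustration, not values from the real tables. In MySQL, the time test would more naturally be written as `l.event_time BETWEEN s.event_time - INTERVAL 5 MINUTE AND s.event_time + INTERVAL 5 MINUTE`.

```python
import sqlite3

TOLERANCE_SECONDS = 300  # assumed meaning of "within a few minutes"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE small_table (event_time TEXT NOT NULL, user_id TEXT NOT NULL, other_data INTEGER NOT NULL);
CREATE TABLE large_table (event_time TEXT NOT NULL, user_id TEXT NOT NULL, other_data INTEGER NOT NULL);
-- Invented sample rows, for illustration only.
INSERT INTO small_table VALUES ('2012-01-13 10:32:00', 'u1', 7);
INSERT INTO large_table VALUES
  ('2012-01-13 09:00:00', 'u1', 7),   -- same user and data, but too early
  ('2012-01-13 10:30:00', 'u1', 7),   -- the corresponding row
  ('2012-01-13 10:31:00', 'u1', 8),   -- close in time, different other_data
  ('2012-01-13 10:30:00', 'u2', 7);   -- different user
""")

# Pair each small_table row with the large_table rows that share its
# user_id and other_data and lie within the tolerance in time.
matches = conn.execute("""
SELECT s.user_id, s.event_time, l.event_time
  FROM small_table AS s
  JOIN large_table AS l
    ON l.user_id = s.user_id
   AND l.other_data = s.other_data
   AND ABS(CAST(strftime('%s', l.event_time) AS INTEGER)
         - CAST(strftime('%s', s.event_time) AS INTEGER)) <= :tol
""", {"tol": TOLERANCE_SECONDS}).fetchall()

print(matches)  # [('u1', '2012-01-13 10:32:00', '2012-01-13 10:30:00')]
```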

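I want to know whether small_table has a row corresponding to the first, or the second, or the whichever-th distinct row in large_table for the same user_id and a similar event_time. That is, I would like:

  1. for each user_id, a count of the distinct rows of large_table, in order by event_time, but only among rows whose event_time values lie within, say, three hours of each other; and
  2. for each such collection of distinct rows, an identification of which row in that list (in order by event_time) has a corresponding row in small_table.

I can't seem to write even a query that will do the first step, let alone one that will do the second, and would appreciate any direction.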
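To pin down what the two steps ask for, here is a procedural sketch in Python rather than a MySQL query. It assumes that "within three hours of each other" means each row is at most three hours after the previous row for the same user, and that "similar event_time" means within five minutes; `cluster_events` and `locate` are hypothetical helper names, and the sample rows are invented.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=3)       # assumed clustering window
TOLERANCE = timedelta(minutes=5)   # assumed meaning of "a few minutes"

def cluster_events(rows, max_gap=MAX_GAP):
    """Step 1: split one user's distinct (event_time, other_data) rows,
    ordered by event_time, into runs whose neighbours are <= max_gap apart."""
    clusters = []
    for row in sorted(set(rows)):
        if clusters and row[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(row)
        else:
            clusters.append([row])
    return clusters

def locate(clusters, small_row, tolerance=TOLERANCE):
    """Step 2: return (cluster size, 1-based position) of the row that
    corresponds to small_row, or None if no row corresponds."""
    s_time, s_data = small_row
    for cluster in clusters:
        for pos, (l_time, l_data) in enumerate(cluster, start=1):
            if l_data == s_data and abs(l_time - s_time) <= tolerance:
                return len(cluster), pos
    return None

# Invented rows for a single user_id; the 09:00 row is duplicated on purpose.
large_rows = [
    (datetime(2012, 1, 13, 9, 0), 7),
    (datetime(2012, 1, 13, 9, 0), 7),
    (datetime(2012, 1, 13, 10, 30), 7),
    (datetime(2012, 1, 13, 12, 0), 7),
    (datetime(2012, 1, 13, 18, 0), 7),
]
small_row = (datetime(2012, 1, 13, 10, 32), 7)

clusters = cluster_events(large_rows)
result = locate(clusters, small_row)
print([len(c) for c in clusters], result)  # [3, 1] (3, 2)
```

Here the small_table row corresponds to the second of three distinct rows in its three-hour group.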

2 answers:

Answer 0: (score: 0)

-- 'StartDate' and 'EndDate' are placeholders for the window of interest
select count(s.user_id), s.event_time, s.other_data from small_table s
where s.user_id IN (select distinct user_id from large_table where event_time between 'StartDate' and 'EndDate')
order by s.event_time

I'm not sure about the small-margin requirement you mentioned.

Also:

select * from large_table t1, large_table t2 
where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour)

So, try:

  select count(s.user_id), s.event_time, s.other_data from small_table s
    -- IN needs a single-column subquery, so select only the user_id
    where s.user_id IN ( select t1.user_id from large_table t1, large_table t2 
    where t1.event_time <= date_sub(t2.event_time, INTERVAL 3 hour))
order by s.event_time

Answer 1: (score: 0)

This should really be a comment on Jonathan Leffler's detailed and helpful answer, but (a) it is too long and (b) it does help answer my question, so I am posting it as an answer.

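The code headed "Multiple Event Ranges" in Jonathan Leffler's answer finds ranges in which the second instance comes shortly after the first and the penultimate instance comes shortly before the last, but it rejects a range whenever there is a large gap between any two interior instances, even if other instances lie between them. So, for example, with a limit of 3 hours, instances at 1, 2, 4, 6 and 7 would be disallowed because of the gap between 2 and 6. I think the correct code would be (building directly on Jonathan Leffler's):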

SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM large_table AS lt1
  JOIN large_table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS -- an earlier event that is close enough
       (SELECT *
          FROM large_table AS lt3
         WHERE lt1.user_id = lt3.user_id
           AND lt3.event_time > lt1.event_time - INTERVAL 3 HOUR
           AND lt3.event_time < lt1.event_time
       )
   AND NOT EXISTS -- a later event that is close enough
       (SELECT *
          FROM large_table AS lt4
         WHERE lt1.user_id = lt4.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + INTERVAL 3 HOUR
       )
   AND NOT EXISTS -- a gap that's too big in the events between first and last
       (SELECT *
          FROM large_table AS lt5 -- E5 before E6
          JOIN large_table AS lt6
            ON lt5.user_id = lt6.user_id
           AND lt1.user_id = lt5.user_id
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt5.event_time >= lt1.event_time
           AND lt6.event_time > lt5.event_time + INTERVAL 3 HOUR
           AND NOT EXISTS ( -- no event of the same user between E5 and E6
             SELECT * FROM large_table AS lt9
               WHERE lt9.user_id = lt5.user_id
                 AND lt9.event_time > lt5.event_time
                 AND lt6.event_time > lt9.event_time
             )
       )

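This avoids the need for the last two AND EXISTS clauses in the code headed "Multiple Event Ranges" in Jonathan Leffler's answer and, in fact, removes the need for the "Singleton range" and "Doubleton range" code in his answer.

Unless I'm missing something.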
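As a sanity check of the gap logic, the sketch below runs a query of the same shape against an in-memory SQLite database from Python. It is an illustration under assumptions: event_time is stored as an integer hour so that plain arithmetic replaces the interval syntax, the sample rows are invented, and lt9 is restricted to the same user_id as lt5.

```python
import sqlite3

LIMIT_HOURS = 3  # the 3-hour limit from the example above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE large_table (user_id TEXT NOT NULL, event_time INTEGER NOT NULL)"
)
# Invented data: u1 has events at hours 1, 2, 4, 6, 7 and later at 20, 21;
# u2 has two isolated events.
conn.executemany(
    "INSERT INTO large_table VALUES (?, ?)",
    [("u1", h) for h in (1, 2, 4, 6, 7, 20, 21)] + [("u2", 3), ("u2", 10)],
)

ranges = conn.execute("""
SELECT lt1.user_id, lt1.event_time AS min_time, lt2.event_time AS max_time
  FROM large_table AS lt1
  JOIN large_table AS lt2
    ON lt1.user_id = lt2.user_id
   AND lt1.event_time < lt2.event_time
 WHERE NOT EXISTS  -- an earlier event that is close enough
       (SELECT 1 FROM large_table AS lt3
         WHERE lt3.user_id = lt1.user_id
           AND lt3.event_time > lt1.event_time - :lim
           AND lt3.event_time < lt1.event_time)
   AND NOT EXISTS  -- a later event that is close enough
       (SELECT 1 FROM large_table AS lt4
         WHERE lt4.user_id = lt1.user_id
           AND lt4.event_time > lt2.event_time
           AND lt4.event_time < lt2.event_time + :lim)
   AND NOT EXISTS  -- two consecutive events in the range that are too far apart
       (SELECT 1
          FROM large_table AS lt5
          JOIN large_table AS lt6
            ON lt5.user_id = lt6.user_id
         WHERE lt5.user_id = lt1.user_id
           AND lt5.event_time >= lt1.event_time
           AND lt5.event_time < lt6.event_time
           AND lt6.event_time <= lt2.event_time
           AND lt6.event_time - lt5.event_time > :lim
           AND NOT EXISTS  -- no event of the same user between lt5 and lt6
               (SELECT 1 FROM large_table AS lt9
                 WHERE lt9.user_id = lt5.user_id
                   AND lt9.event_time > lt5.event_time
                   AND lt9.event_time < lt6.event_time))
 ORDER BY lt1.user_id, lt1.event_time
""", {"lim": LIMIT_HOURS}).fetchall()

print(ranges)  # [('u1', 1, 7), ('u1', 20, 21)]
```

With events at hours 1, 2, 4, 6 and 7, the range from 1 to 7 is kept, because no two consecutive events are more than 3 hours apart. Because the outer join requires lt1.event_time < lt2.event_time, isolated events such as those of u2 produce no row in this sketch.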