Question

I have two tables:

DROP TABLE IF EXISTS `left_table`;
CREATE TABLE `left_table` (
  `l_id` INT(11) NOT NULL AUTO_INCREMENT,
  `l_curr_time` INT(11) NOT NULL,
  PRIMARY KEY(l_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `right_table`;
CREATE TABLE `right_table` (
  `r_id` INT(11) NOT NULL AUTO_INCREMENT,
  `r_curr_time` INT(11) NOT NULL,
  PRIMARY KEY(r_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO left_table(l_curr_time) VALUES
(3),(4),(6),(10),(13);

INSERT INTO right_table(r_curr_time) VALUES
(1),(5),(7),(8),(11),(12);

I want to map to each l_curr_time from left_table the two closest r_curr_time values from right_table, with the constraint that r_curr_time must be greater than or equal to l_curr_time.

The expected result for the given values is:

+------+-------------+-------------+
| l_id | l_curr_time | r_curr_time |
+------+-------------+-------------+
|    1 |           3 |           5 |
|    1 |           3 |           7 |
|    2 |           4 |           5 |
|    2 |           4 |           7 |
|    3 |           6 |           7 |
|    3 |           6 |           8 |
|    4 |          10 |          11 |
|    4 |          10 |          12 |
+------+-------------+-------------+

I have the following solution, which works for one closest value. But I don't like it very much, because it silently relies on the fact that GROUP BY will keep the first row it encounters in each group:

SELECT l_id, l_curr_time, r_curr_time, time_diff
FROM (
  SELECT *, ABS(r_curr_time - l_curr_time) AS time_diff
  FROM left_table
  JOIN right_table ON 1=1
  WHERE r_curr_time >= l_curr_time
  ORDER BY l_id ASC, time_diff ASC
) t
GROUP BY l_id;

The output is as follows:

+------+-------------+-------------+-----------+
| l_id | l_curr_time | r_curr_time | time_diff |
+------+-------------+-------------+-----------+
|    1 |           3 |           5 |         2 |
|    2 |           4 |           5 |         1 |
|    3 |           6 |           7 |         1 |
|    4 |          10 |          11 |         1 |
+------+-------------+-------------+-----------+
4 rows in set (0.00 sec)

As you can see, I'm using JOIN ON 1=1. Is this OK for large data as well (for example, if left_table and right_table both have 10000 rows, the Cartesian product will be 10^8 rows long)? Despite this drawback, I think JOIN ON 1=1 is the only possible solution, because first I need to create all possible combinations from the existing tables and then pick the ones that satisfy the condition. If I'm wrong, please correct me. Thanks.

Answer 1

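This problem is not trivial. In SQL Server or PostgreSQL it would be very easy thanks to the row_number() over x statement. MySQL versions before 8.0 do not have it, so there you have to work with variables and chained SELECT statements.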
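For readers whose server does support window functions (PostgreSQL, SQL Server, or MySQL 8.0 and later), the row_number() approach mentioned above might look roughly like the sketch below. The alias names numbered and row_num are chosen here for illustration, and the inner join with >= is an assumption taken from the requirement stated in the question.

```sql
-- Sketch: number the matching right_table rows for each left_table row,
-- smallest r_curr_time first, then keep only the first two per l_id.
SELECT l_id, l_curr_time, r_curr_time
FROM (
  SELECT l.l_id,
         l.l_curr_time,
         r.r_curr_time,
         ROW_NUMBER() OVER (
           PARTITION BY l.l_id        -- restart numbering for every left row
           ORDER BY r.r_curr_time     -- closest (smallest) value gets number 1
         ) AS row_num
  FROM left_table l
  JOIN right_table r ON r.r_curr_time >= l.l_curr_time
) AS numbered
WHERE row_num <= 2
ORDER BY l_id, r_curr_time;
```

The rest of this answer explains how to get the same numbering by hand when window functions are not available.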

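To solve this problem you have to combine several concepts. I will try to explain them one by one to arrive at a solution that fits your question.

Easy start: how do I build a table that contains the information from both left_table and right_table?

Use a join. In this particular problem, a left join, with the join condition that l_curr_time must be less than or equal to r_curr_time. To simplify the rest, we order this table by l_curr_time and r_curr_time. The statement looks like this:

SELECT l_id, l_curr_time, r_curr_time
FROM left_table l
LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
ORDER BY l.l_curr_time, r.r_curr_time;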

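Now we have an ordered table that contains the information we want ... but too much of it ;) Because the table is ordered, it would be great if MySQL could simply pick the first two rows for each l_curr_time value. That is not possible, so we have to do it ourselves.

Mid part: how do I number rows?

Use variables! If you want to number the rows of a table, you can use MySQL user variables. There are two things to do: first, declare and define the variable; second, increment it. Suppose we have a table of names and want to know the position of each name when the table is ordered by name:

SELECT name, @num:=@num+1 AS position /* increment */
FROM names t, (SELECT @num:=0) as c /* names is a hypothetical table; the subquery defines the counter */
ORDER BY name ASC;

Hard part: how do I number a subset of rows depending on the value of a field?

Use one variable for counting (see above) and a second variable to hold state. We use the same principle as above, but now we take a variable and save the value of the field we are interested in. If that value changes, we reset the counter variable to 1. Again: the second variable must also be declared and defined. The new part is resetting one variable depending on the content of the state variable: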

SELECT
  l_id,
  l_curr_time,
  r_curr_time,
  @num := IF( /* (re)set num (the counter)... */
    @l_curr_time = l_curr_time,
    @num:= @num + 1, /* increment if the variable equals the actual l_curr_time field value */
    1 /* reset to 1 if the values are not equal */
  ) as row_num,
  @l_curr_time:=l_curr_time as lct /* state variable that holds the l_curr_time value */
FROM ( /* table from Step 1 of the explanation */
  SELECT l_id, l_curr_time, r_curr_time
  FROM left_table l
  LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
  ORDER BY l.l_curr_time, r.r_curr_time
) as joinedTable

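Now we have a table that contains all the combinations we want (but too many of them), with the rows numbered within each l_curr_time value. In other words: each subset is numbered from 1 up to the number of r_curr_time values that are greater than or equal to that l_curr_time.

Easy again: select only the rows we want, based on the row number

This part is easy. Because the table we created in step 3 is ordered and numbered, we can filter by the number (it must be less than or equal to 2). In addition, we select only the columns we care about: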

SELECT l_id, l_curr_time, r_curr_time, row_num
FROM ( /* table from step 3. */
  SELECT
    l_id,
    l_curr_time,
    r_curr_time,
    @num := IF(
      @l_curr_time = l_curr_time,
      @num:= @num + 1,
      1
    ) as row_num,
    @l_curr_time:=l_curr_time as lct
  FROM (
    SELECT l_id, l_curr_time, r_curr_time
    FROM left_table l
    LEFT JOIN right_table r ON l.l_curr_time<=r.r_curr_time
    ORDER BY l.l_curr_time, r.r_curr_time
  ) as joinedTable
) as numberedJoinedTable,(
  SELECT @l_curr_time:='',@num:=0 /* define the state variable and the number variable */
) as counterTable
HAVING row_num<=2; /* the number has to be smaller or equal to 2 */

That's it. This statement returns exactly what you want. You can check this statement in this sqlfiddle.
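The variable technique depends on the order in which MySQL evaluates the assignments, and the MySQL documentation cautions that this order is not guaranteed. If that is a concern, one alternative is a correlated subquery that counts how many qualifying right_table rows come before the current one. This is only a sketch under assumptions: it uses an inner join with >= as the question requires, the alias r2 is introduced here for illustration, and when r_curr_time contains duplicate values it can return more than two rows for one l_id.

```sql
-- Sketch: keep a right_table row only if fewer than two other qualifying
-- rows have a strictly smaller r_curr_time for the same left_table row.
SELECT l.l_id, l.l_curr_time, r.r_curr_time
FROM left_table l
JOIN right_table r ON r.r_curr_time >= l.l_curr_time
WHERE (
  SELECT COUNT(*)
  FROM right_table r2
  WHERE r2.r_curr_time >= l.l_curr_time   -- still a valid candidate for this left row
    AND r2.r_curr_time <  r.r_curr_time   -- and closer than the current candidate
) < 2
ORDER BY l.l_id, r.r_curr_time;
```

It uses no user variables and no window functions, so it should run on older MySQL versions too, at the cost of re-evaluating the subquery for each joined row.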

Answer 2

JoshuaK has the right idea. I just think it could be expressed more succinctly...

How about:

SELECT n.l_id
     , n.l_curr_time
     , n.r_curr_time
  FROM 
     ( SELECT a.*
            , CASE WHEN @prev = l_id THEN @i:=@i+1 ELSE @i:=1 END i
            , @prev := l_id prev
         FROM 
            ( SELECT l.*
                   , r.r_curr_time
                FROM left_table l
                JOIN right_table r
                  ON r.r_curr_time >= l.l_curr_time
            ) a
         JOIN 
            ( SELECT @prev := null,@i:=0) vars
        ORDER 
           BY l_id,r_curr_time
     ) n
 WHERE i<=2;
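For the single-closest-value case described in the question, a plain aggregate may be enough and avoids relying on which row GROUP BY happens to keep. A minimal sketch, assuming the closest value is simply the smallest r_curr_time that is greater than or equal to l_curr_time:

```sql
-- Sketch: the closest r_curr_time >= l_curr_time is the minimum of the
-- qualifying values, so MIN() returns it deterministically.
SELECT l.l_id,
       l.l_curr_time,
       MIN(r.r_curr_time) AS r_curr_time,
       MIN(r.r_curr_time) - l.l_curr_time AS time_diff
FROM left_table l
JOIN right_table r ON r.r_curr_time >= l.l_curr_time
GROUP BY l.l_id, l.l_curr_time
ORDER BY l.l_id;
```

Every selected column is either grouped or aggregated, so the statement does not depend on MySQL's permissive GROUP BY behavior.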

Find the two closest elements in one table for each element in another table