How do I solve this MySQL query?

Time: 2011-05-04 04:04:32

Tags: mysql sql

I have a table that looks like this:

CREATE TEMPORARY TABLE MainList (
  `pTime` int(10) unsigned NOT NULL,
  `STD` double NOT NULL,
  PRIMARY KEY (`pTime`)
) ENGINE=MEMORY;


+------------+-------------+
| pTime      | STD         |
+------------+-------------+
| 1106080500 |  -0.5058072 |
| 1106081100 | -0.82790455 |
| 1106081400 | -0.59226294 |
| 1106081700 | -0.99998194 |
| 1106540100 | -0.86649279 |
| 1107194700 |  1.51340543 |
| 1107305700 |  0.96225296 |
| 1107306300 |  0.53937716 |
+------------+-------------+ .. etc

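pTime is my primary key.

I want to write a query that, for each row in my table, finds the first later pTime at which STD has the opposite sign and is farther from 0 than the STD of that row. (For simplicity, imagine that I am looking for 0 - STD.)

Here is an example of the output I want: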

+------------+-------------+------------+-------------+
| pTime      | STD         | pTime_Oppo | STD_Oppo    |
+------------+-------------+------------+-------------+
| 1106080500 |  -0.5058072 | 1106090400 |  0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 |  0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 |  0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 |  1.0660959  |
+------------+-------------+------------+-------------+

I can't seem to get it right! I tried the following:

SELECT DISTINCT
    MainList.pTime,
    MainList.STD,
    b34d1.pTime,
    b34d1.STD
FROM
    MainList
JOIN b34d1 ON(
    b34d1.pTime > MainList.pTime
    AND(
        (
            MainList.STD > 0
            AND b34d1.STD <= 0 - MainList.STD
        )
        OR(
            MainList.STD < 0
            AND b34d1.STD >= 0 - MainList.STD
        )
    )
);

That code just froze my server.

P.S. Table b34d1 is similar to MainList, except that it contains many more elements:

mysql>  select STD, Slope from b31d1 limit 10;
+-------------+--------------+
| STD         | Slope        |
+-------------+--------------+
| -0.44922675 |   -5.2016129 |
| -0.11892021 |  -8.15249267 |
|  0.62574686 | -10.19794721 |
|  1.10469057 | -12.43768328 |
|  1.52917352 | -13.08651026 |
|  1.61803899 |  -13.2441349 |
|  1.82686555 | -12.04912023 |
|  2.07480736 | -11.22067449 |
|  2.45529961 |  -7.84090909 |
|  1.86468335 |  -6.26466276 |
+-------------+--------------+
mysql>  select count(*) from b31d1;
+----------+
| count(*) |
+----------+
|   439340 |
+----------+
1 row in set (0.00 sec)

In fact, MainList is just a filtered version of b34d1 that uses the MEMORY engine.

mysql> show create table b34d1;
+-------+--------------+
| Table | Create Table |
+-------+--------------+
| b34d1 | CREATE TABLE `b34d1` (
  `pTime` int(10) unsigned NOT NULL,
  `Slope` double NOT NULL,
  `STD` double NOT NULL,
  PRIMARY KEY (`pTime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED |
+-------+--------------+

Edit: I just ran a small experiment, and I am very confused by the results:

SELECT DISTINCT
    b34d1.pTime,
    b34d1.STD,
    Anti.pTime,
    Anti.STD

FROM
    b34d1

LEFT JOIN b34d1 As Anti ON(
    Anti.pTime > b34d1.pTime
    AND(
        (
            b34d1.STD > 0
            AND b34d1.STD <= 0 - Anti.STD
        )
        OR(
            b34d1.STD < 0
            AND b34d1.STD >= 0 - Anti.STD
        )
    )
)  limit 10;

+------------+-------------+------------+------------+
| pTime      | STD         | pTime      | STD        |
+------------+-------------+------------+------------+
| 1104537600 | -0.70381962 | 1104539100 | 0.73473692 |
| 1104537600 | -0.70381962 | 1104714000 | 1.46733274 |
| 1104537600 | -0.70381962 | 1104714300 | 2.02097356 |
| 1104537600 | -0.70381962 | 1104714600 | 2.60642099 |
| 1104537600 | -0.70381962 | 1104714900 | 2.01006557 |
| 1104537600 | -0.70381962 | 1104715200 | 1.97724189 |
| 1104537600 | -0.70381962 | 1104715500 | 1.85683704 |
| 1104537600 | -0.70381962 | 1104715800 |  1.2754127 |
| 1104537600 | -0.70381962 | 1104716100 | 0.87900156 |
| 1104537600 | -0.70381962 | 1104716400 | 0.72957739 |
+------------+-------------+------------+------------+

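Why are all the values under the first pTime the same?

3 Answers:

Answer 0: (score: 1)

Selecting other fields from the row that holds some aggregate statistic (such as a minimum or maximum) is a bit messy in SQL. Such queries are not that simple; you typically need an extra join or a subquery. For example: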

SELECT m.pTime, m.STD, m2.pTime AS pTime_Oppo, m2.STD AS STD_Oppo
  FROM MainList AS m
    JOIN 
      (SELECT m1.pTime, MIN(m2.pTime) AS pTime_Oppo
         FROM MainList AS m1
           JOIN MainList AS m2 
             ON m1.pTime < m2.pTime AND SIGN(m1.STD) != SIGN(m2.STD)
         WHERE ABS(m1.STD) <= ABS(m2.STD)
         GROUP BY m1.pTime
      ) AS oppo ON m.pTime = oppo.pTime
    JOIN MainList AS m2 ON oppo.pTime_Oppo = m2.pTime
;

With the sample data:

INSERT INTO MainList (`pTime`, `STD`)
  VALUES
(1106080500, -0.5058072),
(1106081100, -0.82790455),
(1106081400, -0.59226294),
(1106081700, -0.99998194),
(1106090400,  0.57510881),
(1106091300,  0.85599817),
(1106091600,  1.0660959),
(1106540100, -0.86649279),
(1107194700,  1.51340543),
(1107305700,  0.96225296),
(1107306300,  0.53937716)
;

The result is:

+------------+-------------+------------+-------------+
| pTime      | STD         | pTime_Oppo | STD_Oppo    |
+------------+-------------+------------+-------------+
| 1106080500 |  -0.5058072 | 1106090400 |  0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 |  0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 |  0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 |   1.0660959 |
| 1106090400 |  0.57510881 | 1106540100 | -0.86649279 |
| 1106091300 |  0.85599817 | 1106540100 | -0.86649279 |
| 1106540100 | -0.86649279 | 1107194700 |  1.51340543 |
+------------+-------------+------------+-------------+
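
The "subquery" alternative mentioned above can be sketched as follows. This is only an illustration, not part of the original answer: the aliases o and c are made up, and it assumes MainList is a regular (non-TEMPORARY) table holding the sample rows, since MySQL cannot refer to a TEMPORARY table more than once in the same query.

```sql
-- Sketch: for each row m, pick the earliest later row whose STD has the
-- opposite sign and is at least as far from zero, using a correlated subquery.
SELECT m.pTime, m.STD, o.pTime AS pTime_Oppo, o.STD AS STD_Oppo
  FROM MainList AS m
    JOIN MainList AS o
      ON o.pTime = (SELECT MIN(c.pTime)            -- earliest qualifying pTime
                      FROM MainList AS c
                     WHERE c.pTime > m.pTime        -- strictly later than m
                       AND SIGN(c.STD) <> SIGN(m.STD)
                       AND ABS(c.STD) >= ABS(m.STD))
;
```

When no later row qualifies, MIN() yields NULL and the inner join drops that row, so on the sample data this should return the same seven rows as the derived-table version. Switching to LEFT JOIN would keep the unmatched rows with NULLs instead.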

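Answer 1: (score: 0)

Any solution based on functions such as ABS or SIGN, or on anything similar needed to check the sign, is doomed to be inefficient on large data sets, because it cannot use an index.

You are creating a temporary table in an SP, so you can change its schema without losing anything. Adding a column that stores the sign of STD, and storing STD itself as an unsigned value, will give you a huge performance boost: you can then simply look for the first larger pTime that has a larger STD and a different sign, and every condition can use an index in a query like this (STD_positive holds the sign of STD):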

SELECT * FROM MainList m
LEFT JOIN MainList mu
ON mu.pTime = ( SELECT md.pTime FROM MainList md
            WHERE m.pTime < md.pTime
            AND m.STD < md.STD
            AND m.STD_positive <> md.STD_positive
            ORDER BY md.pTime
            LIMIT 1 );

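The LEFT JOIN is needed here to return the rows that have no larger STD. If you do not need them, use a plain JOIN. This query should run fine even on a large number of records, provided it has the right indexes, chosen by carefully checking the EXPLAIN output and starting with an index on STD.

Answer 2: (score: 0)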
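
The schema change described above might look roughly like the sketch below. The table name MainListSigned, the tinyint type for STD_positive, and the index name idx_sign_std are all assumptions made for illustration. A regular table is used because MySQL cannot reopen a TEMPORARY table for a self-join, and BTREE is requested explicitly because MEMORY tables default to HASH indexes, which cannot serve range conditions such as `<` or `>`.

```sql
-- Hypothetical variant of MainList: STD keeps only the magnitude,
-- STD_positive keeps the sign. Names and types here are assumptions.
CREATE TABLE MainListSigned (
  `pTime` int(10) unsigned NOT NULL,
  `STD` double NOT NULL,                -- ABS() of the original STD
  `STD_positive` tinyint(1) NOT NULL,   -- 1 if the original STD was >= 0, else 0
  PRIMARY KEY (`pTime`) USING BTREE,
  KEY `idx_sign_std` (`STD_positive`, `STD`) USING BTREE
) ENGINE=MEMORY;

-- Populate from the source table; apply whatever filter defines MainList here.
INSERT INTO MainListSigned (`pTime`, `STD`, `STD_positive`)
SELECT `pTime`, ABS(`STD`), `STD` >= 0
  FROM b34d1;

-- Check which index the optimizer actually chooses for the lookup.
EXPLAIN
SELECT m.pTime, mu.pTime AS pTime_Oppo
  FROM MainListSigned m
  LEFT JOIN MainListSigned mu
    ON mu.pTime = (SELECT md.pTime
                     FROM MainListSigned md
                    WHERE md.pTime > m.pTime
                      AND md.STD > m.STD
                      AND md.STD_positive <> m.STD_positive
                    ORDER BY md.pTime
                    LIMIT 1);
```

Whether the composite index or the primary key serves the subquery better depends on the data, so the EXPLAIN output is the thing to go by rather than this particular index layout.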

SELECT
  m.pTime,
  m.STD,
  mo.pTime AS pTime_Oppo,
  -mo.STD AS STD_Oppo
FROM MainList m
  INNER JOIN (
    SELECT
      pTime,
      -STD AS STD
    FROM MainList
  ) mo ON m.STD > 0 AND mo.STD > m.STD
       OR m.STD < 0 AND mo.STD < m.STD
  LEFT JOIN (
    SELECT
      pTime,
      -STD AS STD
    FROM MainList
  ) mo2 ON mo.STD > 0 AND mo2.STD > m.STD AND mo.STD > mo2.STD
        OR mo.STD < 0 AND mo2.STD < m.STD AND mo.STD < mo2.STD
WHERE mo2.pTime IS NULL