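How to use the unique values of a column as input to another SELECT statement

Time: 2012-05-26 00:03:17

Tags: mysql stored-procedures query-optimization

I have a table (MySQL) with a column named binID. The values in this column range from 1 to 70.

What I want to do is select the unique values of this column (which should be the numbers 1 through 70), then iterate over them, passing each one (call it theBinID) as a parameter to another SELECT statement, such as: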

SELECT * FROM MyTable WHERE binID = theBinID ORDER BY createdDate DESC LIMIT 10

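Basically, I want to get the 10 most recent rows for each binID.

I don't believe there is a way to do this with a single basic SQL statement, though I would love for that to be the answer, so I wrote a stored procedure that creates a cursor over the DISTINCT binIDs, iterates over it, and populates a temporary table.

My question is about optimization. If I fetch 100K rows, I get an average time of 1.7 seconds. Executing my stored procedure to get 700 rows (10 records for each of 70 bins) takes 1.4 seconds. I realize that 0.3 seconds could be considered a decent improvement, but I was hoping for sub-second times against the 100K rows.

Is there a better way?

The full stored procedure follows: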

BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE binID INT;
DECLARE cur1 CURSOR FOR SELECT DISTINCT heatmapBinID from MEStressTest ORDER BY heatmapBinID ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

DROP TEMPORARY TABLE IF EXISTS TempResults;

CREATE TEMPORARY TABLE TempResults (
    `recordID` text NOT NULL,
    `queryTerm` text NOT NULL,
    `recordCreated` double(11,0) NOT NULL,
    `recordByID` text NOT NULL,
    `recordByName` text NOT NULL,
    `recordText` text NOT NULL,
    `recordSource` text NOT NULL,
    `rerecordCount` int(11) NOT NULL DEFAULT '0',
    `timecodeOffset` int(11) NOT NULL DEFAULT '-1',
    `recordByImageURL` text NOT NULL,
    `canDelete` int(11) NOT NULL DEFAULT '1',
    `heatmapBinID` int(11) DEFAULT NULL,
    `timelineBinID` int(11) DEFAULT NULL,
    PRIMARY KEY (`recordID`(20))
);

OPEN cur1;

read_loop: LOOP
    FETCH cur1 INTO binID;

    IF done THEN
        LEAVE read_loop;
    END IF;

    INSERT INTO TempResults (recordID, queryTerm, recordCreated, recordByID, recordByName, recordText, recordSource, rerecordCount, timecodeOffset, recordByImageURL, canDelete, heatmapBinID, timelineBinID)
    SELECT * FROM MEStressTest WHERE heatmapBinID = binID ORDER BY recordCreated DESC LIMIT numRecordsPerBin;
END LOOP;

CLOSE cur1;

SELECT * FROM TempResults ORDER BY heatmapBinID ASC, recordCreated DESC;

END

3 Answers:

Answer 0: (score: 1)

Try simulating ROW_NUMBER with partitioning in MySQL: http://www.sqlfiddle.com/#!2/fd8b5/4

Given this data:

create table sentai(
  band varchar(50),
  member_name varchar(50),
  member_year int not null
);

insert into sentai(band, member_name, member_year) values
('BEATLES','JOHN',1960),
('BEATLES','PAUL',1961),
('BEATLES','GEORGE',1962),
('BEATLES','RINGO',1963),
('VOLTES V','STEVE',1970),
('VOLTES V','MARK',1971),
('VOLTES V','BIG BERT',1972),
('VOLTES V','LITTLE JOHN',1973),
('VOLTES V','JAMIE',1964),
('ERASERHEADS','ELY',1990),
('ERASERHEADS','RAYMUND',1991),
('ERASERHEADS','BUDDY',1992),
('ERASERHEADS','MARCUS',1993);

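Objective: find the three most recent members of each band.

First, we have to put a row_number on each member, ordered by band and then by most recent year (descending):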

select *,
  @rn := @rn + 1 as rn
from (sentai s, (select @rn := 0) as vars)
order by s.band, s.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR | @RN := 0 | RN |
|-------------|-------------|-------------|----------|----|
|     BEATLES |       RINGO |        1963 |        0 |  1 |
|     BEATLES |      GEORGE |        1962 |        0 |  2 |
|     BEATLES |        PAUL |        1961 |        0 |  3 |
|     BEATLES |        JOHN |        1960 |        0 |  4 |
| ERASERHEADS |      MARCUS |        1993 |        0 |  5 |
| ERASERHEADS |       BUDDY |        1992 |        0 |  6 |
| ERASERHEADS |     RAYMUND |        1991 |        0 |  7 |
| ERASERHEADS |         ELY |        1990 |        0 |  8 |
|    VOLTES V | LITTLE JOHN |        1973 |        0 |  9 |
|    VOLTES V |    BIG BERT |        1972 |        0 | 10 |
|    VOLTES V |        MARK |        1971 |        0 | 11 |
|    VOLTES V |       STEVE |        1970 |        0 | 12 |
|    VOLTES V |       JAMIE |        1964 |        0 | 13 |

Then we reset the row number whenever a member belongs to a different band than the previous row:

select *,
  @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
  @pg := s.band
from (sentai s, (select @pg := null, @rn := 0) as vars)
order by s.band, s.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR | @PG := NULL | @RN := 0 | RN | @PG := S.BAND |
|-------------|-------------|-------------|-------------|----------|----|---------------|
|     BEATLES |       RINGO |        1963 |      (null) |        0 |  1 |       BEATLES |
|     BEATLES |      GEORGE |        1962 |      (null) |        0 |  2 |       BEATLES |
|     BEATLES |        PAUL |        1961 |      (null) |        0 |  3 |       BEATLES |
|     BEATLES |        JOHN |        1960 |      (null) |        0 |  4 |       BEATLES |
| ERASERHEADS |      MARCUS |        1993 |      (null) |        0 |  1 |   ERASERHEADS |
| ERASERHEADS |       BUDDY |        1992 |      (null) |        0 |  2 |   ERASERHEADS |
| ERASERHEADS |     RAYMUND |        1991 |      (null) |        0 |  3 |   ERASERHEADS |
| ERASERHEADS |         ELY |        1990 |      (null) |        0 |  4 |   ERASERHEADS |
|    VOLTES V | LITTLE JOHN |        1973 |      (null) |        0 |  1 |      VOLTES V |
|    VOLTES V |    BIG BERT |        1972 |      (null) |        0 |  2 |      VOLTES V |
|    VOLTES V |        MARK |        1971 |      (null) |        0 |  3 |      VOLTES V |
|    VOLTES V |       STEVE |        1970 |      (null) |        0 |  4 |      VOLTES V |
|    VOLTES V |       JAMIE |        1964 |      (null) |        0 |  5 |      VOLTES V |

Then we select only the three most recent members of each band:

select x.band, x.member_name, x.member_year
from
(
  select *,
    @rn := IF(@pg = s.band, @rn + 1, 1) as rn,
    @pg := s.band
  from (sentai s, (select @pg := null, @rn := 0) as vars)
  order by s.band, s.member_year desc
) as x
where x.rn <= 3
order by x.band, x.member_year desc;

Output:

|        BAND | MEMBER_NAME | MEMBER_YEAR |
|-------------|-------------|-------------|
|     BEATLES |       RINGO |        1963 |
|     BEATLES |      GEORGE |        1962 |
|     BEATLES |        PAUL |        1961 |
| ERASERHEADS |      MARCUS |        1993 |
| ERASERHEADS |       BUDDY |        1992 |
| ERASERHEADS |     RAYMUND |        1991 |
|    VOLTES V | LITTLE JOHN |        1973 |
|    VOLTES V |    BIG BERT |        1972 |
|    VOLTES V |        MARK |        1971 |

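Although window functions (e.g., ROW_NUMBER() OVER (PARTITION BY ...)) are not available in MySQL at the time of writing, you can simulate them with user variables. Let us know whether this is faster than the cursor approach.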
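
Applied to the table from the question, the same simulation might look like the sketch below. It assumes the MEStressTest columns heatmapBinID and recordCreated shown in the question, hard-codes 10 where the stored procedure uses numRecordsPerBin, and introduces the aliases rn and currentBin purely for illustration. MySQL does not guarantee the order in which user-variable assignments in a SELECT list are evaluated, so this is something to verify on your own server version rather than rely on.

```sql
-- Sketch only: assumes the MEStressTest table from the question.
SELECT x.*
FROM (
  SELECT s.*,
         -- restart the counter whenever the bin changes
         @rn := IF(@pg = s.heatmapBinID, @rn + 1, 1) AS rn,
         @pg := s.heatmapBinID AS currentBin
  FROM MEStressTest AS s
  CROSS JOIN (SELECT @pg := NULL, @rn := 0) AS vars
  ORDER BY s.heatmapBinID, s.recordCreated DESC
) AS x
WHERE x.rn <= 10          -- 10 stands in for numRecordsPerBin
ORDER BY x.heatmapBinID, x.recordCreated DESC;
```

The result also carries the two helper columns (rn and currentBin); list the wanted columns explicitly in the outer SELECT if they should be dropped.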


Here is how it looks on an RDBMS that supports windowing: http://www.sqlfiddle.com/#!1/fd8b5/6

with member_recentness as
(
  select row_number() over each_band as recent, *
  from sentai
  window each_band as (partition by band order by member_year desc)
)
select * 
from member_recentness
where recent <= 3;

Output:

| RECENT |        BAND | MEMBER_NAME | MEMBER_YEAR |
|--------|-------------|-------------|-------------|
|      1 |     BEATLES |       RINGO |        1963 |
|      2 |     BEATLES |      GEORGE |        1962 |
|      3 |     BEATLES |        PAUL |        1961 |
|      1 | ERASERHEADS |      MARCUS |        1993 |
|      2 | ERASERHEADS |       BUDDY |        1992 |
|      3 | ERASERHEADS |     RAYMUND |        1991 |
|      1 |    VOLTES V | LITTLE JOHN |        1973 |
|      2 |    VOLTES V |    BIG BERT |        1972 |
|      3 |    VOLTES V |        MARK |        1971 |
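
MySQL itself gained window functions in version 8.0, so on a recent server the same idea can be written directly against the table from the question. A minimal sketch, assuming the MEStressTest columns heatmapBinID and recordCreated from the question and using 10 in place of numRecordsPerBin (the aliases t, ranked and rn are illustrative):

```sql
-- Sketch only: requires MySQL 8.0+ and assumes the MEStressTest table from the question.
SELECT ranked.*
FROM (
  SELECT t.*,
         -- number the rows of each bin from newest to oldest
         ROW_NUMBER() OVER (PARTITION BY t.heatmapBinID
                            ORDER BY t.recordCreated DESC) AS rn
  FROM MEStressTest AS t
) AS ranked
WHERE ranked.rn <= 10     -- 10 stands in for numRecordsPerBin
ORDER BY ranked.heatmapBinID ASC, ranked.recordCreated DESC;
```

Whether this beats the cursor-based procedure depends on the data and the available indexes, so it is worth timing both.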

Answer 1: (score: 0)

SELECT * FROM MyTable WHERE binID IN (SELECT DISTINCT(bin_id) FROM mysql_table) ORDER BY createdDate DESC LIMIT 10;

This is untested, and I have not checked the syntax.

Add an index to improve performance.
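
As a rough illustration of the indexing advice, a composite index on the filter column followed by the sort column can let MySQL find one bin and read its rows in date order without a separate sort. The index name is made up, and MyTable, binID and createdDate are the placeholder names from the question.

```sql
-- Hypothetical index name; table and column names are the question's placeholders.
CREATE INDEX idx_mytable_bin_created ON MyTable (binID, createdDate);

-- Check whether the per-bin query actually uses the index.
EXPLAIN
SELECT * FROM MyTable WHERE binID = 1 ORDER BY createdDate DESC LIMIT 10;
```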

Answer 2: (score: 0)

If you try to join 2 tables without any join key, the result is the Cartesian product of the 2 tables, i.e.:

SELECT * 
FROM MyTable t 
    INNER JOIN (SELECT DISTINCT binId FROM MyTable) AS u 
WHERE 
    t.binID = theBinID 
ORDER BY t.createdDate DESC LIMIT 10

You can refer to this.
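
Note that the LIMIT 10 in the query above applies to the whole result, not to each bin. On MySQL 8.0.14 or later, a lateral derived table is one way to run the inner SELECT once per distinct binID. A minimal sketch, reusing the placeholder names MyTable, binID and createdDate from the question (the aliases u, t and r are illustrative):

```sql
-- Sketch only: LATERAL requires MySQL 8.0.14+.
SELECT r.*
FROM (SELECT DISTINCT binID FROM MyTable) AS u,
     LATERAL (
       -- evaluated once for each row of u
       SELECT *
       FROM MyTable AS t
       WHERE t.binID = u.binID
       ORDER BY t.createdDate DESC
       LIMIT 10
     ) AS r
ORDER BY r.binID, r.createdDate DESC;
```

With an index on (binID, createdDate), each per-bin lookup only needs to read the newest 10 entries, which is the access pattern the cursor loop in the question performs by hand.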