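MySQL: Unable to select records from specific partitions?

Time: 2015-01-01 13:11:44

Tags: mysql sql select partitioning find-in-set

I am using MySQL 5.6. I created a table with 366 partitions to store data day by day: a year has at most 366 days, so I created 366 partitions on the table. The hash partitioning is driven by an integer column that stores a value from 1 to 366 for each record.

Report_Summary table: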

CREATE TABLE `Report_Summary` (
  `PartitionsID` int(4) unsigned NOT NULL,
  `ReportTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `Amount` int(10) NOT NULL,
  UNIQUE KEY `UNIQUE` (`PartitionsID`,`ReportTime`),
  KEY `PartitionsID` (`PartitionsID`),
  KEY `ReportTime` (`ReportTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
/*!50100 PARTITION BY HASH (PartitionsID)
PARTITIONS 366 */

My current query:

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2014-12-30 23:59:59' AND 
      RS.PartitionsID BETWEEN DAYOFYEAR('2014-12-26 00:00:00') AND DAYOFYEAR('2014-12-30 23:59:59')
GROUP BY ReportDate; 

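The above query works perfectly and uses partitions p360 to p364 to fetch the data. The problem appears when I pass '2014-12-26' as the from date and '2015-01-01' as the to date: the query no longer works, because the day of year of '2015-01-01' is 1, so the BETWEEN condition fails.

I then tried passing the values with the IN operator, and the query below works perfectly when I check it against the database: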

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2015-01-01 23:59:59' AND 
      RS.PartitionsID IN (360,361,362,363,364,365,1)
GROUP BY ReportDate; 

To generate that list I created a function that takes the two dates and returns a comma-separated string of IDs:

SELECT GenerateRange('2014-12-26 00:00:00', '2015-01-01 23:59:59');

It returns:

'360,361,362,363,364,365,366,1'

I then tried to use that function in the query, so I changed the query as follows:

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2015-01-01 23:59:59' AND 
      FIND_IN_SET(RS.PartitionsID, GenerateRange('2014-12-26 00:00:00', '2015-01-01 00:00:00'))
GROUP BY ReportDate; 

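Then I checked the execution plan of the above query using EXPLAIN PARTITIONS SELECT ... and found that my condition does not work: the query uses all partitions to fetch the data. I want it to use only the partitions for those dates. It should check only the partitions for 360,361,362,363,364,365,366,1, that is, p360 to p366 and p1.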
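
For reference, the MySQL manual describes HASH partitioning as placing each row in partition number MOD(expr, N), with default partition names running from p0 to p(N-1). The Python sketch below only models that rule for the ID list above; `partition_for` is a hypothetical helper written for illustration, not a MySQL function. Under this rule the value 366 would land in p0, because a 366-partition table has no p366.

```python
def partition_for(partitions_id: int, num_partitions: int = 366) -> str:
    """Model the documented HASH rule: partition number = MOD(expr, N)."""
    return f"p{partitions_id % num_partitions}"

# The IDs returned by GenerateRange in the question.
ids = [360, 361, 362, 363, 364, 365, 366, 1]
mapping = {i: partition_for(i) for i in ids}

for partitions_id, name in mapping.items():
    print(partitions_id, "->", name)
```

Comparing this mapping with the partition list reported by EXPLAIN PARTITIONS is a quick way to confirm which partitions a given ID list should touch.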

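Why doesn't my query work? Is this the right way to implement it, and if not, how can I achieve this?

I know I could implement this in application code, but I have to write a single query to achieve it.

Thanks...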

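4 answers:

Answer 0 (score: 1)

I can think of a few options.

  1. Create a CASE statement that covers search criteria spanning multiple years.
  2. Create a CalendarDays table and use it to get the distinct list of DayOfYear values for the IN clause.
  3. A variation of option 1, but using a UNION to search each range separately.

Option 1: Use a CASE statement. It isn't pretty, but it seems to work. When the query spans years that are not leap years, there is a scenario in which it searches one extra partition, 366. Also, I'm not sure the optimizer will like the OR in the filter on RS.PartitionsID, but you can try it out.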

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
        AND 
        (
        RS.PartitionsID BETWEEN 
            CASE 
                WHEN
                    -- more than one year, search all days 
                    year(@endDate) - year(@startDate) > 1
                    -- one full year difference 
                    OR year(@endDate) - year(@startDate) = 1 
                        AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
                THEN 1
                ELSE DAYOFYEAR(@startDate)
            END
            and 
            CASE
                WHEN 
                    -- query spans the end of a year
                    year(@endDate) - year(@startDate) >= 1
                THEN 366
                ELSE DAYOFYEAR(@endDate)
            END
        -- Additional condition covering the part of the range that falls in the following year
        OR RS.PartitionsID <=
            CASE
                WHEN year(@endDate) - year(@startDate) > 1
                    OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
                THEN DAYOFYEAR(@endDate)
                ELSE NULL
            END
        )
    GROUP BY ReportDate;
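
The CASE logic above boils down to splitting a date range into at most two day-of-year intervals. Here is a minimal Python sketch of that idea; `doy_ranges` is a hypothetical helper written for illustration and is not part of the SQL above. It assumes `start <= end` and falls back to all 366 values once the range covers a full year.

```python
from datetime import date

def doy_ranges(start: date, end: date) -> list:
    """Return the (low, high) day-of-year intervals a date range touches."""
    s = start.timetuple().tm_yday
    e = end.timetuple().tm_yday
    years = end.year - start.year
    # A full year or more touches every day of the year.
    if years > 1 or (years == 1 and s <= e):
        return [(1, 366)]
    # Both dates fall in the same year: one plain interval.
    if years == 0:
        return [(s, e)]
    # The range wraps past 31 December: tail of one year plus head of the next.
    return [(s, 366), (1, e)]

print(doy_ranges(date(2014, 12, 26), date(2014, 12, 30)))
print(doy_ranges(date(2014, 12, 26), date(2015, 1, 1)))
```

Each returned tuple corresponds to one BETWEEN condition, which is also how option 3 splits the work into two queries.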
    

Option 2: Use a CalendarDays table. This option is much cleaner. The downside is that you need to create a new CalendarDays table if you don't already have one.

    SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
    FROM Report_Summary RS
    WHERE RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
        AND RS.PartitionsID IN
        (
            SELECT DISTINCT DAYOFYEAR(c.calDate) 
            -- calendarDays holds one row per calendar date in column calDate
            FROM calendarDays c
            WHERE c.calDate >= @startDate and c.calDate <= @endDate
        )
    GROUP BY ReportDate;
    

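Edit: Option 3: A variation of option 1, but using UNION ALL to search each range separately. The idea is that, with no OR in the statement, the optimizer will be able to apply partition pruning. Note: I don't normally work in MySQL, so my syntax may be a little off, but the general idea is there.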

    -- MySQL user variables need no DECLARE; set the date range directly (example values).
    SET @startDate = '2014-12-26 00:00:00', @endDate = '2015-01-01 23:59:59';
    
    SELECT @rangeOneStart := 
            CASE 
                WHEN
                    -- more than one year, search all days 
                    year(@endDate) - year(@startDate) > 1
                    -- one full year difference 
                    OR year(@endDate) - year(@startDate) = 1 
                        AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
                THEN 1
                ELSE DAYOFYEAR(@startDate)
            END
        , @rangeOneEnd := 
            CASE
                WHEN 
                    -- query spans the end of a year
                    year(@endDate) - year(@startDate) >= 1
                THEN 366
                ELSE DAYOFYEAR(@endDate)
            END 
        , @rangeTwoStart := 1
        , @rangeTwoEnd := 
            CASE
                WHEN year(@endDate) - year(@startDate) > 1
                    OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
                THEN DAYOFYEAR(@endDate)
                ELSE NULL
            END
    ;
    
    SELECT t.ReportDate, sum(t.Amount) as Total
    FROM 
    (
        SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
        FROM Report_Summary RS
        WHERE RS.PartitionsID BETWEEN @rangeOneStart AND @rangeOneEnd
            AND RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
    
        UNION ALL
    
        SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
        FROM Report_Summary RS
        WHERE RS.PartitionsID BETWEEN @rangeTwoStart AND @rangeTwoEnd
            AND @rangeTwoEnd IS NOT NULL
            AND RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
    ) t
    GROUP BY ReportDate;
    

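Answer 1 (score: 0)

To get started on this problem, you need a subquery that, given a date range, returns a result set containing all the DAYOFYEAR() values in that range.

Let's work that out. For starters, we need a query that returns a sequence of all the integers from 0 to at least 366. Here is such a query. It returns a single column, seq, with the values 0 through 624.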

SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
  FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 
                      UNION SELECT 3 UNION SELECT 4) AS A
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                      UNION SELECT 3 UNION SELECT 4) AS B
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                      UNION SELECT 3 UNION SELECT 4) AS C
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                      UNION SELECT 3 UNION SELECT 4) AS D

(This is a simple cross-join trick that generates all 5**4 combinations of the digits.)
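
To see why the cross join yields exactly 0 through 624, here is a small Python sketch of the same arithmetic, offered purely as an illustration: `itertools.product` stands in for the four-way join, and each tuple is read as a four-digit base-5 number.

```python
from itertools import product

digits = range(5)  # plays the role of each (SELECT 0 ... UNION SELECT 4) subquery

# Same expression as the SQL: A.N + 5*(B.N + 5*(C.N + 5*(D.N)))
seq = sorted(
    a + 5 * (b + 5 * (c + 5 * d))
    for a, b, c, d in product(digits, repeat=4)
)

print(len(seq), seq[0], seq[-1])
```

Every combination of four base-5 digits encodes a different integer, so the join produces each value once with no gaps.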

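Next, we need to use it to generate a list of DAYOFYEAR() values. For the sake of the example, let's use your start and end dates. This query produces a result set containing the integers for the days of the year in that date range.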

SELECT DISTINCT DAYOFYEAR(first_day + INTERVAL seq DAY) doy
  FROM (SELECT DATE('2014-12-26 00:00:00') AS first_day,
               DATE('2015-01-01 23:59:59') AS last_day
       ) params
  JOIN (
         SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
           FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 
                               UNION SELECT 3 UNION SELECT 4) AS A
           JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                               UNION SELECT 3 UNION SELECT 4) AS B
           JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                               UNION SELECT 3 UNION SELECT 4) AS C
           JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                               UNION SELECT 3 UNION SELECT 4) AS D
       ) seq ON seq.seq <= TIMESTAMPDIFF(DAY,first_day,last_day)
 ORDER BY 1

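I think you can convince yourself that this gnarly little query works correctly for any reasonable range of about a year and a half (625 days) or less. If you use longer time spans, you may run into trouble with leap years.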
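
As a cross-check on what the query should return, the following Python sketch computes the same distinct day-of-year set for the example range. `days_of_year` is a hypothetical helper used only for this illustration; it walks the range one day at a time, just as the SQL adds `seq` days to `first_day`.

```python
from datetime import date, timedelta

def days_of_year(first_day: date, last_day: date) -> list:
    """Distinct DAYOFYEAR values for every date from first_day to last_day."""
    span = (last_day - first_day).days
    return sorted(
        {(first_day + timedelta(days=n)).timetuple().tm_yday for n in range(span + 1)}
    )

doys = days_of_year(date(2014, 12, 26), date(2015, 1, 1))
print(doys)  # [1, 360, 361, 362, 363, 364, 365]
```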

Finally, you can use this query in your PartitionsID IN () clause. It looks like this.

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
  FROM Report_Summary RS
 WHERE RS.ReportTime >= '2014-12-26 00:00:00'
   AND RS.ReportTime <= '2015-01-01 23:59:59'
   AND RS.PartitionsID 
     IN (
         SELECT DISTINCT DAYOFYEAR(first_day + INTERVAL seq DAY) doy
           FROM (SELECT DATE('2014-12-26 00:00:00') AS first_day,
                        DATE('2015-01-01 23:59:59') AS last_day
                ) params
           JOIN (
                  SELECT A.N + 5*(B.N + 5*(C.N + 5*(D.N))) AS seq
                    FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 
                                        UNION SELECT 3 UNION SELECT 4) AS A
                    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                                        UNION SELECT 3 UNION SELECT 4) AS B
                    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                                        UNION SELECT 3 UNION SELECT 4) AS C
                    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2
                                        UNION SELECT 3 UNION SELECT 4) AS D
                ) seq ON seq.seq <= TIMESTAMPDIFF(DAY,first_day,last_day)
          ORDER BY 1
         ) 
GROUP BY ReportDate; 

That should do it for you.

If you are using MariaDB 10+, there are built-in sequence tables with names like seq_0_to_624.

There is a writeup on this topic here:

http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/

Answer 2 (score: 0)

I found a solution by changing the logic for what I store in the PartitionsID column. Originally I stored DAYOFYEAR(ReportTime) in PartitionsID. Now I store TO_DAYS(ReportTime) in that column instead.

So my table structure is as follows:

CREATE TABLE `Report_Summary` (
  `PartitionsID` int(10) unsigned NOT NULL,
  `ReportTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `Amount` int(10) NOT NULL,
  UNIQUE KEY `UNIQUE` (`PartitionsID`,`ReportTime`),
  KEY `PartitionsID` (`PartitionsID`),
  KEY `ReportTime` (`ReportTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
/*!50100 PARTITION BY HASH (PartitionsID)
PARTITIONS 366 */

INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735958','2014-12-26 11:46:12','100');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735959','2014-12-27 11:46:23','50');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735960','2014-12-28 11:46:37','44');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735961','2014-12-29 11:46:49','15');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735962','2014-12-30 11:46:59','56');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735963','2014-12-31 11:47:22','68');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735964','2015-01-01 11:47:35','76');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735965','2015-01-02 11:47:43','88');
INSERT INTO `Report_Summary` (`PartitionsID`, `ReportTime`, `Amount`) VALUES('735966','2015-01-03 11:47:59','77');

See the SQL Fiddle demo.

My query is:

EXPLAIN PARTITIONS 
SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2015-01-01 23:59:59' AND 
      RS.PartitionsID BETWEEN TO_DAYS('2014-12-26 00:00:00') AND TO_DAYS('2015-01-01 23:59:59')
GROUP BY ReportDate; 

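The above query scans only the specific partitions I need, and it also uses the proper index. So changing the logic of the PartitionsID column gave me the right solution.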
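
A rough way to see why this works: TO_DAYS() grows by one per calendar day and never resets at New Year, so a date range always maps to a contiguous run of integers, and MOD(value, 366) spreads consecutive days over consecutive partitions. The Python sketch below models that under one stated assumption, namely that TO_DAYS() equals Python's proleptic Gregorian ordinal plus 365; the assumption is checked against the MySQL manual's example TO_DAYS('2007-10-07') = 733321. `to_days` and `partitions_for_range` are hypothetical helpers for illustration.

```python
from datetime import date, timedelta

def to_days(d: date) -> int:
    # Assumption: MySQL TO_DAYS(d) == Python ordinal + 365.
    return d.toordinal() + 365

def partitions_for_range(first_day: date, last_day: date, num_partitions: int = 366) -> list:
    """Partition numbers (MOD rule) hit by each day in the range, in date order."""
    span = (last_day - first_day).days
    return [
        to_days(first_day + timedelta(days=n)) % num_partitions
        for n in range(span + 1)
    ]

print(to_days(date(2007, 10, 7)))
print(partitions_for_range(date(2014, 12, 26), date(2015, 1, 1)))
```

The partition numbers no longer match the day of year, but that does not matter for pruning: what matters is that the BETWEEN bounds describe one short, unbroken integer range.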

Thanks for all the replies, and many thanks for everyone's time...

Answer 3 (score: 0)

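Based on your SELECT, what you really need is a data warehousing technique called a "summary table". With it, you summarize the data every day (or hour, or whatever) and store the subtotals in a much smaller table. The "report" then reads that table and sums up the subtotals. This is often 10 times as fast as a brute-force scan of the raw data. More details: http://mysql.rjweb.org/doc.php/datawarehouse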

That eliminates the need for PARTITIONing in either the raw data (the "fact table") or the summary table.
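
The summary-table idea can be sketched in a few lines of Python. This is only an in-memory illustration with made-up rows; `raw_rows`, `daily_summary`, and `report` are hypothetical names, and in practice both structures would be MySQL tables.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical raw ("fact") rows: (ReportTime, Amount).
raw_rows = [
    (datetime(2014, 12, 31, 11, 47, 22), 68),
    (datetime(2014, 12, 31, 18, 5, 0), 32),
    (datetime(2015, 1, 1, 11, 47, 35), 76),
]

# Summary step, run once per day (or hour): keep one subtotal per date.
daily_summary = defaultdict(int)
for report_time, amount in raw_rows:
    daily_summary[report_time.date()] += amount

def report(first_day: date, last_day: date) -> dict:
    """Answer the report from the small summary only, never from raw_rows."""
    return {
        day: total
        for day, total in sorted(daily_summary.items())
        if first_day <= day <= last_day
    }

totals = report(date(2014, 12, 26), date(2015, 1, 1))
print(totals)
```

The report touches one entry per day regardless of how many raw rows were recorded, which is where the speedup comes from.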

However, if you need to purge old data, then PARTITIONing can come in handy because of DROP PARTITION. For that you would use BY RANGE(TO_DAYS(...)), not BY HASH.