How to compute the median of an inner-joined column when using GROUP BY?

Date: 2016-11-16 11:41:47

Tags: mysql sql

I have the following query, which retrieves the number of sales of a specific item and the average price it sold for on each day.

SELECT COUNT(1) AS num_sales, DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date, AVG(prices.price) AS avg_price
FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC

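I also have a for loop that runs a separate query for each day to get the median price (assume for simplicity that the number of rows is odd, so a single middle row is the median):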

SELECT prices.price FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 
AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
AND DATE(sales.created_at) = "<THE DATE OF THE CURRENT FOR-LOOP OBJECT>"
ORDER BY prices.price ASC
LIMIT 1 OFFSET <NUMBER OF THE MIDDLE ROW>

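As you can imagine, this is very slow, because in some cases hundreds of queries have to be run against large tables (the sales table has several hundred million rows).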

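How can I rewrite the first SQL query so that it also computes the median of prices.price, the same way it computes AVG(prices.price)? I have looked at answers such as this one, but I couldn't work out how to adapt them to my specific case.

I have spent hours trying to get this working, but my SQL knowledge just isn't good enough. Any help would be greatly appreciated!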

root@ns525077:~# mysql -V
mysql  Ver 14.14 Distrib 5.7.13, for Linux (x86_64) using  EditLine wrapper

Table schemas:

CREATE TABLE `prices` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `item_id` int(11) unsigned NOT NULL,
 `price` decimal(8,2) NOT NULL,
 `net_price` decimal(8,2) NOT NULL,
 `source` tinyint(4) NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 KEY `prices_ibfk_1` (`item_id`),
 CONSTRAINT `prices_ibfk_1` FOREIGN KEY (`item_id`) REFERENCES `items` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4861375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `sales` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `price_id` int(11) unsigned DEFAULT NULL,
 `item_key` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 UNIQUE KEY `item_key` (`item_key`),
 KEY `price_id` (`price_id`),
 KEY `created_at` (`created_at`),
 KEY `price_id__created_at__IX` (`price_id`,`created_at`),
 CONSTRAINT `sales_ibfk_1` FOREIGN KEY (`price_id`) REFERENCES `prices` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=386156944 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

[Image: example of output from my first query]

1 answer

Answer (score: 0)

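After extensive searching, I found the answer to my question here. Perhaps I didn't phrase my question well at first.

I adapted that solution to my situation, and here is the working query: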

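SELECT COUNT(1) AS num_sales,
       DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date,
       AVG(prices.price) AS avg_price,
       -- GROUP_CONCAT(... ORDER BY prices.price) builds the day's prices as a sorted,
       -- comma-separated string. SUBSTRING_INDEX(list, ',', k) keeps the first k items,
       -- and the outer SUBSTRING_INDEX(..., ',', -1) keeps the last of those: the k-th price.
       CASE(COUNT(1) % 2)
       -- Odd number of sales: take the single middle item, k = (n + 1) / 2.
       WHEN 1 THEN SUBSTRING_INDEX(
           SUBSTRING_INDEX(
               group_concat(prices.price
                            ORDER BY prices.price SEPARATOR ',')
               , ',', (count(*) + 1) / 2)
           , ',', -1)
       -- Even number of sales: average the two middle items. count(*) / 2 selects the
       -- lower one; (count(*) + 1) / 2 ends in .5, which MySQL rounds up to the upper one.
       ELSE (SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', count(*) / 2)
                 , ',', -1)
             + SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', (count(*) + 1) / 2)
                 , ',', -1)) / 2
       END AS median_price
FROM sales
  INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7381
      AND (`prices`.`source` = 0
           OR (`prices`.`price` >= 400
               AND `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC;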
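One caveat applies to this approach. It is not something the answer reports running into, but it matters at the table sizes described in the question. GROUP_CONCAT truncates its result to group_concat_max_len bytes, and that limit defaults to 1024. A 1024-byte string holds only around a hundred or so comma-separated prices. On a busy day the list would therefore be cut off, and the SUBSTRING_INDEX calls would return the wrong element. MySQL issues only a warning in that case, not an error.

Below is a minimal sketch of the usual fix, which raises the limit for the current session before running the median query. The value 1000000 is an arbitrary example and is not taken from the original answer. The effective length of a GROUP_CONCAT result is also capped by max_allowed_packet.

```sql
-- Example value only (about 1 MB of comma-separated prices per day); size it to your busiest day.
-- GROUP_CONCAT results longer than this limit are cut off with a warning, not an error.
SET SESSION group_concat_max_len = 1000000;

-- Optional: confirm the new value is in effect for this connection.
SELECT @@SESSION.group_concat_max_len;
```

SET SESSION only affects the current connection, so run it on the same connection that executes the median query.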