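Background:
I'm trying to take a set of market transactions and determine how much value actually moved for each item type. This is pretty much my first attempt at MySQL, so the query is ugly, but the following almost works: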
SELECT types.typename,
       averages.type,
       averages.price,
       movement.sold,
       ( averages.price * movement.sold ) AS value
FROM (SELECT type,
             Round(Avg(price)) AS price
      FROM orders
      GROUP BY type) AS averages
INNER JOIN (SELECT type,
                   ( startingvolume - currentvolume ) AS sold
            FROM (SELECT type,
                         Sum(volume) AS currentVolume,
                         Sum(volumeentered) startingVolume
                  FROM orders
                  GROUP BY type) AS movement
            WHERE ( startingvolume - currentvolume ) > 10000
            ORDER BY sold) AS movement
    ON averages.type = movement.type
INNER JOIN invtypes AS types
    ON types.typeid = averages.type
ORDER BY value DESC
LIMIT 10 ;
+------------------------------------+-------+---------+------------+------------------+
| typeName | type | price | sold | value |
+------------------------------------+-------+---------+------------+------------------+
| Dirt | 34 | 1904767 | 2670581874 | 5086836224393358 |
| Light Wood | 2629 | 42999 | 2756595 | 118530828405 |
| Dark Wood | 24509 | 47344 | 1107771 | 52446310224 |
| Stone | 21922 | 18386 | 1505884 | 27687183224 |
| Grass | 238 | 5643 | 4554470 | 25700874210 |
| Paper | 3814 | 25635 | 861006 | 22071888810 |
| Iron | 3699 | 320270 | 58833 | 18842444910 |
| Ink | 16275 | 8552 | 2200545 | 18819060840 |
| Loam | 2679 | 5759 | 2608771 | 15023912189 |
| Copper | 672 | 904612 | 14989 | 13559229268 |
+------------------------------------+-------+---------+------------+------------------+
The problem with the result above is that the raw market data is inevitably corrupted by outliers such as the following:
select type, price from orders where type = 34 order by price desc limit 10;
+------+-----------+
| type | price |
+------+-----------+
| 34 | 200000000 |
| 34 | 15.99 |
| 34 | 12.06 |
| 34 | 10 |
| 34 | 7.67 |
| 34 | 7.5 |
| 34 | 7.3 |
| 34 | 7.17 |
| 34 | 7.1 |
| 34 | 7.06 |
+------+-----------+
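Core problem:
99% of the market data is clean, but the outliers wreck the average, and MySQL doesn't seem to have a median function. I've found several examples of how to find the median of an entire column, but I need the median per item.
How can I determine the median for each item, rather than the average, before running the main query? Or, alternatively, how can I effectively clean these outliers out of the data?
Note: I tried excluding results by standard deviation, but item prices range from $17 to $1 billion, and the deviation stays relatively low regardless of the price range.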
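Answer 0: (score: 0)
I won't touch your original query, since it's fairly complex, but one option is to use a subquery to remove any statistical outliers. For example, if you want to exclude from the orders table any rows whose price is more than two standard deviations away from the mean for their type, you could use: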
SELECT t1.type,
       t1.price
FROM orders t1
INNER JOIN
(
    -- per-type mean and (population) standard deviation of price
    SELECT type,
           AVG(price) AS avg_price,
           STD(price) AS std_price
    FROM orders
    GROUP BY type
) t2
    ON t1.type = t2.type
WHERE ABS(t1.price - t2.avg_price) <= 2 * t2.std_price  -- any value more than 2 standard deviations
                                                        -- away from the mean is discarded
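As for the per-item median the question asks about, here is a minimal sketch. It assumes MySQL 8.0 or later, since it relies on window functions that older versions lack. The table and column names (orders, type, price) come from the question. The aliases ranked, rn, cnt and median_price are only illustrative.

```sql
-- Assumption: MySQL 8.0+ (ROW_NUMBER() and COUNT() OVER are not available before 8.0).
SELECT type,
       AVG(price) AS median_price   -- averages the one or two middle rows per type
FROM (
    SELECT type,
           price,
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY price) AS rn,  -- rank within type
           COUNT(*)     OVER (PARTITION BY type)                AS cnt  -- rows per type
    FROM orders
) AS ranked
-- Odd cnt: both expressions pick the single middle row.
-- Even cnt: they pick the two rows on either side of the middle.
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY type;
```

A result set like this could stand in for the averages subquery in the question, with median_price taking the place of the rounded average. A single extreme order such as the 200000000 row for type 34 moves the median very little.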
Demo here: