Filtering outliers in MySQL

Asked: 2014-03-06 20:13:07

Tags: mysql standard-deviation outliers

I am trying to filter outliers in MySQL. However, the outliers are still included when the average is calculated.

For example, if I receive 6 orders with shipping costs of 45, 50, 180, 10, 55, and 52, I would expect 180 and 10 to be dropped from the average... but they are not.
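
To make that expectation concrete, here is a minimal sketch that computes how many standard deviations each of the six example values lies from their mean. The inline values and the aliases `t`, `v`, and `a` are made up for illustration only. It uses `STDDEV_POP`, the population standard deviation, which is also what MySQL's `STDDEV` computes.

```sql
-- Illustration only: the six example shipping costs as an inline derived table.
SELECT
    t.cost,
    ROUND(a.mean, 2) AS mean,
    ROUND(a.dev, 2)  AS dev,
    ROUND(ABS(t.cost - a.mean) / a.dev, 2) AS deviations_from_mean
FROM (
    SELECT 45 AS cost UNION ALL SELECT 50 UNION ALL SELECT 180
    UNION ALL SELECT 10 UNION ALL SELECT 55 UNION ALL SELECT 52
) AS t
CROSS JOIN (
    -- Mean and population standard deviation over the same six values.
    SELECT AVG(v.cost) AS mean, STDDEV_POP(v.cost) AS dev
    FROM (
        SELECT 45 AS cost UNION ALL SELECT 50 UNION ALL SELECT 180
        UNION ALL SELECT 10 UNION ALL SELECT 55 UNION ALL SELECT 52
    ) AS v
) AS a;
```

The mean is about 65.33 and the deviation about 53.45, so with a cutoff of 1 standard deviation, 180 (about 2.15 deviations away) and 10 (about 1.04) fall outside the band. The remaining four values average 50.5.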

Here is my current query:

SELECT
    GROUP_CONCAT(o.orderid),
    oi.productid,
    AVG(o.actual_cost - (oi.wholesale_cost * oi.amount)) AS avg_ship_cost,
    STDDEV(o.actual_cost - (oi.wholesale_cost * oi.amount)) AS std_dev
FROM
    orders AS o,
    order_items AS oi,
    products AS p /* Needed to filter any deleted from products table since order placed. */
LEFT JOIN (
    SELECT
        GROUP_CONCAT(o.orderid),
        oi.productid,
        o.invoice_total - (oi.wholesale_cost * oi.amount) AS order_ship_cost,
        AVG(o.invoice_total - (oi.wholesale_cost * oi.amount)) AS avg_ship_cost,
        STDDEV(o.invoice_total - (oi.wholesale_cost * oi.amount)) AS std_dev
    FROM
        orders AS o_lj,
        order_items AS oi_lj
    CROSS JOIN (
        SELECT
            AVG(o_a.invoice_total - (oi_a.wholesale_cost * oi_a.amount)) AS mean,
            STDDEV(o_a.invoice_total - (oi_a.wholesale_cost * oi_a.amount)) AS dev
        FROM
            orders AS o_a,
            order_items AS oi_a
        WHERE
            o_a.orderid = oi_a.orderid AND
            oi_a.productid = p_a.productid AND
            o_a.date > UNIX_TIMESTAMP(NOW() - INTERVAL 90 DAY) AND
            oi_a.amount = 1 AND
            o_a.invoice_total > 0
    ) a
    WHERE
        o_lj.orderid = oi_lj.orderid AND
        o_lj.date > UNIX_TIMESTAMP(NOW() - INTERVAL 90 DAY) AND
        oi_lj.amount = 1 AND
        o_lj.invoice_total > 0 AND
        ABS(o_lj.invoice_total - (oi_lj.wholesale_cost * oi_lj.amount) - a.mean) / a.dev > 1
    GROUP BY
        oi_lj.productid
) lj
ON lj.productid = oi.productid
WHERE
    o.orderid = oi.orderid AND
    p.productid = oi.productid AND
    o.date > UNIX_TIMESTAMP(NOW() - INTERVAL 90 DAY) AND
    oi.amount = 1 AND
    o.invoice_total > 0 AND
    lj.productid IS NULL
GROUP BY
    oi.productid

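Changing the number of allowed standard deviations does not fix the problem.

This was my starting point: http://www.ryanbyrd.net/techramble/2012/01/18/mysql-strip-outliers-for-average-and-standard-deviation/

Why does his query work when mine does not?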
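
For comparison, here is a minimal sketch of one way to filter each order against its own product's statistics before averaging, rather than excluding whole products. It is not the linked post's query and not a drop-in fix for the query above. The table `ship_costs_demo` and its columns are hypothetical stand-ins for the `orders`/`order_items` join, with the ship cost already computed per order.

```sql
-- Hypothetical demo table: one row per order, ship cost precomputed.
DROP TABLE IF EXISTS ship_costs_demo;
CREATE TABLE ship_costs_demo (
    orderid   INT PRIMARY KEY,
    productid INT NOT NULL,
    ship_cost DECIMAL(10,2) NOT NULL
);
INSERT INTO ship_costs_demo VALUES
    (1, 100, 45), (2, 100, 50), (3, 100, 180),
    (4, 100, 10), (5, 100, 55), (6, 100, 52);

SELECT
    c.productid,
    GROUP_CONCAT(c.orderid ORDER BY c.orderid) AS kept_orders,
    AVG(c.ship_cost) AS avg_ship_cost
FROM ship_costs_demo AS c
JOIN (
    -- Per-product mean and population standard deviation.
    SELECT productid,
           AVG(ship_cost)        AS mean,
           STDDEV_POP(ship_cost) AS dev
    FROM ship_costs_demo
    GROUP BY productid
) AS s ON s.productid = c.productid
-- Keep only rows within 1 standard deviation of their product's mean.
WHERE ABS(c.ship_cost - s.mean) <= 1 * s.dev
GROUP BY c.productid;
```

In this sketch the statistics are joined on `productid` and the filter is applied per row, so for the sample data only orders 3 and 4 (costs 180 and 10) are dropped before the outer `AVG` runs.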

0 answers:

No answers