MySQL: ignoring outliers using standard deviation

Time: 2017-09-05 16:06:15

Tags: mysql sql outliers

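I need to build a query that calculates the average and the count while ignoring outliers, based on the standard deviation.

I have two tables in MySQL (P and A) with the following attributes:

P = payments: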

Value_gbp
Paymentid
Account 
rfx_ref

A = accounts:

Accountid
Entity_type
Settlment_model
rfx_ref

This is what I have so far:

SELECT 
Account, 
COUNT(value_GBP) AS '# Of Payments', 
TRUNCATE(AVG(value_GBP),2) As 'Avg Value'
FROM payments, 

LEFT JOIN( 
SELECT STDDEV(value_gbp) as std_gbp
FROM payments, accounts 
WHERE payments.paymentid = accounts.acountid
AND Entity_type = 'company'
AND settlement_model = 'payment agent'
GROUP BY account
) outlier 

On payments.paymentid = accounts.acountid
WHERE payments.value_gbp<=outlier.std_gbp*2
AND Entity_type = 'company'
AND settlement_model = 'payment agent'
GROUP BY account

But it reports an error at:

On payments.paymentid = accounts.acountid

Can anyone help me?

1 answer:

Answer 0: (score: 0)

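The subquery needs to select accounts.accountid, and then you need to use it in the JOIN condition.

I also think your definition of an outlier is wrong. An outlier is not a value greater than 2 standard deviations; it is a value more than 2 standard deviations away from the mean. So the subquery needs to return both the mean and the standard deviation, and the outer query then compares each value's distance from the mean against that threshold.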

SELECT 
    account, 
    COUNT(value_GBP) AS '# Of Payments', 
    TRUNCATE(AVG(value_GBP),2) As 'Avg Value'
FROM payments 
JOIN( 
    SELECT accountid, AVG(value_gbp) AS avg_gbp, STDDEV(value_gbp) as std_gbp
    FROM payments, accounts 
    WHERE payments.paymentid = accounts.accountid
    AND Entity_type = 'company'
    AND settlement_model = 'payment agent'
    GROUP BY accountid
) outlier 
On payments.paymentid = outlier.accountid
JOIN accounts ON payments.paymentid = accounts.accountid
WHERE ABS(payments.value_gbp - outlier.avg_gbp) <= outlier.std_gbp*2
AND Entity_type = 'company'
AND settlement_model = 'payment agent'
GROUP BY account
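
To see why the distance from the mean matters, here is a minimal Python sketch that compares the two filters on a made-up list of payment values. The function names and the sample data are hypothetical, chosen only for illustration. It uses `statistics.pstdev` because MySQL's `STDDEV()` returns the population standard deviation.

```python
from statistics import mean, pstdev


def within_two_sigma(values):
    """Keep values whose distance from the mean is at most 2 standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population std dev, matching MySQL's STDDEV()
    return [v for v in values if abs(v - mu) <= 2 * sigma]


def below_two_sigma(values):
    """The filter from the question: keep values that are <= 2 * std dev."""
    sigma = pstdev(values)
    return [v for v in values if v <= 2 * sigma]


# Hypothetical sample: five payments near 1000 and one far larger payment.
payments = [1000, 1002, 998, 1001, 999, 1500]

kept = within_two_sigma(payments)
naive = below_two_sigma(payments)

print("distance-from-mean filter keeps:", kept)
print("value <= 2*stddev filter keeps:", naive)
```

With this sample, the distance-based filter drops only the 1500 payment. The filter from the question drops every row, because all the values are far larger than twice the standard deviation.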