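I have an orders table (orderTable) with roughly 70K entries. Each row has a customerID and three order columns: order1, order2 and order3.
For each customer, I want to determine the most common order and how certain that result is (sample size and probability).
This is what I have so far: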
CREATE VIEW CustomerOrderProbabaility as
SELECT Distinct(customerID)
customerID,
order,
COUNT(*) as sampleSize
FROM (Select customerID, order1 AS order FROM orderTable UNION
Select customerID, order2 AS order FROM orderTable UNION
Select customerID, order3 AS order FROM orderTable
)
GROUP BY customerID, order
ORDER BY customerID, COUNT(*) DESC;
I get back a table with customerID and order, but sampleSize is always 1. Where am I going wrong?
Answer 0 (score: 1)
I think you want UNION ALL, along with a few other changes:
CREATE VIEW CustomerOrderProbabaility as
-- DISTINCT ON keeps the first row per customerID in ORDER BY order,
-- i.e. the most frequent order for that customer.
SELECT DISTINCT ON (customerID)
       customerID,
       theorder,
       COUNT(*) as sampleSize,
       -- Total number of orders for the customer, summed over all groups.
       SUM(COUNT(*)) OVER (PARTITION BY customerID) as totOrders
FROM (Select customerID, order1 AS theorder FROM orderTable UNION ALL
      Select customerID, order2 AS theorder FROM orderTable UNION ALL
      Select customerID, order3 AS theorder FROM orderTable
     ) co
GROUP BY customerID, theorder
ORDER BY customerID, COUNT(*) DESC;
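UNION removes duplicates. In your query, every (customerID, order) pair therefore appears exactly once in the subquery, so each group contains a single row and COUNT(*) is always 1.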
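To see the effect in isolation, here is a minimal sketch using made-up inline rows rather than the real orderTable (the value 'tea' and customer 1 are purely illustrative):

```sql
-- Hypothetical data: customer 1 has 'tea' in two of the order columns.
-- UNION collapses the two identical rows, so sampleSize comes out as 1.
SELECT customerID, theorder, COUNT(*) AS sampleSize
FROM (SELECT 1 AS customerID, 'tea' AS theorder
      UNION
      SELECT 1, 'tea'
     ) co
GROUP BY customerID, theorder;

-- UNION ALL keeps both rows, so sampleSize comes out as 2.
SELECT customerID, theorder, COUNT(*) AS sampleSize
FROM (SELECT 1 AS customerID, 'tea' AS theorder
      UNION ALL
      SELECT 1, 'tea'
     ) co
GROUP BY customerID, theorder;
```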
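The changes:
- order is renamed to theorder. order is a keyword; even if it is accepted as a column name, I don't think it is a good idea.
- UNION ALL instead of UNION, so duplicates are not removed.
- DISTINCT ON (a PostgreSQL extension) instead of DISTINCT, because that is what you intend.
- totOrders counts all orders for each customer.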
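Since the question also asks for a probability, one way to get it is to divide sampleSize by totOrders in the same query. This is only a sketch in PostgreSQL syntax: the column name probability is an assumption of mine, and the table and column names are taken from the question.

```sql
-- Sketch only: "probability" is an assumed column name, not part of the view above.
SELECT DISTINCT ON (customerID)
       customerID,
       theorder,
       COUNT(*) AS sampleSize,
       SUM(COUNT(*)) OVER (PARTITION BY customerID) AS totOrders,
       -- Share of this customer's orders that the top order accounts for.
       -- Multiplying by 1.0 guards against integer division.
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY customerID) AS probability
FROM (SELECT customerID, order1 AS theorder FROM orderTable UNION ALL
      SELECT customerID, order2 AS theorder FROM orderTable UNION ALL
      SELECT customerID, order3 AS theorder FROM orderTable
     ) co
GROUP BY customerID, theorder
ORDER BY customerID, COUNT(*) DESC;
```

The window function is evaluated after GROUP BY and before DISTINCT ON, so totOrders still covers all of a customer's orders even though only the top row per customer is returned.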