Statistical mode across 3 columns

Time: 2016-01-19 17:21:07

Tags: sql postgresql statistics

I have an orders table with about 70K entries, as follows:

Screenshot of table.

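For each customer, I want to determine the most common order, along with how certain that result is (sample size and probability).

Here is what I have so far: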

CREATE VIEW CustomerOrderProbabaility as 
SELECT Distinct(customerID)
        customerID,
        order,
        COUNT(*) as sampleSize
FROM (Select customerID, order1 AS order FROM orderTable UNION
      Select customerID, order2 AS order FROM orderTable UNION
      Select customerID, order3 AS order FROM orderTable
     )
GROUP BY customerID, order
ORDER BY customerID, COUNT(*) DESC;

I get a table of customerID and order, but sampleSize is always 1. Where did I go wrong?

1 Answer:

Answer 0 (score: 1)

I think you want UNION ALL, along with a few other changes:

CREATE VIEW CustomerOrderProbabaility as 
    SELECT DISTINCT ON (customerID)
            customerID,
            theorder,
            COUNT(*) as sampleSize,
            SUM(COUNT(*)) OVER (PARTITION BY customerId) as totOrders
    FROM (Select customerID, order1 AS theorder FROM orderTable UNION ALL
          Select customerID, order2 AS theorder FROM orderTable UNION ALL
          Select customerID, order3 AS theorder FROM orderTable
         ) co
    GROUP BY customerID, theorder
    ORDER BY customerID, COUNT(*) DESC;

UNION removes duplicates, so each (customerID, order) pair survives only once and its count is always 1.
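To see the effect in isolation, here is a small sketch you can run without touching the real table. The two sample rows are made up for illustration, and the CTE only borrows the name orderTable so the inner query looks like the one above.

```sql
-- Hypothetical sample data: one customer, two rows, three order columns.
WITH orderTable(customerID, order1, order2, order3) AS (
    VALUES (1, 'A', 'A', 'B'),
           (1, 'A', 'C', 'B')
)
SELECT customerID,
       theorder,
       COUNT(*) AS sampleSize
FROM (SELECT customerID, order1 AS theorder FROM orderTable UNION
      SELECT customerID, order2 FROM orderTable UNION
      SELECT customerID, order3 FROM orderTable
     ) co
GROUP BY customerID, theorder
ORDER BY customerID, theorder;
-- With UNION, the subquery yields only (1,'A'), (1,'B'), (1,'C'),
-- so every group has sampleSize = 1.
-- Replacing both UNIONs with UNION ALL keeps all six values,
-- and the counts become A = 3, B = 2, C = 1.
```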

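The changes:

  • order is renamed to theorder. order is a keyword. Even if it is accepted as a column name, I don't think it's a good idea.
  • UNION ALL instead of UNION, so duplicates are not removed.
  • DISTINCT ON instead of DISTINCT, because that is what you intended.
  • Added totOrders to count all orders for each customer.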
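The question also asks for a probability. One way to get it, as a rough sketch, is to divide sampleSize by totOrders. The sample rows, the counts CTE, the probability column, and the extra theorder tie-breaker in ORDER BY are my own assumptions for illustration, not part of the view above.

```sql
-- Hypothetical sample data standing in for the real orderTable.
WITH orderTable(customerID, order1, order2, order3) AS (
    VALUES (1, 'A', 'A', 'B'),
           (1, 'A', 'C', 'B'),
           (2, 'C', 'C', 'A')
),
counts AS (
    -- Same aggregation as the view: one row per (customer, order value).
    SELECT customerID,
           theorder,
           COUNT(*) AS sampleSize,
           SUM(COUNT(*)) OVER (PARTITION BY customerID) AS totOrders
    FROM (SELECT customerID, order1 AS theorder FROM orderTable UNION ALL
          SELECT customerID, order2 FROM orderTable UNION ALL
          SELECT customerID, order3 FROM orderTable
         ) co
    GROUP BY customerID, theorder
)
-- Keep the most frequent order per customer and its relative frequency.
SELECT DISTINCT ON (customerID)
       customerID,
       theorder,
       sampleSize,
       totOrders,
       ROUND(sampleSize / totOrders, 3) AS probability
FROM counts
-- theorder is added as a tie-breaker so the pick is deterministic.
ORDER BY customerID, sampleSize DESC, theorder;
```

On this sample data, customer 1 comes out as A with 3 of 6 (probability 0.500) and customer 2 as C with 2 of 3 (probability 0.667). In PostgreSQL, SUM over a bigint count returns numeric, so the division is not truncated to an integer. Without a tie-breaker, DISTINCT ON picks an arbitrary row among orders with equal counts.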