SQL query to sum unique amounts, removing duplicates

Time: 2015-05-17 18:23:03

Tags: mysql sql select distinct

Consider the following MySQL table schema:

id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime

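The schema above stores POS receipts for restaurants. I need a daily report with the number of receipts and the sum of their amounts. I tried the following query: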

SELECT location_id,count(distinct(transaction_no)) as count,sum(amount) as receipt_amount FROM `receipts`  WHERE date(`receipts`.`created_at`) = '2015-05-17' GROUP BY `receipts`.`location_id`

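The problem is that a receipt with the same transaction number can be repeated several times, and the amount may or may not differ each time. The business rule for handling this is that the last receipt we receive is the most up-to-date one. So the query above doesn't work.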
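To make the double counting concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the text type chosen for transaction_no are assumptions made up for illustration. Transaction T1 arrives twice, so the query reports 2 transactions but adds up three amounts.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 12:00:00"),  # first version of T1
        (12, "T1", 1, "2015-05-17 12:05:00"),  # later version of T1, the one to keep
        (20, "T2", 1, "2015-05-17 13:00:00"),
    ],
)

# Same shape as the query in the question: distinct count, plain sum.
naive = conn.execute(
    "SELECT location_id, COUNT(DISTINCT transaction_no), SUM(amount)"
    " FROM receipts WHERE date(created_at) = '2015-05-17'"
    " GROUP BY location_id"
).fetchall()

print(naive)  # [(1, 2, 42)]: the sum includes both versions of T1
```

Under the business rule the total for this data should be 32 (12 for T1 plus 20 for T2), not 42.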

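What I want to do is the following:

  1. For each location, get all the receipts for the day.
  2. If a transaction number is duplicated, keep only the last receipt received, based on created_at.
  3. Sum the amount column.

[Edit] Here is the query plan: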

    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: <derived2>
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 25814155
         filtered: 100.00
            Extra: Using where; Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: r
             type: ref
    possible_keys: punchh_key_location_id_created_at
              key: punchh_key_location_id_created_at
          key_len: 50
              ref: t.punchh_key
             rows: 1
         filtered: 100.00
            Extra: Using index condition; Using where
    *************************** 3. row ***************************
               id: 2
      select_type: DERIVED
            table: r
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 25814155
         filtered: 100.00
            Extra: Using temporary; Using filesort
    3 rows in set, 1 warning (0.00 sec)
    

3 answers:

Answer 0: (score: 1)

You can also use the distinct modifier inside sum:

SELECT   location_id,
         COUNT(DISTINCT transaction_no) AS cnt,
         SUM(DISTINCT amount) AS receipt_amount 
FROM     `receipts`  
WHERE    DATE(`receipts`.`created_at`) = '2015-05-17' 
GROUP BY `receipts`.`location_id`

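Answer 1: (score: 1)

You can sum the amounts for only the last created_at value of each transaction_no on a given day by joining to an inline view that determines the last created_at for each transaction_no on that day.

This avoids relying on sum(distinct ...), which would count two different transactions with the same amount (if any exist) only once.

This approach should avoid that problem.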

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
       join (
                select      transaction_no,
                            max(created_at) as last_created_at_for_trans
                from        receipts
                where       created_at like '2015-05-17%'
                group by    transaction_no
            ) v
         on r.transaction_no = v.transaction_no
        and r.created_at = v.last_created_at_for_trans
where       r.created_at like '2015-05-17%'
group by    r.location_id
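The difference between the two answers so far can be sketched as follows, again with Python's built-in sqlite3 module standing in for MySQL. The sample rows are hypothetical, and the date filter is written as a range comparison purely as an illustrative choice. T1's final amount equals T2's amount, which is the case where SUM(DISTINCT ...) goes wrong.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 12:00:00"),  # superseded version of T1
        (12, "T1", 1, "2015-05-17 12:05:00"),  # last version of T1
        (12, "T2", 1, "2015-05-17 13:00:00"),  # different transaction, same amount
    ],
)

# Answer 0 style: distinct amounts are summed, regardless of transaction.
sum_distinct = conn.execute(
    """
    SELECT location_id, COUNT(DISTINCT transaction_no), SUM(DISTINCT amount)
    FROM receipts
    WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'
    GROUP BY location_id
    """
).fetchall()

# Answer 1 style: join to the last created_at per transaction_no for the day.
last_only = conn.execute(
    """
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at
          FROM receipts
          WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'
          GROUP BY transaction_no) v
      ON r.transaction_no = v.transaction_no
     AND r.created_at = v.last_created_at
    WHERE r.created_at >= '2015-05-17' AND r.created_at < '2015-05-18'
    GROUP BY r.location_id
    """
).fetchall()

print(sum_distinct)  # [(1, 2, 22)]: 10 + 12, the stale 10 is kept and one 12 is lost
print(last_only)     # [(1, 2, 24)]: 12 for T1 plus 12 for T2
```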

Another approach is to use not exists; you may want to test which one performs better:

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
where       r.created_at like '2015-05-17%'
        and not exists ( select 1
                         from   receipts x
                         where  x.transaction_no = r.transaction_no
                            and x.created_at > r.created_at
                       )
group by    r.location_id
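As a rough check of the not exists form, the sketch below runs it against a small hypothetical data set in SQLite (standing in for MySQL via Python's sqlite3 module). An ORDER BY is added only to make the printed result deterministic, and the date filter is written as a range comparison. It says nothing about performance, which has to be measured on the real table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 12:00:00"),  # superseded version of T1
        (12, "T1", 1, "2015-05-17 12:05:00"),  # last version of T1
        (20, "T2", 1, "2015-05-17 13:00:00"),
        (5, "T3", 2, "2015-05-17 14:00:00"),
    ],
)

# Keep a row only if no later receipt exists for the same transaction_no.
not_exists = conn.execute(
    """
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    WHERE r.created_at >= '2015-05-17' AND r.created_at < '2015-05-18'
      AND NOT EXISTS (SELECT 1
                      FROM receipts x
                      WHERE x.transaction_no = r.transaction_no
                        AND x.created_at > r.created_at)
    GROUP BY r.location_id
    ORDER BY r.location_id
    """
).fetchall()

print(not_exists)  # [(1, 2, 32), (2, 1, 5)]
```

Note that the subquery is not limited to the day being reported, so a later receipt on any other day also excludes the earlier row.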

Answer 2: (score: 1)

How do you want to count a transaction that is duplicated across multiple days?

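I don't think you actually want to count a transaction just because its receipt is the last one on that day when another receipt arrives the next day. There are several ways to get the final record for each transaction. A typical one uses group by (this is similar to Brian's query, but subtly different):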

select r.*
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca;

The full query is:

select location_id, count(*) as numtransactions, sum(amount) as receipt_amount
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= date('2015-05-17') and r.created_at < date('2015-05-18')
group by location_id;
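The subtle difference from the previous answer shows up when a transaction has receipts on more than one day. The sketch below uses Python's sqlite3 module as a stand-in for MySQL with hypothetical rows: T1 gets a newer receipt on 2015-05-18. Taking the maximum within the day still counts T1 on the 17th, while taking the maximum over all rows does not.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 12:00:00"),
        (15, "T1", 1, "2015-05-18 09:00:00"),  # newer T1 receipt, next day
        (20, "T2", 1, "2015-05-17 13:00:00"),
    ],
)

QUERY = """
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS maxca
          FROM receipts
          {inner_filter}
          GROUP BY transaction_no) t
      ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
    WHERE r.created_at >= '2015-05-17' AND r.created_at < '2015-05-18'
    GROUP BY r.location_id
"""

# Last receipt per transaction within the day (inner query filtered by date).
per_day = conn.execute(
    QUERY.format(
        inner_filter="WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'"
    )
).fetchall()

# Last receipt per transaction over all rows (inner query unfiltered).
overall = conn.execute(QUERY.format(inner_filter="")).fetchall()

print(per_day)  # [(1, 2, 30)]: T1 counted on the 17th with its old amount
print(overall)  # [(1, 1, 20)]: T1 skipped, its latest receipt is on the 18th
```

Which behaviour is right depends on the reporting rule, which the question leaves open.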

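Note the date comparison.

The original form, date(r.created_at) = '2015-05-17', is logically correct. However, wrapping the column in date() means an index cannot be used. The form with two comparisons against constants lets the query take advantage of an index on receipts(created_at).

Using like on dates is discouraged. It requires implicitly converting the date to a string and then comparing the values as strings. That conversion is unnecessary, and in some databases its semantics depend on globalization settings.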
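The index point can be illustrated with a small experiment. The sketch below uses SQLite through Python's sqlite3 module, not MySQL, so it only shows the general principle; the index name idx_receipts_created_at is an assumption, and on MySQL the real check is to run EXPLAIN on the actual table.

```python
import sqlite3

# SQLite is used only as an illustration; MySQL's optimizer is a different one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical index name, chosen for this example.
conn.execute("CREATE INDEX idx_receipts_created_at ON receipts (created_at)")


def plan(where):
    """Return the detail text of each step in SQLite's query plan."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT amount FROM receipts WHERE " + where
    ).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


# Function applied to the column: the index on created_at cannot narrow the search.
func_plan = plan("date(created_at) = '2015-05-17'")

# Two comparisons against constants: the index can be used for a range search.
range_plan = plan("created_at >= '2015-05-17' AND created_at < '2015-05-18'")

print(func_plan)
print(range_plan)
```

In SQLite the first plan is reported as a SCAN of the table and the second as a SEARCH using the index; the exact wording varies between SQLite versions.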