INNER JOIN on 2 tables returns wrong values

Asked: 2016-05-29 20:52:38

Tags: mysql sql join

This is my SQL query:

SELECT  SUM(amz_event_shipment_items.quantity),
        amz_event_shipment_items.seller_sku

FROM    amz_event_shipment_items

INNER   JOIN amz_event_fees         ON amz_event_shipment_items.id = amz_event_fees.shipment_item_id
INNER   JOIN amz_shipment_events    ON amz_shipment_events.id = amz_event_shipment_items.shipment_event_id

WHERE   amz_event_fees.currency  = 'USD'
        AND amz_shipment_events.event_type <> 'RefundEvent'
        AND amz_shipment_events.posted_date BETWEEN '2016-5-1 07:00:00' AND '2016-5-7 06:59:59'

GROUP   BY amz_event_shipment_items.seller_sku


But the returned values are far too high... they make no sense to me...

Am I missing something?

Edit

Many shipment_events for each date

Each shipment_event HAS MANY shipment_item / BELONGS TO ONE event

Each shipment_item HAS MANY shipment_fee  / BELONGS TO ONE item

3 answers:

Answer 0: (score: 1)

You are multiplying the quantities by the number of fees: each item row is repeated once per matching fee row, so its quantity is summed several times. Use an EXISTS or IN clause when you only want to check for existence.

select sum(i.quantity), i.seller_sku
from amz_event_shipment_items i
where exists
(
  select *
  from amz_event_fees f
  where f.currency = 'USD'
  and f.shipment_item_id = i.id
)
and exists
(
  select *
  from amz_shipment_events e
  where e.event_type <> 'RefundEvent'
  and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
  and e.id = i.shipment_event_id
)
group by i.seller_sku;

(MySQL is sometimes slow with IN clauses, so I am using EXISTS here, even though I prefer IN.)
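For comparison, the IN form mentioned in this answer might look roughly like the sketch below. It reuses only the table and column names from the question and should return the same rows as the EXISTS form; whether it runs faster or slower depends on the MySQL version and the available indexes, so treat it as something to test rather than a recommendation.

```sql
-- Sketch: the same existence checks written with IN instead of EXISTS.
-- No fee or event rows are joined in, so each item row is counted once.
select sum(i.quantity), i.seller_sku
from amz_event_shipment_items i
where i.id in
(
  -- items that have at least one USD fee
  select f.shipment_item_id
  from amz_event_fees f
  where f.currency = 'USD'
)
and i.shipment_event_id in
(
  -- non-refund events posted in the requested week
  select e.id
  from amz_shipment_events e
  where e.event_type <> 'RefundEvent'
  and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
)
group by i.seller_sku;
```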

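Answer 1: (score: 1)

This is not an answer but an addendum. If I understand correctly, your query returned wrong results but was fairly fast, whereas mine (with the EXISTS clauses) returns correct results but is very slow.

So it seems that eliminating the duplicates is what takes so much time.

Here are two ideas:

First idea: eliminate the duplicates immediately

Instead of joining the fee rows directly, we reduce them to one row per item before joining: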

select 
  sum(i.quantity), 
  i.seller_sku
from amz_event_shipment_items i
join -- join with only one record per ID to substitute an EXISTS clause
(
  select distinct shipment_item_id
  from amz_event_fees
  where currency = 'USD' -- no alias here: f is only defined outside this subquery
) f on f.shipment_item_id = i.id
where exists
(
  select *
  from amz_shipment_events e
  where e.event_type <> 'RefundEvent'
  and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
  and e.id = i.shipment_event_id
)
group by i.seller_sku;

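Second idea: pre-aggregate the values

Here we try to aggregate as early as possible, so that the intermediate result stays small and we do not have to look up the events table for every single item record.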

select 
  sum(i.pre_sum_quantity), 
  i.seller_sku
from 
(
  select seller_sku, shipment_event_id, sum(quantity) as pre_sum_quantity
  from amz_event_shipment_items
  where exists
  (
    select *
    from amz_event_fees f
    where f.currency  = 'USD'
    and f.shipment_item_id = amz_event_shipment_items.id
  )
  group by seller_sku, shipment_event_id
) i
where exists
(
  select *
  from amz_shipment_events e
  where e.event_type <> 'RefundEvent'
  and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
  and e.id = i.shipment_event_id
)
group by i.seller_sku;

If there are only a few event types, you can also try to get rid of the <>, which makes it more likely that an index will be used:

where e.event_type in ('EarlyPaymentEvent','LatePaymentEvent')

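(In that case, an index with event_type before posted_date may pay off.)

I must admit that I do not expect this to be much faster than the original EXISTS query, but it is worth a try.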
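The composite index suggested above could be created roughly as follows. The index name idx_events_type_date is made up for illustration; the column order puts the equality/IN column (event_type) before the range column (posted_date), as described.

```sql
-- Sketch: composite index for filtering by event type, then by posted date.
-- The index name is hypothetical; pick whatever fits your naming scheme.
create index idx_events_type_date
  on amz_shipment_events (event_type, posted_date);
```

Running EXPLAIN on the query before and after creating the index shows whether the optimizer actually picks it up.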

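Answer 2: (score: 0)

One of your joins is probably returning more records than you expect. I would try running a select * ordered by sku and eyeballing the results.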
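As a more targeted variant of that check, a query along these lines lists the items that match more than one USD fee row, which are exactly the items whose quantity gets counted several times in the original join. It is only a sketch and assumes that amz_event_shipment_items.id is unique per item, as the join conditions in the question suggest.

```sql
-- Sketch: find items that fan out into several rows when joined to fees.
-- usd_fee_rows is how many times the item's quantity enters the SUM
-- in the original INNER JOIN query.
select i.id, i.seller_sku, i.quantity, count(*) as usd_fee_rows
from amz_event_shipment_items i
join amz_event_fees f on f.shipment_item_id = i.id
where f.currency = 'USD'
group by i.id, i.seller_sku, i.quantity
having count(*) > 1
order by i.seller_sku;
```

If this returns any rows, the inflated totals come from the fee join rather than from the events join.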