Retrieving data from multiple tables - SQL

Time: 2017-03-03 12:25:12

Tags: sql join hive

I have the following tables:

Table searches:

Date        Product    Search_ID
2017-01-01    Nike            101
2017-01-01    Reebok          292
2017-01-01    Nike            103
2017-01-01    Adidas          385
2017-01-02    Nike            284

Table purchases:

Date        Product    Total_sale
2017-01-01    Adidas        4
2017-01-01    Nike          1
2017-01-01    Adidas        2
2017-01-02    Nike          3

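Each product can have multiple rows on the same day. The total number of purchases of a product on a given day = sum(Total_sale).

I need to find the purchase rate per product per day, i.e. number of purchases / number of searches.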

For reference, for Nike on 2017-01-01 the total number of searches is 702 and the total number of purchases is 47, so the purchase rate is 47/702 ≈ 0.067.

I tried:

select t1.product, sum(t1.Total_sale), count(t2.Search_ID)
from db.purchases t1 join db.searches t2
on t1.date = t2.date and t1.product = t2.product
where t1.date = '2017-01-01' and t1.product = 'Nike'
group by t1.product, t1.date
;

This gives me a strange result:

 product  |  sum  | count 
----------+-------+-------
   Nike   | 32994 | 32994

...What am I doing wrong here?

3 answers:

Answer 0: (score: 2)

The join has already multiplied the result set. You can see this if you remove the GROUP BY and select * instead of the named columns:

select * from db.purchases t1 join db.searches t2
on t1.date = t2.date and t1.product = t2.product
where t1.date = '2017-01-01' and t1.product = 'Nike';

You don't need to join the tables to compute the purchase rate:

SELECT     
(select sum(t1.Total_sale) from db.purchases t1 where t1.date = '2017-01-01' and t1.product = 'Nike')
/
(select count(t2.Search_ID) from db.searches t2 where t2.date = '2017-01-01' and t2.product = 'Nike')

Answer 1: (score: 1)

Aggregate before joining:

select p.product, p.sales, s.searches
from (select p.date, p.product, sum(p.Total_sale) as sales
      from db.purchases p
      group by p.date, p.product
     ) p join
     (select s.date, s.product, count(*) as searches
      from db.searches s
      group by s.date, s.product
     ) s
     on p.date = s.date and p.product = s.product
where p.date = '2017-01-01' and p.product = 'Nike';

Note: you can move the where conditions into the subqueries to improve performance. This generalizes easily to more days and products.
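As a rough sketch of that generalization, the snippet below drops the where clause and adds the rate itself as a column, so every (date, product) pair gets one row. It is not part of the original answer: it runs against an in-memory SQLite database rather than Hive (an assumption made only so the example is runnable), the tables are created without the db. prefix, and the rows are the sample rows from the question. The 1.0 * factor is there because SQLite divides integers as integers.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches  (date TEXT, product TEXT, search_id INTEGER);
    CREATE TABLE purchases (date TEXT, product TEXT, total_sale INTEGER);
    -- Sample rows copied from the question.
    INSERT INTO searches VALUES
        ('2017-01-01', 'Nike',   101),
        ('2017-01-01', 'Reebok', 292),
        ('2017-01-01', 'Nike',   103),
        ('2017-01-01', 'Adidas', 385),
        ('2017-01-02', 'Nike',   284);
    INSERT INTO purchases VALUES
        ('2017-01-01', 'Adidas', 4),
        ('2017-01-01', 'Nike',   1),
        ('2017-01-01', 'Adidas', 2),
        ('2017-01-02', 'Nike',   3);
""")

# Aggregate each table to one row per (date, product) first, then join.
RATE_QUERY = """
    SELECT s.date, s.product, p.sales, s.searches,
           1.0 * p.sales / s.searches AS purchase_rate
    FROM (SELECT date, product, SUM(total_sale) AS sales
          FROM purchases
          GROUP BY date, product) p
    JOIN (SELECT date, product, COUNT(*) AS searches
          FROM searches
          GROUP BY date, product) s
      ON p.date = s.date AND p.product = s.product
    ORDER BY s.date, s.product
"""

rows = conn.execute(RATE_QUERY).fetchall()
for row in rows:
    print(row)
```

Because this is an inner join, Reebok (searched but never purchased in the sample) does not appear in the output; a left join from the searches subquery would be one way to keep such products if they are needed.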

Answer 2: (score: 1)

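The problem is that you are joining two unaggregated tables, so every "purchases" row is joined with every "searches" row for the same date and product. That is why your result is 32994, which comes from 702 x 47.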
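The multiplication can be reproduced with a small experiment. The sketch below is not from the original thread: it uses an in-memory SQLite database instead of Hive, and the data is synthetic, chosen only to match the totals in the question (702 search rows and 47 purchase rows of one sale each for Nike on 2017-01-01, which is an assumption about how the real rows look).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches  (date TEXT, product TEXT, search_id INTEGER);
    CREATE TABLE purchases (date TEXT, product TEXT, total_sale INTEGER);
""")

# Synthetic data (assumption): 702 searches, 47 purchases with total_sale = 1.
conn.executemany(
    "INSERT INTO searches VALUES ('2017-01-01', 'Nike', ?)",
    [(i,) for i in range(702)],
)
conn.executemany(
    "INSERT INTO purchases VALUES ('2017-01-01', 'Nike', ?)",
    [(1,)] * 47,
)

# Joining the raw tables pairs every purchase row with every search row,
# so both aggregates run over 47 * 702 joined rows.
joined = conn.execute("""
    SELECT p.product, SUM(p.total_sale), COUNT(s.search_id)
    FROM purchases p JOIN searches s
      ON p.date = s.date AND p.product = s.product
    GROUP BY p.product, p.date
""").fetchone()

# Aggregating each table first leaves one row per side before the join.
pre_aggregated = conn.execute("""
    SELECT p.product, p.sales, s.searches
    FROM (SELECT date, product, SUM(total_sale) AS sales
          FROM purchases GROUP BY date, product) p
    JOIN (SELECT date, product, COUNT(*) AS searches
          FROM searches GROUP BY date, product) s
      ON p.date = s.date AND p.product = s.product
""").fetchone()

print(joined)          # ('Nike', 32994, 32994)
print(pre_aggregated)  # ('Nike', 47, 702)
```

With this data the naive join returns the same 32994 | 32994 pair as in the question, while the pre-aggregated version returns the expected 47 purchases and 702 searches.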

The correct way to get the desired result with a join is:

select  t1.product, t1.total_sales, t2.search_count
from    (
          -- one row per date and product, with the summed sales
          select date, product, sum(total_sale) as total_sales
          from   db.purchases
          group by date, product
        ) t1
join    (
          -- one row per date and product, with the number of searches
          select  date, product, count(search_id) as search_count
          from    db.searches
          group by date, product
        ) t2
on      t1.date = t2.date and t1.product = t2.product
where   t1.date = '2017-01-01' and t1.product = 'Nike';