Hive: finding items that customers bought together in a transaction

Time: 2016-09-22 14:02:20

Tags: mysql hive

I am working with MySQL, but I need to replicate some of my queries in Hive.

I have the following table:

transaction table

I want to retrieve the following information:

Resultant table

In MySQL, the following query works:

SELECT c.original_item_id, c.bought_with_item_id, count(*) as times_bought_together
FROM (
  SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
  FROM items a
  INNER join items b
  ON a.transaction_id = b.transaction_id AND a.item_id != b.item_id where original_item_id in ('B','C')) c
GROUP BY c.original_item_id, c.bought_with_item_id;

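However, I have not been able to convert it into a Hive query. I have tried rearranging the joins and replacing the conditions many times, but none of the attempts returned the required result. It would be great if I could get some help with this.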

1 answer:

Answer 0 (score: 0)

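Hive does not support non-equality (non-equi) conditions in a join's ON clause; this restriction applies to Hive releases before 2.2.0. You can instead move the condition a.item_id != b.item_id into the WHERE clause, which gives the same result for an inner join: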

create table items(transaction_id smallint, item_id string);

insert overwrite table items 
select 1 , 'A'  from default.dual union all
select 1 , 'B'  from default.dual union all
select 1 , 'C'  from default.dual union all
select 2 , 'B'  from default.dual union all
select 2 , 'A'  from default.dual union all
select 3 , 'A'  from default.dual union all
select 4 , 'B'  from default.dual union all
select 4 , 'C'  from default.dual;

SELECT c.original_item_id, c.bought_with_item_id, count(*) as times_bought_together
FROM (
      SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
      FROM items a
      INNER join items b ON a.transaction_id = b.transaction_id 
      WHERE 
            a.item_id in ('B','C') --original_item_id
        and a.item_id != b.item_id   
     ) c
GROUP BY c.original_item_id, c.bought_with_item_id;
---
OK
original_item_id        bought_with_item_id     times_bought_together
B       A       2
B       C       2
C       A       1
C       B       2

Time taken: 24.164 seconds, Fetched: 4 row(s)
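
If no Hive cluster is at hand, the logic of the rewritten query can be sanity-checked locally. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine (SQLite is not Hive, so this checks the join and filter logic, not Hive syntax or performance). It loads the same eight sample rows, flattens the subquery into a single SELECT for brevity, and adds an ORDER BY, which is my own addition, so that the row order is deterministic.

```python
import sqlite3

# Same sample rows as the Hive example above: (transaction_id, item_id).
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "B"), (2, "A"),
    (3, "A"),
    (4, "B"), (4, "C"),
]

# Throwaway in-memory SQLite database; an assumption for local testing, not Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (transaction_id INTEGER, item_id TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Only the equality condition stays in ON; the inequality and the
# item filter are applied in WHERE, as in the Hive answer.
query = """
SELECT a.item_id AS original_item_id,
       b.item_id AS bought_with_item_id,
       COUNT(*)  AS times_bought_together
FROM items a
INNER JOIN items b ON a.transaction_id = b.transaction_id
WHERE a.item_id IN ('B', 'C')
  AND a.item_id != b.item_id
GROUP BY a.item_id, b.item_id
ORDER BY a.item_id, b.item_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The printed rows should match the four rows of the Hive output above: B with A twice, B with C twice, C with A once, and C with B twice.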