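I am working with MySQL, but I need to replicate some queries in Hive.
I have a table
and I want to retrieve the following information:
In MySQL, the following query works: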
SELECT c.original_item_id, c.bought_with_item_id, count(*) as times_bought_together
FROM (
SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
FROM items a
INNER join items b
ON a.transaction_id = b.transaction_id AND a.item_id != b.item_id where original_item_id in ('B','C')) c
GROUP BY c.original_item_id, c.bought_with_item_id;
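However, I have not been able to translate it into a Hive query. I have tried rearranging the joins and swapping the conditions many times, but never got the required result. It would be great if I could get some help with this.
Answer 0 (score: 0)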
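Hive does not support non-equality (non-equi) join conditions in the ON clause, at least in versions before 2.2.0. Because this is an inner join, you can move the condition a.item_id != b.item_id to the WHERE clause without changing the result: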
create table items(transaction_id smallint, item_id string);
insert overwrite table items
select 1 , 'A' from default.dual union all
select 1 , 'B' from default.dual union all
select 1 , 'C' from default.dual union all
select 2 , 'B' from default.dual union all
select 2 , 'A' from default.dual union all
select 3 , 'A' from default.dual union all
select 4 , 'B' from default.dual union all
select 4 , 'C' from default.dual;
SELECT c.original_item_id, c.bought_with_item_id, count(*) as times_bought_together
FROM (
SELECT a.item_id as original_item_id, b.item_id as bought_with_item_id
FROM items a
INNER join items b ON a.transaction_id = b.transaction_id
WHERE
a.item_id in ('B','C') --original_item_id
and a.item_id != b.item_id
) c
GROUP BY c.original_item_id, c.bought_with_item_id;
---
OK
original_item_id bought_with_item_id times_bought_together
B A 2
B C 2
C A 1
C B 2
Time taken: 24.164 seconds, Fetched: 4 row(s)
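Since the outer query only groups the two columns produced by the subquery, the derived table can probably be dropped altogether. The following is a minimal sketch, assuming the same items table and sample data as above. I have not run this variant on Hive, so treat it as untested:

```sql
-- Same logic as the query above, without the derived table c.
-- Assumes the items(transaction_id, item_id) table created earlier.
SELECT a.item_id AS original_item_id,
       b.item_id AS bought_with_item_id,
       count(*)  AS times_bought_together
FROM items a
INNER JOIN items b
        ON a.transaction_id = b.transaction_id  -- equality condition only, as Hive expects
WHERE a.item_id IN ('B','C')                    -- restrict the "original" items
  AND a.item_id != b.item_id                    -- exclude an item paired with itself
GROUP BY a.item_id, b.item_id;
```

On the sample data it should return the same four rows as the version with the subquery, because the filter and the grouping are applied to exactly the same joined rows.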