I have two tables in Hive, as described below.
Table 1:
id name value
1 abc stack
3 abc overflow
4 abc foo
6 abc bar
Table 2:
id name value
5 xyz overflow
9 xyz stackoverflow
3 xyz foo
23 xyz bar
I need to count how often each value occurs across both tables, ignoring the id and name columns, and return only the rows whose value occurs exactly once.
Expected output:
id name value
1 abc stack
9 xyz stackoverflow
I tried the following. It works in other databases, but not in Hive:
select id,name,value from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) t
group by value having count(value) = 1;
Hive expects a group by clause like this instead:
select id,name,value from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) t
group by id,name,value having count(value) = 1;
Every column in the select clause has to appear in the group by clause. But once all the columns are grouped, every (id, name, value) combination in this data is distinct, so each group has a count of 1 and the result is different from the expected output.
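A minimal sketch, not from the original post, of one way to satisfy the GROUP BY rule: group only by value and wrap id and name in min(). Because HAVING count(*) = 1 keeps only single-row groups, min() simply returns that row's own id and name. SQLite, through Python's built-in sqlite3 module, stands in for Hive here (an assumption) so the query can be checked locally; the table contents come from the question, and the helper name unique_values is hypothetical.

```python
import sqlite3

# Sample data copied from the question. SQLite is used as a stand-in for Hive.
TABLE1 = [(1, "abc", "stack"), (3, "abc", "overflow"), (4, "abc", "foo"), (6, "abc", "bar")]
TABLE2 = [(5, "xyz", "overflow"), (9, "xyz", "stackoverflow"), (3, "xyz", "foo"), (23, "xyz", "bar")]

# Group only by value; id and name are aggregated with min() so every selected
# column is either grouped or aggregated. HAVING count(*) = 1 keeps single-row
# groups only, so min() returns that row's own id and name.
QUERY = """
SELECT min(id) AS id, min(name) AS name, value
FROM (SELECT id, name, value FROM table1
      UNION ALL
      SELECT id, name, value FROM table2) t
GROUP BY value
HAVING count(*) = 1
"""


def unique_values(table1, table2):
    """Return rows whose value occurs exactly once across both tables, sorted by id."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("table1", table1), ("table2", table2)):
            conn.execute(f"CREATE TABLE {name} (id INTEGER, name TEXT, value TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
        # GROUP BY output order is not guaranteed, so sort the rows in Python.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in unique_values(TABLE1, TABLE2):
        print(row)
```

The SQL string itself uses only UNION ALL in a subquery, GROUP BY, HAVING and min(), which Hive also supports, so the same query should be usable there unchanged.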
Answer 0 (score: 1)
Compute the analytic count(*) over(partition by value) and keep only the rows where it equals 1.
Tested with example data:
with
table1 as (
select stack (4,
1,'abc','stack',
3,'abc','overflow',
4,'abc','foo',
6,'abc','bar'
) as (id, name, value)
),
table2 as (
select stack (4,
5, 'xyz','overflow',
9, 'xyz','stackoverflow',
3, 'xyz','foo',
23, 'xyz','bar'
) as (id, name, value)
)
select id, name, value
from(
select id, name, value, count(*) over(partition by value) value_cnt
from
(SELECT id,name,value FROM table1
UNION ALL
SELECT id,name,value FROM table2) s
)s where value_cnt=1;
Result:
OK
id name value
1 abc stack
9 xyz stackoverflow
Time taken: 55.423 seconds, Fetched: 2 row(s)
Answer 1 (score: 0)
You can try the following:
SELECT a.id, a.name, a.value FROM table1 a LEFT JOIN table2 b ON a.value = b.value
WHERE b.value IS NULL
UNION ALL
SELECT a.id, a.name, a.value FROM table2 a LEFT JOIN table1 b ON a.value = b.value
WHERE b.value IS NULL;
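A similar sketch for the anti-join (with the a.-qualified columns and a single SELECT per branch), again using SQLite through Python's sqlite3 as a stand-in for Hive (an assumption; anti_join is a hypothetical helper name). On the question's data it returns the same two rows as the analytic count.

```python
import sqlite3

# Sample data copied from the question. SQLite is used as a stand-in for Hive.
TABLE1 = [(1, "abc", "stack"), (3, "abc", "overflow"), (4, "abc", "foo"), (6, "abc", "bar")]
TABLE2 = [(5, "xyz", "overflow"), (9, "xyz", "stackoverflow"), (3, "xyz", "foo"), (23, "xyz", "bar")]

# Anti-join in both directions: keep rows of each table whose value has no
# match in the other table. Columns are qualified with a. because both
# tables have id, name and value.
ANTI_JOIN = """
SELECT a.id, a.name, a.value FROM table1 a LEFT JOIN table2 b ON a.value = b.value
WHERE b.value IS NULL
UNION ALL
SELECT a.id, a.name, a.value FROM table2 a LEFT JOIN table1 b ON a.value = b.value
WHERE b.value IS NULL
"""


def anti_join(table1, table2):
    """Return rows whose value does not appear in the other table, sorted by id."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("table1", table1), ("table2", table2)):
            conn.execute(f"CREATE TABLE {name} (id INTEGER, name TEXT, value TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
        # UNION ALL output order is not guaranteed, so sort the rows in Python.
        return sorted(conn.execute(ANTI_JOIN).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in anti_join(TABLE1, TABLE2):
        print(row)
```

The two answers are not equivalent in general: the anti-join only looks for matches in the other table, so a value repeated inside a single table is kept, whereas count(*) over(partition by value) counts both copies and drops them.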