Question

I created two tables in Hive, as shown below:

create table test1(id string);

create table test2(id string);

The values in test1 are as follows:

1
1

The values in test2 are as follows:

1
1

When I join these two tables, I get this output:

1
1
1
1

Here is the query I used:

select a.id from test1 a,test2 b where a.id=b.id;

Please help me get the expected output:

1
1

I am using the Cloudera distribution.

Answer 1

It is better to use ANSI join syntax:

select a.id 
  from test1 a 
       inner join test2 b on a.id=b.id
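To check that switching to ANSI syntax improves readability but does not change the result, here is a minimal sketch that runs both forms side by side. It assumes an in-memory SQLite database as a stand-in for Hive (both engines treat the comma join with a WHERE filter and the INNER JOIN ... ON form as the same inner join), and it assumes each table holds the value '1' twice, as the four-row output in the question implies.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Hive; the join semantics are the same.
conn = sqlite3.connect(":memory:")
conn.execute("create table test1(id text)")
conn.execute("create table test2(id text)")
# Assumed data: '1' appears twice in each table.
conn.executemany("insert into test1 values (?)", [("1",), ("1",)])
conn.executemany("insert into test2 values (?)", [("1",), ("1",)])

# Implicit (comma) join, as in the question.
implicit_join = conn.execute(
    "select a.id from test1 a, test2 b where a.id = b.id"
).fetchall()

# ANSI inner join, as in the answer.
ansi_join = conn.execute(
    "select a.id from test1 a inner join test2 b on a.id = b.id"
).fetchall()

# Both return the same rows: the syntax alone does not remove duplicate matches.
print(implicit_join == ansi_join, len(ansi_join))
```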

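The expected output cannot be the result of your join. An inner join returns one row for every pair of rows from a and b whose ids match, so each a.id is paired with all of its matching rows in b. The first row of a matches two rows in b, and the second row of a also matches two rows in b, so the join returns exactly four rows.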
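The row multiplication can be seen in a naive nested-loop join. This is a plain-Python sketch of inner-join semantics, not of how Hive actually executes joins; the helper `inner_join_ids` is hypothetical, and each table is assumed to contain the value '1' twice.

```python
from collections import Counter


def inner_join_ids(left, right):
    """Hypothetical nested-loop inner join on equal ids: emits one output
    row for every (left row, right row) pair whose ids match."""
    result = []
    for a_id in left:        # each row of table a ...
        for b_id in right:   # ... is compared with every row of table b
            if a_id == b_id:
                result.append(a_id)
    return result


test1 = ["1", "1"]  # assumed contents of test1
test2 = ["1", "1"]  # assumed contents of test2

rows = inner_join_ids(test1, test2)
print(rows)  # ['1', '1', '1', '1']

# For each join key, output rows = (count in a) * (count in b).
count_a, count_b = Counter(test1), Counter(test2)
expected = sum(count_a[k] * count_b[k] for k in count_a)
print(len(rows) == expected)  # True
```

With two matching rows on each side, the key '1' contributes 2 × 2 = 4 output rows.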

For example, you can apply distinct to the second table before joining:

select a.id 
  from test1 a 
       inner join (select distinct b.id from test2 b) b on a.id=b.id

In this case, each row of table a has exactly one matching row in table b, so the join returns one output row per row of a.
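A minimal sketch of the distinct-subquery fix, again assuming an in-memory SQLite database stands in for Hive and that both tables contain the value '1' twice. The query text is the one from the answer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Hive; data is assumed, not from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("create table test1(id text)")
conn.execute("create table test2(id text)")
conn.executemany("insert into test1 values (?)", [("1",), ("1",)])
conn.executemany("insert into test2 values (?)", [("1",), ("1",)])

# Deduplicate test2 first, so each row of test1 matches at most one row of b.
dedup_join = conn.execute(
    """
    select a.id
      from test1 a
           inner join (select distinct b.id from test2 b) b on a.id = b.id
    """
).fetchall()
print(dedup_join)  # [('1',), ('1',)]
```

This only removes duplicates coming from test2; the two rows of test1 still each produce one output row, which matches the expected two-row output.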