How to compare two tables in Hive based on counts

Time: 2017-11-01 04:13:10

Tags: hive

I have the following Hive tables:

Table_1
ID
1
1
2

Table_2
ID
1
2
2

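I want to compare the two tables by the number of rows per ID in each table. I need output like the following:

ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2

Table_1 is the parent table.

I am using the following queries: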

select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;

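3 answers:

Answer 0: (score: 0)

Just do a full outer join of your two queries with the ON condition X.id = Y.id, then select from the joined result, checking for NULLs on either side (an ID that exists in only one table has no count on the other side).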

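-- COALESCE covers IDs that exist in only one table (CONCAT returns NULL if any argument is NULL)
SELECT COALESCE(x.id, y.id) AS id,
       CONCAT(COALESCE(x.cnt1, 0), ' entries in table 1, ', COALESCE(y.cnt2, 0), ' entries in table 2') AS comparison
FROM   (SELECT COUNT(*) AS cnt1, id FROM table_1 GROUP BY id) x
       FULL OUTER JOIN (SELECT COUNT(*) AS cnt2, id FROM table_2 GROUP BY id) y
       ON x.id = y.id;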
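
The NULL check this answer mentions could look like the following minimal sketch. It assumes the tables are named table_1 and table_2 as in the question and that each has a single id column; the aliases x, y, cnt1 and cnt2 are only illustrative. It lists the IDs that appear in one table but not the other.

```sql
-- Sketch only: table and column names are taken from the question,
-- aliases are illustrative.
SELECT COALESCE(x.id, y.id) AS id,
       x.cnt1,                      -- NULL when the ID is missing from table_1
       y.cnt2                       -- NULL when the ID is missing from table_2
FROM   (SELECT id, COUNT(*) AS cnt1 FROM table_1 GROUP BY id) x
       FULL OUTER JOIN
       (SELECT id, COUNT(*) AS cnt2 FROM table_2 GROUP BY id) y
       ON x.id = y.id
WHERE  x.id IS NULL                 -- ID exists only in table_2
   OR  y.id IS NULL;                -- ID exists only in table_1
```

With the sample data from the question this should return no rows, since IDs 1 and 2 are present in both tables.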

Answer 1: (score: 0)

Try this. You can use a CASE statement to check whether it should say "record" or "records", etc.

SELECT m.id,
       CONCAT(COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
       ' record in table 2')
FROM   (SELECT  id
        FROM   table_1
        UNION
        SELECT id
        FROM   table_2) m
       LEFT JOIN (SELECT Count(*) AS ct,
                         id
                  FROM   table_1
                  GROUP  BY id) a
              ON m.id = a.id
       LEFT JOIN (SELECT Count(*) AS ct,
                         id
                  FROM   table_2
                  GROUP  BY id) b
              ON m.id = b.id;  
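
The CASE suggestion for "record" versus "records" might be sketched as below. This is one possible way to write it, not necessarily what the answerer had in mind; the aliases c, ct1, ct2 and comparison are made up for illustration, and the counts are cast to STRING explicitly rather than relying on implicit conversion.

```sql
-- Sketch only: same join as above, wrapped in a subquery so the
-- zero-filled counts can be reused in the CASE expressions.
SELECT c.id,
       CONCAT(
         CAST(c.ct1 AS STRING),
         CASE WHEN c.ct1 = 1 THEN ' record' ELSE ' records' END,
         ' in table 1, ',
         CAST(c.ct2 AS STRING),
         CASE WHEN c.ct2 = 1 THEN ' record' ELSE ' records' END,
         ' in table 2'
       ) AS comparison
FROM  (SELECT m.id,
              COALESCE(a.ct, 0) AS ct1,   -- 0 when the ID is absent from table_1
              COALESCE(b.ct, 0) AS ct2    -- 0 when the ID is absent from table_2
       FROM   (SELECT id FROM table_1
               UNION
               SELECT id FROM table_2) m
              LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_1 GROUP BY id) a
                     ON m.id = a.id
              LEFT JOIN (SELECT id, COUNT(*) AS ct FROM table_2 GROUP BY id) b
                     ON m.id = b.id) c;
```

For the sample data in the question, ID 1 should come out as "2 records in table 1, 1 record in table 2" and ID 2 as "1 record in table 1, 2 records in table 2". Row order is not guaranteed without an ORDER BY.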

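Answer 2: (score: 0)

You can use this Python program to do a full comparison of two Hive tables: https://github.com/bolcom/hive_compared_bq

If you only want a quick comparison based on counts, pass the "--just-count" option (you can also specify the grouping column with "--group-by-column").

If you want a full validation, the script also lets you see visually all the differences across all rows and all columns.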