Merging different SQL scripts into a single script

Time: 2017-06-06 14:20:14

Tags: sql hive

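I have 3 query scripts that work on the same table: one counts the total number of rows, another counts the total number of duplicate rows, and the third counts the total number of distinct (unique) rows.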

Counting total rows:

select count(*) as total_rows from <table_name>

Counting duplicate rows:

select COUNT (<column_name>) as total_duplicate from <table_name> group by (<column_name>) HAVING count(<column_name>) > 1

Counting unique rows:

select COUNT(DISTINCT <column_name>) as total_unique from <table_name>

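I am trying to merge all 3 into a single script. I can combine the total count and the unique count; however, I don't know how to include the duplicate-count query, because it has a GROUP BY clause.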

Please advise.

3 Answers:

Answer 0: (score: 2)

Try this:

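select count(*) as total_rows, COUNT(DISTINCT <column_name>) as total_unique,
    -- the grouped query returns one row per duplicated value, so count those rows to get a single value
    (select count(*)
     from (select <column_name>
           from <table_name>
           group by <column_name>
           HAVING count(<column_name>) > 1) d) as total_duplicate
from <table_name>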
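
Depending on the Hive version, a subquery in the select list may not be accepted at all. A variant of the same idea that avoids it is sketched below: each derived table returns exactly one row, so the cross join also returns exactly one row. The aliases t, g and d are arbitrary, and the CTE only supplies sample data (the same values used in the demo further down), so treat this as a sketch rather than a tested drop-in.

```sql
-- Sample data only; replace the CTE with the real table.
with table_name as (select explode(array(1,1,1,2,3,3,4,4,4,4,4,5)) as column_name)

select      t.total_rows
           ,t.total_unique
           ,d.total_duplicate

from       (-- one row: total and distinct counts
            select      count(*)                    as total_rows
                       ,count(distinct column_name) as total_unique
            from        table_name
            ) t

cross join (-- one row: number of values that occur more than once
            select      count(*) as total_duplicate
            from       (select      column_name
                        from        table_name
                        group by    column_name
                        having      count(*) > 1
                        ) g
            ) d
```

On the sample data this should give total_rows = 12, total_unique = 5 and total_duplicate = 3.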

Answer 1: (score: 1)

select      sum(rows_per_col)   as total_rows
           ,count (is_dup)      as total_duplicate
           ,count(*)            as total_unique

from       (select      column_name
                       ,case when count(*) > 1 then 1 end   as is_dup
                       ,count(*)                            as rows_per_col

            from        table_name

            group by    column_name
            ) t

Demo

with table_name as (select explode(array(1,1,1,2,3,3,4,4,4,4,4,5)) as column_name)

select      sum(rows_per_col)   as total_rows
           ,count (is_dup)      as total_duplicate
           ,count(*)            as total_unique

from       (select      column_name
                       ,case when count(*) > 1 then 1 end   as is_dup
                       ,count(*)                            as rows_per_col

            from        table_name

            group by    column_name
            ) t
+------------+-----------------+--------------+
| total_rows | total_duplicate | total_unique |
+------------+-----------------+--------------+
|         12 |               3 |            5 |
+------------+-----------------+--------------+
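
Note that total_duplicate here is the number of distinct values that occur more than once (1, 3 and 4 in the demo). If "duplicate rows" is meant differently, the same grouped subquery can be extended. The sketch below is only a guess at what might be wanted; the column names duplicated_values, rows_in_duplicated_values and surplus_rows are made up for illustration.

```sql
with table_name as (select explode(array(1,1,1,2,3,3,4,4,4,4,4,5)) as column_name)

select      sum(rows_per_col)                                             as total_rows
           ,count(*)                                                      as total_unique
            -- values that appear more than once
           ,sum(case when rows_per_col > 1 then 1 else 0 end)             as duplicated_values
            -- all rows that belong to such values
           ,sum(case when rows_per_col > 1 then rows_per_col else 0 end)  as rows_in_duplicated_values
            -- rows beyond the first occurrence of each value
           ,sum(rows_per_col) - count(*)                                  as surplus_rows

from       (select      column_name
                       ,count(*)    as rows_per_col

            from        table_name

            group by    column_name
            ) t
```

On the demo data this should return 12, 5, 3, 10 (3 + 2 + 5) and 7 (12 - 5).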

Answer 2: (score: 0)

You can union them together:

 select count(*) as count_total,
        'total_rows' as count_type
 from <table_name>

 union

 select count(*) as count_total,
        'total_duplicate' as count_type
 from (select <column_name>
       from <table_name>
       group by <column_name>
       HAVING count(<column_name>) > 1) d

 union

 select COUNT(DISTINCT <column_name>) as count_total,
        'total_unique' as count_type
 from <table_name>