Question

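I am trying to run the following query:

select [every_column], count(*) from <table> group by [every_column] having count(*) > 1

However, the column names should be derived within the same query. I believe show columns in <table> lists the column names separated by newlines, but I need to use that output inside a single query to retrieve the result. Any help with this is appreciated.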

Answer 1

Have you considered using a subquery, or even a CTE? Maybe this can help you find the answer:

-- OUTER is a reserved word in Hive, so the subquery uses the alias src
select src.col1,
       src.col2,
       src.col3,
       src.col4,
       src.col5,
       src.col6, count(*) as cnt
from (
        -- derive each column once in the inner query
        select <some logic> as col1,
               <some logic> as col2,
               <some logic> as col3,
               <some logic> as col4,
               <some logic> as col5,
               <some logic> as col6
        from innerTable
) src
-- group on every derived column to count identical rows
group by src.col1,
       src.col2,
       src.col3,
       src.col4,
       src.col5,
       src.col6

Answer 2

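You can use sed in the shell to find the newlines (\n) and replace them with commas (,).

Then assign the comma-separated column names to a Hive variable and use that variable in the Hive query.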
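A minimal sketch of those two steps, assuming the hive CLI is on the PATH and that show columns prints one column name per line (possibly with padding, which is trimmed here). The function names cols_to_csv and find_duplicate_rows, the variable names cols and tbl, and the table name in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and the table name are illustrative assumptions.

# Turn newline-separated column names (one per line) into one
# comma-separated list.
cols_to_csv() {
    # First sed: trim surrounding whitespace and drop blank lines.
    # Second sed: read all lines into the pattern space, then replace
    # every embedded newline with a comma.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' |
        sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/,/g'
}

# Look for fully duplicated rows in the table named by $1.
find_duplicate_rows() {
    table="$1"
    # Step 1: fetch the column names and join them with commas.
    cols=$(hive -S -e "SHOW COLUMNS IN ${table};" | cols_to_csv)
    # Step 2: pass the list in as a Hive variable. The single quotes keep
    # the shell from expanding ${hivevar:...}; Hive substitutes it instead.
    hive -S --hivevar cols="${cols}" --hivevar tbl="${table}" -e \
        'SELECT ${hivevar:cols}, count(*) AS cnt
         FROM ${hivevar:tbl}
         GROUP BY ${hivevar:cols}
         HAVING count(*) > 1;'
}

# Usage (hypothetical table name):
#   find_duplicate_rows my_db.my_table
```

This takes two hive invocations rather than one query, since the column list has to exist before the second statement is parsed. Column names that clash with reserved words would additionally need backtick quoting, which this sketch does not handle.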

See the references for sed and for setting Hive variables.

Hive: how do I get derived column names and use them in the same query?
