Query does not find a column, yet suggests that same column, in Hive SQL

Time: 2017-08-18 09:15:00

Tags: sql hive rank

I have the following query in SQL:

select midquery.account, midquery.name, midquery.label,  midquery.labelfrequency
from(

    -- Count the appearance of each label.

    select count(*) as labelfrequency, account, name, label
    from(

        select account, name, label from myTable 

    ) innerquery

    group by account, name, label
) midquery

-- Select most frequent values only.
where rank() over 
    (partition by midquery.account, midquery.name 
     order by midquery.labelfrequency desc) = 1     

The idea is to find the most frequent label for each account and name pair. When I run this query, I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 12:74 Invalid column reference 'labelfrequency': (possible column names are: labelfrequency, account, name, label)

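I don't quite understand why the interpreter cannot find the column labelfrequency, yet is able to suggest it as a possible column name. Do you have any suggestions on how to solve this?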

Edit: If I move rank() into the select list, I do get results.

select midquery.account, midquery.name, midquery.label,  midquery.labelfrequency, 
    rank() over (partition by midquery.account, midquery.name 
     order by midquery.labelfrequency desc)
from(

    -- Count the appearance of each label.

    select count(*) as labelfrequency, account, name, label
    from(

        select account, name, label from myTable 

    ) innerquery

    group by account, name, label
) midquery

1 answer:

Answer 0 (score: 1)

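Window functions are simply not allowed in the WHERE clause. There are good reasons for this, but you can think of it as just another rule of SQL, similar to the rule that column aliases defined in the select list are not recognized in WHERE.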

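(The real reason is the difficulty of specifying how window functions would behave when there are multiple filter conditions. It is (almost?) impossible to come up with a coherent set of rules.)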
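To see the rule in action without a Hive cluster, here is a minimal sketch that uses SQLite through Python's standard sqlite3 module as a stand-in. This is an assumption made purely for convenience: it needs SQLite 3.25 or later for window functions, the sample rows are made up, and SQLite's error message differs from Hive's. It shows the window function being rejected in WHERE, and the same filter working once rank() is computed in a subquery and filtered one level up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (account text, name text, label text)")
# Made-up sample rows: label 'x' appears twice for (a1, n1), 'y' once.
conn.executemany(
    "insert into myTable values (?, ?, ?)",
    [("a1", "n1", "x"), ("a1", "n1", "x"), ("a1", "n1", "y"),
     ("a2", "n2", "z")],
)

# Window function used directly in WHERE: the engine refuses to compile it.
bad = """
select account, name, label
from myTable
where rank() over (partition by account, name order by label) = 1
"""
try:
    conn.execute(bad)
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Same idea, but rank() is computed in a subquery and filtered outside it.
good = """
select account, name, label, labelfrequency
from (
    select account, name, label, labelfrequency,
           rank() over (partition by account, name
                        order by labelfrequency desc) as seqnum
    from (
        select account, name, label, count(*) as labelfrequency
        from myTable
        group by account, name, label
    ) counted
) ranked
where seqnum = 1
order by account, name, label
"""
rows = conn.execute(good).fetchall()
print(rejected, rows)
```

The subquery aliases counted and ranked are illustrative names, not part of the original query.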

That said, you can simplify the query:

select t.account, t.name, t.label, t.labelfrequency
from (select count(*) as labelfrequency, account, name, label,
             rank() over (partition by account, name
                          order by count(*) desc
                         ) as seqnum
      from myTable t
      group by account, name, label
     ) t
where seqnum = 1;

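That is, window functions and aggregate functions can be combined in the same query level. And you don't need a subquery just to select a few columns.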
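As a quick check of the combined form, the sketch below runs the simplified query against SQLite via Python's sqlite3 module, again as an assumed stand-in for Hive (SQLite 3.25 or later, made-up sample rows). It also illustrates one property of rank() worth knowing: when two labels tie for the highest count, both get rank 1 and both rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (account text, name text, label text)")
# Made-up sample rows: for (a1, n1), labels 'x' and 'y' tie with 2 rows each.
conn.executemany(
    "insert into myTable values (?, ?, ?)",
    [("a1", "n1", "x"), ("a1", "n1", "x"),
     ("a1", "n1", "y"), ("a1", "n1", "y"),
     ("a1", "n1", "w"),
     ("a2", "n2", "z")],
)

# rank() is ordered by count(*) directly, in the same level as the GROUP BY.
query = """
select t.account, t.name, t.label, t.labelfrequency
from (select count(*) as labelfrequency, account, name, label,
             rank() over (partition by account, name
                          order by count(*) desc) as seqnum
      from myTable
      group by account, name, label
     ) t
where seqnum = 1
order by t.account, t.name, t.label
"""
top_labels = conn.execute(query).fetchall()
for row in top_labels:
    print(row)
```

If exactly one row per account and name is wanted, replacing rank() with row_number() is a common choice; which of the tied labels survives is then arbitrary unless the order by adds a tie-breaker such as label.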