Hadoop - Hive subquery in a NOT IN clause

Time: 2016-12-09 19:15:49

Tags: hadoop hive subquery

I am trying to run the following query on Hive:

SELECT COUNT(*)
FROM mydata
WHERE store NOT IN (SELECT store_out
                    FROM ( SELECT a.store as store_out, COUNT(*) AS CNT
                             FROM mydata a
                             GROUP BY store) TB1
                    WHERE CNT > AVG(CNT) + STDDEV(CNT) AND  CNT < AVG(CNT) - STDDEV(CNT))

But I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 3:6 Unsupported SubQuery Expression 'store': Correlating expression cannot contain unqualified column references.

How can I write this query another way?

Thanks!

1 answer:

Answer 0: (score: 1)

I don't have the exact data, so this is hard to verify, but I would do something like this:

SELECT COUNT(*)
FROM (
  SELECT a.*
    , flg
  FROM mydata a
  LEFT OUTER JOIN (
    SELECT store_out, flg
    FROM (
      SELECT store_out
        , cnt
        , 1 AS flg
        , AVG(cnt)         OVER () AS avg_cnt
        , STDDEV_SAMP(cnt) OVER () AS std_cnt
      FROM (
        SELECT store AS store_out
          , COUNT(*) AS cnt
        FROM mydata
        GROUP BY store ) x
      ) y
    WHERE cnt > avg_cnt + std_cnt AND cnt < avg_cnt - std_cnt ) z
  ON a.store = z.store_out ) final
WHERE flg IS NULL

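Basically, left join the main table to the subquery, which carries a dummy column (flg). That column is only populated for stores returned by the subquery, so every row of mydata whose store has no match gets a NULL flg, and those are the rows you want to count. Hope this helps.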
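
The same anti-join can be sketched a little more compactly with a common table expression. This is an untested sketch: it assumes the table and column names from the question (mydata, store) and a Hive version that supports WITH and window functions, and the names store_counts and flagged are made up for illustration. Note that the filter as written in the question, cnt > avg + stddev AND cnt < avg - stddev, can never be true because a standard deviation is non-negative. The sketch therefore assumes the intent was to flag stores more than one standard deviation away from the mean on either side, and uses OR.

```sql
-- Per-store row counts, plus the mean and sample standard deviation
-- of those counts computed across all stores.
WITH store_counts AS (
  SELECT store_out
    , cnt
    , AVG(cnt)         OVER () AS avg_cnt
    , STDDEV_SAMP(cnt) OVER () AS std_cnt
  FROM (
    SELECT store AS store_out
      , COUNT(*) AS cnt
    FROM mydata
    GROUP BY store ) x
)
SELECT COUNT(*)
FROM mydata a
LEFT OUTER JOIN (
  -- Assumption: a store is flagged when its count lies more than one
  -- standard deviation above or below the mean (OR, not AND).
  SELECT store_out
  FROM store_counts
  WHERE cnt > avg_cnt + std_cnt OR cnt < avg_cnt - std_cnt ) flagged
ON a.store = flagged.store_out
-- Keep only rows whose store was not flagged (the NOT IN equivalent).
WHERE flagged.store_out IS NULL
```

One difference from NOT IN is worth keeping in mind: rows whose store is NULL never match the join, so they are counted here, whereas NOT IN against a non-empty list would drop them.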