HiveQL: Why doesn't my SELECT query work?

Asked: 2016-07-22 16:00:52

Tags: sql select join hive hiveql

Background

I am using Hive and want to merge Query_1 and Query_2. Each of them works on its own:

--> Query_1
SELECT DISTINCT
table_a.number,
table_a.country,
table_a.brand
FROM db_y.table_b,db_x.table_a
WHERE table_a.date = '20160718'
AND CAST (table_a.brand as DOUBLE) IS NOT NULL
AND table_a.number = table_b.number
AND table_a.country = table_b.country
AND table_a.brand = table_b.brand
991     413     7040482
991     413     7040484
991     413     7040486


--> Query_2
 SELECT DISTINCT
    table_a.number,
    table_a.country,
    table_a.brand
    FROM db_x.table_a,db_x.table_c
    WHERE table_a.date = '20160719'
    AND table_a.brand = substring(table_c.brand,2,7)
    AND table_a.country = substring(table_c.country,2,3)
    AND table_a.number = substring(table_c.number,2,3)
    907     298     0004130  --> found in table_b
    907     298     0004138
    907     410     7024257

Question:

The merged query below, Query_3, does not work. Why?

 --> Query_3
 SELECT DISTINCT
    table_a.number,
    table_a.country,
    table_a.brand
    FROM db_y.table_b,db_x.table_a,db_x.table_c
    WHERE table_a.date = '20160718'
    AND table_a.number = table_b.number
    AND table_a.country = table_b.country
    AND table_a.brand = table_b.brand
    AND table_b.brand = substring(table_c.brand,2,7)
    AND table_b.country = substring(table_c.country,2,3)
    AND table_b.number = substring(table_c.number,2,3);

Here is a replacement query for Query_3:

   SELECT DISTINCT
    table_a.number,
    table_a.country,
    table_a.brand
    FROM db_x.table_a,( SELECT DISTINCT
    table_b.number,
    table_b.country,
    table_b.brand
    FROM db_y.table_b,db_x.table_c
    WHERE table_b.brand = substring(table_c.brand,2,7)
    AND table_b.country = substring(table_c.country,2,3)
    AND table_b.number = substring(table_c.number,2,3) ) subq
    WHERE table_a.date = '20160718'
    AND table_a.number = subq.number
    AND table_a.country = subq.country
    AND table_a.brand = subq.brand;

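But I would really like to understand why Query_3 fails.

Info:

  • On my machine, it stalls at 96% of the reduce step
  • On a friend's machine (which has more capacity than mine), it returns 0 results (we expected some results)

Thanks.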

1 answer:

Answer 0: (score: 0)

In your first Query_3, you are saying

table_a.country = table_b.country
and table_b.country = substring(table_c.country,2,3)

which basically means that table_a.country = substring(table_c.country,2,3).
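One way to check that implied condition is to test it directly between table_a and table_c. The sketch below is essentially Query_2 with the date that Query_3 uses ('20160718' rather than '20160719'); the alias matching_rows is only illustrative. If the count is 0, then, assuming the compared columns have the same type in all three tables, Query_3 cannot return any rows for that date either.

```sql
-- Diagnostic sketch: test the condition that Query_3 implies between
-- table_a and table_c, with the same date filter as Query_3.
SELECT COUNT(*) AS matching_rows   -- matching_rows is an illustrative alias
FROM db_x.table_a
JOIN db_x.table_c
    ON  table_a.brand   = substring(table_c.brand, 2, 7)
    AND table_a.country = substring(table_c.country, 2, 3)
    AND table_a.number  = substring(table_c.number, 2, 3)
WHERE table_a.date = '20160718';
```

Query_2 was reported to work with '20160719', so a non-zero count there says nothing about '20160718'.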

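In the second Query_3 (the replacement with the derived table), you are only comparing table_a.country with table_b.country. The derived table joins table_b.country to substring(table_c.country,2,3), but it returns only table_b.country, and that is the column you join to table_a.country. The same applies to all the other table_c columns on which you take a substring.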
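To make the two join steps easier to see, Query_3 can also be written with explicit JOIN ... ON clauses instead of comma-separated tables. This is only a sketch: the aliases a, b and c are introduced here for readability, and whether this form avoids the stall in the reduce step depends on the Hive version and the data, which has not been tested.

```sql
-- Sketch only: Query_3 with explicit join conditions.
-- Aliases a, b, c are illustrative; tables and columns are from the question.
SELECT DISTINCT
    a.number,
    a.country,
    a.brand
FROM db_x.table_a a
JOIN db_y.table_b b               -- first join: table_a to table_b
    ON  a.number  = b.number
    AND a.country = b.country
    AND a.brand   = b.brand
JOIN db_x.table_c c               -- second join: table_b to table_c substrings
    ON  b.brand   = substring(c.brand, 2, 7)
    AND b.country = substring(c.country, 2, 3)
    AND b.number  = substring(c.number, 2, 3)
WHERE a.date = '20160718';
```

Written this way, the join keys are stated in the ON clauses directly rather than left for Hive to derive from the WHERE clause.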

I hope all of that makes sense...