This is the kind of query I am sending to Hive:
SELECT BigTable.nicefield,LargeTable.*
FROM LargeTable INNER JOIN BigTable
ON (
LargeTable.joinfield1of4 = BigTable.joinfield1of4
AND LargeTable.joinfield2of4 = BigTable.joinfield2of4
)
WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15 AND BigTable.joinfield3of4=20140726 AND BigTable.joinfield4of4=15
AND LargeTable.filterfiled1of2=123456
AND LargeTable.filterfiled2of2=98765
AND LargeTable.joinfield2of4=12
AND LargeTable.joinfield1of4='iwanttolikehive'
It returns 2418025 rows. The problem is that
SELECT *
FROM LargeTable
WHERE joinfield3of4=20140726 AND joinfield4of4=15
AND filterfiled1of2=123456
AND filterfiled2of2=98765
AND joinfield2of4=12
AND joinfield1of4='iwanttolikehive'
returns 1555 rows, and so does:
SELECT *
FROM BigTable
WHERE joinfield3of4=20140726 AND joinfield4of4=15
AND joinfield2of4=12
AND joinfield1of4='iwanttolikehive'
Note that 1555 ^ 2 = 2418025.
Answer 0: (score: 2)
It turns out the correct version of the query is:
SELECT bt.nicefield,LargeTable.*
FROM LargeTable INNER JOIN
(
SELECT nicefield, joinfield1of4,joinfield2of4, count(*) as row_count
FROM BigTable
WHERE joinfield3of4=20140726 AND joinfield4of4=15
GROUP BY nicefield, joinfield1of4,joinfield2of4
) bt
ON (
LargeTable.joinfield1of4 = bt.joinfield1of4
AND LargeTable.joinfield2of4 = bt.joinfield2of4
)
WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15
AND LargeTable.filterfiled1of2=123456
AND LargeTable.filterfiled2of2=98765
AND LargeTable.joinfield2of4=12
AND LargeTable.joinfield1of4='iwanttolikehive'
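The problem was that, in the original query, the join on BigTable returned duplicates: BigTable holds many rows with the same (joinfield1of4, joinfield2of4) values, so every matching LargeTable row was paired with each of them. With 1555 rows on each side sharing one key, that gives 1555 x 1555 = 2418025 rows. Grouping BigTable by the join keys first leaves one row per key, so the join returns 1555 rows.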
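The row multiplication can be reproduced on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the table names (large_t, big_t), the column names (k1, k2, payload) and the helper join_counts are hypothetical, not part of the original schema.

```python
import sqlite3


def join_counts(n):
    """Return (naive_join_rows, deduplicated_join_rows) for n rows per table.

    Both tables hold n rows that all share the same join key, mimicking the
    situation in the question where each filtered table returned 1555 rows.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE large_t (k1 TEXT, k2 INTEGER, payload INTEGER)")
    cur.execute("CREATE TABLE big_t (k1 TEXT, k2 INTEGER, nicefield TEXT)")
    cur.executemany(
        "INSERT INTO large_t VALUES ('iwanttolikehive', 12, ?)",
        [(i,) for i in range(n)],
    )
    cur.executemany(
        "INSERT INTO big_t VALUES ('iwanttolikehive', 12, ?)",
        [("nice",)] * n,
    )

    # Naive join: every large_t row matches every big_t row with the same key.
    naive = cur.execute(
        "SELECT COUNT(*) FROM large_t l "
        "JOIN big_t b ON l.k1 = b.k1 AND l.k2 = b.k2"
    ).fetchone()[0]

    # Collapse big_t to one row per (nicefield, k1, k2) before joining.
    dedup = cur.execute(
        "SELECT COUNT(*) FROM large_t l "
        "JOIN (SELECT nicefield, k1, k2 FROM big_t "
        "      GROUP BY nicefield, k1, k2) b "
        "ON l.k1 = b.k1 AND l.k2 = b.k2"
    ).fetchone()[0]

    conn.close()
    return naive, dedup


if __name__ == "__main__":
    print(join_counts(1555))  # prints (2418025, 1555)
```

Note that the grouped subquery only yields one row per key when nicefield is the same for every duplicate, as it is in this sketch; if nicefield varied within a key, the join would still return more than one row per LargeTable row.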
So this was not a bug in Hive; the query just needed a more careful read. I hope this helps!