Multiple INSERT OVERWRITE into multiple tables query stores the same results in all temp tables

Time: 2012-11-08 12:52:13

Tags: hadoop hive

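I am running a query with multiple INSERT OVERWRITE clauses into multiple tables so that the dataset is scanned only once, and I end up with the same content in all of these tables! It seems that the GROUP BY query that returns results overwrites all the temp tables.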

Here is the misbehaving query:

FROM nikon
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
;

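It launches only one MR job, and the results are shown below. Why does table 'e1' contain the results from table 'e2'? Table 'e1' should be empty (see the individual SELECTs below).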

hive> SELECT * from e1;
OK
NULL    2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.229 seconds

hive> SELECT * from e2;
OK
NULL    2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.11 seconds

Here are the results of the individual queries (only the second query returns a result set):

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM nikon
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
(...)
OK
      <- There are no results, this is normal
Time taken: 41.471 seconds

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
(...)
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 39.607 seconds

1 Answer:

Answer 0: (score: 1)

I created the JIRA issue HIVE-3699, which has since been fixed. The patch is available there and should be merged into hive-0.11. Cloudera 4.2-rc includes the patch.
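
On a build without the patch, one possible workaround is to run the two inserts as separate statements. This is only a sketch and an assumption on my part, not something taken from the JIRA fix: it relies on the observation in the question that each SELECT returns the expected rows when run on its own, and it gives up the single scan, so nikon is read twice.

```sql
-- Workaround sketch (assumption, not the HIVE-3699 patch):
-- split the multi-insert into two independent statements.
-- Each statement launches its own job and scans nikon once.

-- e1: counts for the 'PRINT' category (expected to be empty for this dataset)
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
FROM nikon
WHERE qs_cs_s_cat='PRINT'
GROUP BY qs_cs_s_aid;

-- e2: counts for the 'VIEW' category
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
FROM nikon
WHERE qs_cs_s_cat='VIEW'
GROUP BY qs_cs_s_aid;
```

Running SELECT * from e1; and SELECT * from e2; afterwards is a quick way to check that the two tables no longer hold identical rows.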