I am running Apache Hadoop 2.6.0 on Ubuntu 14.0, and I created a table in Hive 0.13.0:
CREATE TABLE IF NOT EXISTS recipes_hive.cuisine (
ID INT COMMENT 'Cuisine ID.',
name STRING COMMENT 'Cuisine name - primary key.',
area STRING COMMENT 'Name of the area of origin - foreign key.',
scope STRING COMMENT 'Either country or area.')
COMMENT 'Table containing cuisines data.'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
I populate it with this statement:
LOAD DATA LOCAL INPATH 'path_to_file/CUISINE.csv'
OVERWRITE INTO TABLE recipes_hive.cuisine;
My database has several such tables, all created and populated the same way. When I run a simple query such as:
SELECT * FROM cuisine
even with some conditions in a WHERE clause, I get the expected results. But when I run more complex queries, I get nothing back. For example:
SELECT cuisine.name, SUM(IF (ingredient.category = "fruit",1,2))/count(*) AS PERC
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient
GROUP BY cuisine.name
ORDER BY PERC DESC
or:
SELECT ingredient.id, ingredient.name
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient
WHERE ingredient.id IN (
SELECT ingredient.id
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient
WHERE c.name = "Pakistan") AND cuisine.name = "Bangladesh"
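The first example computes a percentage per cuisine; the second looks for ingredients that the two cuisines have in common.
MapReduce and Hadoop are invoked correctly and do not report any errors. The output ends with: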
Execution completed successfully
MapredLocal task succeeded
OK
Time taken: 122.119 seconds
I searched the web and found other people with similar problems. I looked at:
Hive Table returning empty result set on all queries
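but it did not solve my problem. The data really is in HDFS and, as mentioned above, it works for simple queries.
So either something is wrong with my Hive instance or my queries are written incorrectly.
Any help is greatly appreciated. Best regards.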
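Answer 0 (score: 1)
Are you sure the result of the joins is non-empty? Because you are using inner joins, if even one table has no matching records, the entire result set will have 0 rows. Try using left joins together with an "IS NULL" check to verify how each table contributes to the result set. If every joined table has non-null values in its respective columns after the join, the query is fine.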
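One way to run that check is sketched below. It is only a sketch: it assumes the table and column names from the question (cuisine.name, recipe.id, recipe.cuisine, part_of.id_recipe, part_of.ingredient, ingredient.name), and the output column aliases are made up for illustration.

```sql
-- Replace the inner joins with LEFT OUTER JOINs so unmatched rows survive,
-- then count how many rows are missing a partner from each joined table.
SELECT
  COUNT(*)                                  AS total_rows,
  SUM(IF(recipe.id IS NULL, 1, 0))          AS missing_recipe,
  SUM(IF(part_of.id_recipe IS NULL, 1, 0))  AS missing_part_of,
  SUM(IF(ingredient.name IS NULL, 1, 0))    AS missing_ingredient
FROM cuisine
LEFT OUTER JOIN recipe     ON recipe.cuisine = cuisine.name
LEFT OUTER JOIN part_of    ON part_of.id_recipe = recipe.id
LEFT OUTER JOIN ingredient ON ingredient.name = part_of.ingredient;
```

If one of the missing_* counts equals total_rows, that join never finds a match, and the inner-join version of the query will return no rows.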
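Answer 1 (score: 1)
Suppose the Cuisine table contains ID = {1,2,3} and the Recipe table contains ID = {5,6,7}. Even though neither table is empty, an INNER JOIN on Cuisine.ID = Recipe.ID still returns no rows, because the two tables share no IDs. Please check whether that is what is happening here.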
SELECT count(1)
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name WHERE c.name = "Pakistan";
--- must return > 0
select count(1) from recipe as recipe
JOIN part_of ON part_of.id_recipe = recipe.id ;
--- must return > 0
select count(1) from part_of as part_of
JOIN ingredient ON ingredient.name = part_of.ingredient ;
--- must return > 0
If all of these counts are non-zero, the inner query should return at least one row. Now test the outer select:
SELECT ingredient.id, ingredient.name
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient
WHERE ingredient.id = <inner query result> and cuisine.name = "Bangladesh";
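Note that non-zero pairwise counts do not by themselves guarantee that the full four-table chain matches, because the rows that match in one join may not be the rows that match in the next. A possible follow-up, again assuming the table and column names from the question, is to count rows as each join is added to the chain:

```sql
-- Step 1: cuisine joined to recipe only.
SELECT count(1)
FROM cuisine
JOIN recipe ON recipe.cuisine = cuisine.name;

-- Step 2: add part_of to the chain.
SELECT count(1)
FROM cuisine
JOIN recipe  ON recipe.cuisine = cuisine.name
JOIN part_of ON part_of.id_recipe = recipe.id;

-- Step 3: add ingredient, which gives the full chain used by the original queries.
SELECT count(1)
FROM cuisine
JOIN recipe     ON recipe.cuisine = cuisine.name
JOIN part_of    ON part_of.id_recipe = recipe.id
JOIN ingredient ON ingredient.name = part_of.ingredient;
```

The first step whose count drops to 0 points at the join condition whose keys do not line up.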