The hive.fetch.task.conversion parameter can be used in Hive to run simple queries as a Fetch task instead of a Map or MapReduce job.
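Please explain why a Fetch task runs so much faster than a Map task, especially for simple work (for example, select * from table limit 10;). What extra work is the map-only task doing in this case? In my case the Fetch task is 20 times faster. Both tasks have to read the table data, don't they?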
Answer 0 (score: 1)
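A FetchTask fetches the data directly, whereas MapReduce launches a MapReduce job. The job has to be submitted, scheduled and started on the cluster before any row is read, and for a query like limit 10 that fixed overhead dominates the total time.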
<property>
<name>hive.fetch.task.conversion</name>
<value>minimal</value>
<description>
Some select queries can be converted to single FETCH task
minimizing latency. Currently the query should be single
sourced not having any subquery and should not have
any aggregations or distincts (which incurs RS),
lateral views and joins.
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
</description>
</property>
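The property can also be set per session instead of in the site configuration. The following is a minimal sketch, assuming a hypothetical table named my_table. EXPLAIN should show only a Fetch Operator stage when the conversion applies, and a Map Reduce stage when it does not.

```sql
-- my_table is a hypothetical table name used only for illustration.

-- Allow SELECT, FILTER and LIMIT queries to run as a single Fetch task.
SET hive.fetch.task.conversion=more;
EXPLAIN SELECT * FROM my_table LIMIT 10;

-- Disable the conversion to compare against the MapReduce plan.
SET hive.fetch.task.conversion=none;
EXPLAIN SELECT * FROM my_table LIMIT 10;
```

Comparing the two plans is a quick way to check which path a given query takes before timing it.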
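There is also another parameter, hive.fetch.task.conversion.threshold, whose default is -1 in 0.10-0.13 and 1G (1073741824) in 0.14 and later.
This means that if the table size is greater than 1G, MapReduce is used instead of a Fetch task.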
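The threshold can be adjusted per session in the same way. This is a sketch under the assumption that the value is interpreted in bytes and that a negative value removes the size limit, as the Hive configuration documentation describes. The table name my_table is again hypothetical.

```sql
-- Use a Fetch task only when the input is at most 1 GB (value in bytes).
SET hive.fetch.task.conversion.threshold=1073741824;

-- A negative value applies the conversion regardless of input size.
SET hive.fetch.task.conversion.threshold=-1;

-- my_table is a hypothetical table name.
SELECT * FROM my_table LIMIT 10;
```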