Question

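I am new to Hadoop and Hive. Could you suggest whether there are any performance tuning steps for running Apache Hive on Cloudera 5.2.1?

Which tuning parameters can be used to improve Hive query performance?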

Hive version: Hive 0.13.1-cdh5.2.1

Hive query:

 select distinct a1.chain_number chain_number, a1.chain_description chain_description from staff.organization_hierarchy a1;

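The Hive table was created as external, with the option "STORED as TEXT FORMAT" and table properties as below:

After changing the following Hive setting, we saw an improvement of 10 seconds:

 set hive.exec.parallel = true;

Could you suggest any other settings, apart from the one above, to improve Hive query performance for the type of query I am using?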

Answer 1

You can use group by instead of distinct, because the distinct job runs with only 1 reduce task.

Try this:

 select chain_number, chain_description 
 from staff.organization_hierarchy
 group by chain_number, chain_description

If the number of reduce tasks is still small, you can set it explicitly with the mapred.reduce.tasks configuration.
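A minimal sketch of how that setting might be combined with the group by query in one Hive session. The value 8 is only an illustrative assumption, not a recommendation; a suitable number depends on the cluster and the data size.

```sql
-- Assumed value: 8 reducers is illustrative only; tune it for your cluster.
set mapred.reduce.tasks=8;

select chain_number, chain_description
from staff.organization_hierarchy
group by chain_number, chain_description;
```

Setting mapred.reduce.tasks back to -1 lets Hive estimate the number of reducers on its own again.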

Answer 2

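There are many ways to optimize Hive performance, including: 1) enabling the Tez execution engine, 2) using the ORC file format, 3) using vectorization, 4) enabling cost-based optimization, and 5) using appropriate HQL commands, among others.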
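As a rough sketch, the first four suggestions might look like this in a Hive session. The property names are standard Apache Hive settings, but whether each one is available or supported on a given distribution is an assumption to verify: hive.cbo.enable was added in Apache Hive 0.14, and Tez has to be installed on the cluster. The table name organization_hierarchy_orc is hypothetical.

```sql
-- 1) Switch the execution engine to Tez (property exists in Apache Hive 0.13+;
--    assumes Tez is actually installed on the cluster).
set hive.execution.engine=tez;

-- 2) Copy the text-format table into an ORC-backed table.
--    organization_hierarchy_orc is a hypothetical name used for illustration.
create table staff.organization_hierarchy_orc
stored as orc
as select * from staff.organization_hierarchy;

-- 3) Enable vectorized query execution (it operates on ORC data).
set hive.vectorized.execution.enabled=true;

-- 4) Enable the cost-based optimizer (added in Apache Hive 0.14, so it may
--    not be recognized by Hive 0.13.1).
set hive.cbo.enable=true;

-- Run the query against the ORC copy.
select chain_number, chain_description
from staff.organization_hierarchy_orc
group by chain_number, chain_description;
```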