Question

I installed HDP 1.1 on Windows Server 2008 R2 and loaded web logs into a Hive table. The create table statement:

create table logtable (
  datenonQuery string, hours string, minutes string, seconds string,
  TimeTaken string, Method string, UriQuery string, ProtocolStatus string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "(\\S+)\\t(\\d+):(\\d+):(\\d+)\\t(\\S+)\\t(\\S+)\\t(\\S+)\\t(\\S+)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s")
stored as textfile;

The load statement:

load data local inpath 'D:\Logfiles\' into table logtable;

The select statement:

Select * from logtable;

Up to this point everything works fine.

The following statement fails:

Select count(*) from logtable;

with the exception:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

EDIT1:

The diagnostic info in the "Failed Jobs" table shows the following:

'# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306251711_0010_m_000000'

Answer 1

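This is something related to Hive. SELECT * works and SELECT COUNT(*) does not because the latter involves a MapReduce job, whereas a plain SELECT * can be served by reading the table's files directly. What does your data look like?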
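One way to see this difference is to ask Hive for the query plans. The following is a minimal sketch using the logtable from the question; the exact plan output depends on the Hive version, so none is shown here.

```sql
-- Plan for the query that works: typically just a fetch of the table's files.
EXPLAIN SELECT * FROM logtable;

-- Plan for the query that fails: expected to contain a map-reduce stage,
-- which is the part that runs as tasks on the cluster.
EXPLAIN SELECT COUNT(*) FROM logtable;
```

If the second plan shows a map-reduce stage and the first does not, the failure lies in running the job rather than in reading the data from the client.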

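Try giving the mappers more memory by setting the property mapred.job.map.memory.mb to a higher value. Also try increasing the number of mappers by lowering the split size via mapred.min.split.size, and see whether it makes a difference.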
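The two settings above can be tried per session from the Hive CLI before re-running the query. This is only a sketch: the property names come from the answer, but the values are illustrative assumptions and need tuning for the actual cluster and data.

```sql
-- Assumed value: memory per map task, in MB.
SET mapred.job.map.memory.mb=2048;

-- Assumed value: minimum split size, in bytes (1 MB here).
SET mapred.min.split.size=1048576;

-- Re-run the failing query with the session settings applied.
SELECT COUNT(*) FROM logtable;
```

Changing one setting at a time makes it easier to tell which of them, if either, affects the failure.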

Answer 2

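count(*) will not work if the result set it is applied to has two columns with the same name (this may apply to both Hive and Impala).

For example, query #1 returns a result, while query #2 gives an error.

Solution: aliasing the product_code columns resolves the error in query #2.

1) select a.product_code, b.product_code, b.product_name, a.purchase_date, a.purchase_qty from product_fact a inner join product_dim b on (a.product_code = b.product_code)

2) select count(*) from (select a.product_code, b.product_code, b.product_name, a.purchase_date, a.purchase_qty from product_fact a inner join product_dim b on (a.product_code = b.product_code)) as C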
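For illustration, the second query with the suggested fix applied might look like the sketch below. The alias names fact_product_code and dim_product_code are hypothetical; any two distinct names should do.

```sql
SELECT COUNT(*)
FROM (
  SELECT a.product_code AS fact_product_code,  -- hypothetical alias
         b.product_code AS dim_product_code,   -- hypothetical alias
         b.product_name,
         a.purchase_date,
         a.purchase_qty
  FROM product_fact a
  INNER JOIN product_dim b
    ON (a.product_code = b.product_code)
) AS c;
```

With distinct aliases, the subquery no longer exposes two columns called product_code to the outer count.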

Answer 3

For me, this particular error was an access issue. It was resolved when I connected to the database with a username and password.

Hive: SELECT * statement works but not SELECT COUNT(*)

3 answers: