Running the following Hive query returns special characters:
SELECT t6.amt amt2,t6.color color
FROM(
SELECT t5.color color, t5.c1 amt
FROM(
SELECT t1.c1 c1, t1.c2 AS color
from(
SELECT 7716 AS c1, "Red" AS c2 UNION
SELECT 6203 AS c1, "Blue" AS c2
) t1
) t5
order by color) t6
ORDER BY color
It returns the following result:
amt color
4 �
3 �
Is this a known Hive bug?
Explain plan:
Map 5 <- Union 2 (CONTAINS)
Reducer 3 <- Union 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 4
File Output Operator [FS_331359]
compressed:false
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Select Operator [SEL_331358]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
|<-Reducer 3 [SIMPLE_EDGE]
Reduce Output Operator [RS_331357]
key expressions:_col1 (type: int)
sort order:+
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
value expressions:_col0 (type: string)
Select Operator [SEL_331351]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator [GBY_331350]
| keys:KEY._col0 (type: int), KEY._col1 (type: string)
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
|<-Union 2 [SIMPLE_EDGE]
|<-Map 1 [CONTAINS]
| Reduce Output Operator [RS_331349]
| key expressions:_col0 (type: int), _col1 (type: string)
| Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
| sort order:++
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
| Group By Operator [GBY_331348]
| keys:_col0 (type: int), _col1 (type: string)
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
| Select Operator [SEL_331342]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 1 Data size: 91 Basic stats: COMPLETE Column stats: COMPLETE
| TableScan [TS_331341]
| alias:_dummy_table
| Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
|<-Map 5 [CONTAINS]
Reduce Output Operator [RS_331349]
key expressions:_col0 (type: int), _col1 (type: string)
Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
sort order:++
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator [GBY_331348]
keys:_col0 (type: int), _col1 (type: string)
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator [SEL_331344]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
TableScan [TS_331343]
alias:_dummy_table
Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
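Would disabling or enabling some configuration parameter help here?
If I reverse the column order in the outermost SELECT, the query returns the expected result. I would have expected the result to be:
color amt
Blue 6203
Red 7716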
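For comparison, here is a minimal sketch that runs the same query through Python's built-in `sqlite3` module. SQLite is only an assumed stand-in for a standard SQL engine. It is not Hive and cannot reproduce the bug, but it shows the rows the query should return. The double-quoted literals are switched to single quotes, which is standard SQL.

```python
import sqlite3
from contextlib import closing

# The question's query, unchanged except for single-quoted string literals.
# SQLite serves only as a reference engine here; it is not Hive.
QUERY = """
SELECT t6.amt amt2, t6.color color
FROM (
  SELECT t5.color color, t5.c1 amt
  FROM (
    SELECT t1.c1 c1, t1.c2 AS color
    FROM (
      SELECT 7716 AS c1, 'Red' AS c2 UNION
      SELECT 6203 AS c1, 'Blue' AS c2
    ) t1
  ) t5
  ORDER BY color
) t6
ORDER BY color
"""


def run_reference_query():
    """Run QUERY against an in-memory SQLite database and return all rows."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return conn.execute(QUERY).fetchall()


print(run_reference_query())  # [(6203, 'Blue'), (7716, 'Red')]
```

The rows keep the question's original column order (amt2, color), and they match the expected values: Blue with 6203, then Red with 7716.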
Answer 0 (score: 1)
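I tried the same query on Hive 2.3 with both MR and Tez and got the same result as you. I turned off all query optimizations, statistics collection and rcp, but the result stayed the same. The problem is that Hive performs an ORDER BY on a single reducer, and since you have two consecutive ORDER BYs, Hive merges them into a single reduce stage (this is easy to see if you look at the EXTENDED or FORMATTED query plan). More precisely, Hive uses _col0, _col1, etc. as column aliases. In the t5 subquery your sort key is _col0, but in t6 it is _col1. That is why in the select operator you see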
expressions:: "_col1 (type: string), _col0 (type: int)"
and in the reduce output operator
key expressions:: "_col1 (type: int)"
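So when you swap the selected columns, the key's type somehow gets switched as well. If the column types appear in the same order in t5 and t6, there is no problem: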
key expressions:: "_col0 (type: string)"
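The mix-up described above can be illustrated with a toy Python model. It is a hypothetical simplification, not Hive's actual code: it assumes the merged sort key takes its positional alias from the outer projection (t6) but resolves that alias's type against the inner projection (t5). Under that assumption it reproduces both key descriptors quoted above. The names `merged_sort_key`, `alias_of` and `type_of_alias` are made up for illustration.

```python
from typing import List, Tuple

# A projection is an ordered list of (column name, Hive type) pairs.
Projection = List[Tuple[str, str]]

# SELECT t5.color color, t5.c1 amt
T5: Projection = [("color", "string"), ("amt", "int")]
# SELECT t6.amt amt2, t6.color color  (columns swapped relative to t5)
T6_SWAPPED: Projection = [("amt2", "int"), ("color", "string")]
# Same column/type order as t5
T6_SAME: Projection = [("color", "string"), ("amt2", "int")]


def alias_of(projection: Projection, column: str) -> str:
    """Return the positional alias (_col0, _col1, ...) of a column."""
    names = [name for name, _ in projection]
    return f"_col{names.index(column)}"


def type_of_alias(projection: Projection, alias: str) -> str:
    """Return the type found at a positional alias in a projection."""
    return projection[int(alias[len("_col"):])][1]


def merged_sort_key(inner: Projection, outer: Projection, column: str) -> Tuple[str, str]:
    """Hypothetical faulty merge: alias taken from the outer projection,
    type looked up by that alias in the inner projection."""
    alias = alias_of(outer, column)
    return alias, type_of_alias(inner, alias)


print(merged_sort_key(T5, T6_SWAPPED, "color"))  # ('_col1', 'int')
print(merged_sort_key(T5, T6_SAME, "color"))     # ('_col0', 'string')
```

In this model, resolving both the alias and the type against the same projection would give ('_col1', 'string') for the swapped case. The fault only appears when the two projections order their column types differently.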
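How to avoid this: I really don't know. Running ORDER BY on a single reducer is not caused by some additional optimization, so there is nothing to switch off there.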