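The problem is that LATERAL VIEW explode does not work with hiveContext in the Spark shell. Below are a sample table and sample Spark code. The DataFrame "vasOtherDF" is expected to return 6, but it returns 8.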
Hive table:
CREATE EXTERNAL TABLE `aa`(
`col1` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://nn1001.dev:8020/tmp/aa'
Sample data:
aaa|qq|ww|dd
aaa
aaa|bbb
ccc
Hive output:
select count(distinct vother) as vothers from aa LATERAL VIEW explode(split(col1,'\\|')) a as vother;
6
select distinct vother as vothers from rafm.aa LATERAL VIEW explode(split(col1,'\\|')) a as vother;
aaa
bbb
ccc
dd
qq
ww
Spark output:
val vasOtherDF = hiveContext.sql("select count(distinct vother) as vothers from aa LATERAL VIEW explode(split(col1,'\\|')) a as vother")
output: 8
val vasOtherDF = hiveContext.sql("select distinct vother as vothers from aa LATERAL VIEW explode(split(col1,'\\|')) a as vother")
scala> vasOtherDF.show
+-------+
|vothers|
+-------+
|      a|
|      b|
|      c|
|      d|
|      q|
|      w|
|      ||
|       |
+-------+
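One plausible explanation, offered as an assumption rather than a confirmed diagnosis: in an ordinary Scala string the two backslashes in '\\|' collapse to one before the query text reaches hiveContext.sql, and the SQL parser then drops the remaining backslash from '\|', so split receives the bare regex |. That regex matches the empty string, which would break every value into single characters plus a trailing empty string. This gives exactly the 8 values in the show output above. The sketch below reproduces that reasoning in plain Scala without Spark. unescapeSqlLiteral is a hypothetical helper that only approximates the parser's unescaping, and split(regex, -1) is assumed to match how split tokenizes a value.

```scala
object SplitEscapeDemo {
  // Sample data from the question, one string per row of table aa.
  val rows: Seq[String] = Seq("aaa|qq|ww|dd", "aaa", "aaa|bbb", "ccc")

  // Hypothetical helper (assumption): approximates how the SQL parser unescapes a
  // quoted literal by turning backslash+x into x for any character x,
  // so \| becomes | and \\ becomes \.
  def unescapeSqlLiteral(s: String): String =
    s.replaceAll("""\\(.)""", "$1")

  // Assumed to mirror split(): String.split(regex, -1), which keeps trailing "".
  def distinctTokens(regex: String): Set[String] =
    rows.flatMap(_.split(regex, -1).toSeq).toSet

  // Text between the single quotes, as the SQL parser would receive it.
  val literalFromScala: String = "\\|"        // ordinary Scala string: runtime chars \|
  val literalFromHiveCli: String = """\\|"""  // typed at the Hive CLI: chars \\|
  val literalFixed: String = "\\\\|"          // doubled escaping in Scala: chars \\|

  def main(args: Array[String]): Unit =
    Seq(
      "spark-shell" -> literalFromScala,
      "hive CLI"    -> literalFromHiveCli,
      "fixed"       -> literalFixed
    ).foreach { case (label, lit) =>
      // Apply the assumed unescaping, then count distinct split tokens.
      val regex = unescapeSqlLiteral(lit)
      println(s"$label: regex=$regex distinct=${distinctTokens(regex).size}")
    }
}
```

If this assumption holds, writing split(col1,'\\\\|') inside a normal Scala string, or putting the whole query in a triple-quoted string so that '\\|' is passed through unchanged, should hand the parser the same text the Hive CLI sees and give the expected count of 6.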