Spark and Hive interoperability

Time: 2016-05-19 04:24:47

Tags: hadoop apache-spark nullpointerexception hive parquet

I am using EMR-4.3.0, Spark 1.6.0, and Hive 1.0.0.

I wrote a table like this (pseudocode) -

val df = <a dataframe>
df.registerTempTable("temptable")
// dynamic partition mode takes "strict"/"nonstrict", not "true"
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
sqlContext.sql("create external table exttable ( some columns ... ) partitioned by (partkey int) stored as parquet location 's3://some.bucket'")
sqlContext.sql("insert overwrite table exttable partition(partkey) select columns from temptable")

The write works fine, and I can read the table back with -

sqlContext.sql("select * from exttable")

However, when I try to read the table with Hive as -

hive -e 'select * from exttable'

Hive throws a NullPointerException; the stack trace is below. Any help is appreciated! -

2016-05-19 03:08:02,537 ERROR [main()]: CliDriver (SessionState.java:printError(833)) - Failed with exception java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1619)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:221)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
        at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:247)
        at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:368)
        at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:346)
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:296)
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:254)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:200)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:498)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:588)
        ... 15 more
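The failing frame, ParquetMetadataConverter.fromParquetStatistics, suggests the older parquet-hadoop bundled with Hive 1.0 cannot parse the column statistics Spark 1.6 writes into the Parquet footers. A possible workaround (an assumption, not verified on this EMR release) is to tell Spark to write Parquet in its legacy, more broadly compatible layout before re-running the create/insert:

```scala
// Assumption: Hive 1.0's Parquet reader chokes on footer statistics
// emitted by Spark 1.6's default writer. spark.sql.parquet.writeLegacyFormat
// is a real Spark 1.6 setting that switches to the older layout.
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
// ... then re-run the create table / insert overwrite statements above.
```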

UPDATE - After messing around a bit, it seems that null values in the data are what trip Hive up. How do I avoid this?
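If nulls in the data are indeed the trigger, one sketch of a workaround is to replace them before writing, using the DataFrameNaFunctions API (`df.na`) that ships with Spark 1.6; the fill values, and the idea that this sidesteps the footer NPE, are assumptions:

```scala
// Sketch: replace nulls before registering the temp table, so the written
// Parquet statistics never contain null min/max values.
val cleaned = df.na.fill("")   // empty string for null string columns
                .na.fill(0)    // zero for null numeric columns
cleaned.registerTempTable("temptable")
// ... then run the insert overwrite as before.
```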

0 Answers:

No answers yet