Hive fails to load a BSON file with org.bson.BsonSerializationException

Time: 2019-04-17 11:43:03

Tags: linux mongodb hadoop hive bson

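Here is my problem. My BSON file is very large, about 1 TB, so I used the following split command to break it into smaller files: split -l 3000000 file.bson -d -a 2 file_ To load the resulting BSON files into Hive, I used these jars: mongo-hadoop-core-2.0.2.jar, mongo-hadoop-hive-2.0.0.jar, mongo-java-driver-3.2.2.jar. This is my SQL script: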

        create external table sfim_logs_dump (
        id string
        ,ts bigint
        ,reqts string
        )
        row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
        with serdeproperties(
        'mongo.columns.mapping' = '{"id":"_id","ts":"ts","reqts":"reqts"}')
        stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
        outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
        location '$hdfspath'

The table is created over the BSON files, but when I run select count(1) from sfim_logs_dump; the following error appears almost immediately:

        org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:267)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:253)
    ... 11 more
Caused by: org.bson.BsonSerializationException: Detected unknown BSON type "\x32" for fieldname ".708545". Are you using the latest driver version?
    at org.bson.BsonBinaryReader.readBsonType(BsonBinaryReader.java:96)
    at org.bson.AbstractBsonWriter.pipeDocument(AbstractBsonWriter.java:799)
    at org.bson.AbstractBsonWriter.pipe(AbstractBsonWriter.java:756)
    at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:48)
    at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:57)
    at com.mongodb.hadoop.splitter.BSONSplitter.getStartingPositionForSplit(BSONSplitter.java:427)
    at com.mongodb.hadoop.mapred.BSONFileInputFormat.getRecordReader(BSONFileInputFormat.java:127)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
    ... 16 more
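
The innermost cause in this trace is raised while BSONSplitter.getStartingPositionForSplit decodes a document, which would be consistent with a reader that starts somewhere other than a document boundary. Each BSON document begins with a 4-byte little-endian length (which counts itself) and ends with a NUL byte, so the framing of a chunk can be checked outside Hive. The following is a minimal Python sketch under those assumptions: `first_bad_offset` is a hypothetical helper, not part of mongo-hadoop, the 16 MB cap is MongoDB's default document size limit, and the check covers framing only, not the elements inside each document.

```python
import struct
from typing import BinaryIO, Optional

MAX_DOC_SIZE = 16 * 1024 * 1024  # assumption: MongoDB's 16 MB document limit


def first_bad_offset(stream: BinaryIO, max_size: int = MAX_DOC_SIZE) -> Optional[int]:
    """Walk a stream of concatenated BSON documents.

    Returns the byte offset of the first document whose framing is broken,
    or None if every document has a sane length prefix and a trailing NUL.
    """
    pos = 0
    while True:
        header = stream.read(4)
        if not header:
            return None  # clean end of stream
        if len(header) < 4:
            return pos  # truncated length prefix
        (size,) = struct.unpack("<i", header)
        # The smallest valid document is 5 bytes: the length plus the NUL.
        if size < 5 or size > max_size:
            return pos
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            return pos  # truncated document or missing terminator
        pos += size
```

Opening one of the split chunks in binary mode and passing it to `first_bad_offset` returns 0 if the chunk does not start on a document boundary, which would point at the splitting step rather than at the driver version.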

I also tried the latest driver version, mongo-java-driver-3.10.2.jar, but it did not help and I get the same error. Thanks in advance. Please tell me how to solve this problem!
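
One thing worth checking, offered as a likely cause rather than a confirmed diagnosis: `split -l` cuts a file after every N newline bytes, but BSON is a binary format with no line structure, so chunks produced that way can begin or end in the middle of a document. The sketch below splits on document boundaries instead, using the 4-byte little-endian length prefix that starts every BSON document. `iter_bson_documents` and `split_bson` are hypothetical names, and the `file_00`, `file_01`, ... output naming only mirrors the `-d -a 2 file_` options from the question.

```python
import struct
from typing import BinaryIO, Iterator, List


def iter_bson_documents(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a stream of concatenated documents."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        (size,) = struct.unpack("<i", header)  # total size, including these 4 bytes
        if size < 5:
            raise ValueError(f"implausible BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or corrupt BSON document")
        yield header + body


def split_bson(src_path: str, out_prefix: str, docs_per_chunk: int) -> List[str]:
    """Write chunks of at most docs_per_chunk whole documents; return their paths."""
    paths: List[str] = []
    out = None
    try:
        with open(src_path, "rb") as src:
            for index, doc in enumerate(iter_bson_documents(src)):
                if index % docs_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}{len(paths):02d}"
                    out = open(path, "wb")
                    paths.append(path)
                out.write(doc)
    finally:
        if out is not None:
            out.close()
    return paths
```

A call such as `split_bson("file.bson", "file_", 3000000)` would produce chunks that each contain only complete documents. The original, unsplit file has to be used as input, since chunks already cut by `split -l` would need to be concatenated back in order first.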

0 answers:

No answers