Here is my problem: my BSON file is very large, around 1 TB in size, so I used the following split command to cut it into smaller pieces:

split -l 3000000 file.bson -d -a 2 file_

When I try to load the BSON files into Hive, I use these jars:

mongo-hadoop-core-2.0.2.jar, mongo-hadoop-hive-2.0.0.jar, mongo-java-driver-3.2.2.jar
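(For reference, I register these jars in the Hive session roughly like this; the /path/to/ locations below are just placeholders, not my real paths:)

-- add the mongo-hadoop jars to the session before creating the table (paths are placeholders)
ADD JAR /path/to/mongo-hadoop-core-2.0.2.jar;
ADD JAR /path/to/mongo-hadoop-hive-2.0.0.jar;
ADD JAR /path/to/mongo-java-driver-3.2.2.jar;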
And this is my SQL script:
create external table sfim_logs_dump (
id string
,ts bigint
,reqts string
)
row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
with serdeproperties(
'mongo.columns.mapping' = '{"id":"_id","ts":"ts","reqts":"reqts"}')
stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
location '$hdfspath'
The BSON files are loaded into Hive, but when I run select count(1) from sfim_logs_dump; the query soon fails with the following error:
org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:267)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:253)
... 11 more
Caused by: org.bson.BsonSerializationException: Detected unknown BSON type "\x32" for fieldname ".708545". Are you using the latest driver version?
at org.bson.BsonBinaryReader.readBsonType(BsonBinaryReader.java:96)
at org.bson.AbstractBsonWriter.pipeDocument(AbstractBsonWriter.java:799)
at org.bson.AbstractBsonWriter.pipe(AbstractBsonWriter.java:756)
at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:48)
at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:57)
at com.mongodb.hadoop.splitter.BSONSplitter.getStartingPositionForSplit(BSONSplitter.java:427)
at com.mongodb.hadoop.mapred.BSONFileInputFormat.getRecordReader(BSONFileInputFormat.java:127)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
... 16 more
I tried using a newer driver version, e.g. mongo-java-driver-3.10.2.jar, but it does not work and fails with the same error. Thanks, please tell me how to solve this problem!