I am getting an exception while running a MapReduce job.
I am using Elasticsearch 2.1 and Elasticsearch Hadoop 2.2.0.
The field f1 is mapped as type byte:
$ curl -XGET http://hostname:9200/index-name/?pretty
...
"f1": {
"type": "byte"
}
...
One of the documents has the value 20 in its f1 field:
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": 20
...
But then I made a mistake like this:
$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
"script": "ctx._source.f1 += \"10\";",
"upsert": {
"f1": 20
}
}'
Now f1 has become "2010", which no longer fits in a byte:
$ curl -XGET http://hostname:9200/index-name/type-name/doc-id?pretty
...
"f1": "2010"
...
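For the record, the quoted "10" is what went wrong: Elasticsearch 2.x runs update scripts in Groovy, and in Groovy adding a string to a number concatenates them, so 20 += "10" produced the string "2010" instead of 30. The numeric increment I presumably wanted is the unquoted form (shown here against the same endpoint):

$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
  "script": "ctx._source.f1 += 10;",
  "upsert": {
    "f1": 20
  }
}'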
Finally, ES Hadoop threw a NumberFormatException:
INFO mapreduce.Job: Task Id : attempt_1454640755387_0404_m_000020_2, Status : FAILED
Error: org.elasticsearch.hadoop.rest.EsHadoopParsingException: Cannot parse value [2010] for field [f1]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:701)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:794)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:457)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:382)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:277)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:250)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:298)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:232)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NumberFormatException: Value out of range. Value:"2030" Radix:10
    at java.lang.Byte.parseByte(Byte.java:150)
    at java.lang.Byte.parseByte(Byte.java:174)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.parseByte(JdkValueReader.java:333)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.byteValue(JdkValueReader.java:325)
    at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.readValue(JdkValueReader.java:67)
    at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:714)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:699)
    ... 21 more
I want to skip malformed documents like this one, which throw a NumberFormatException, and let the MapReduce job continue. Following an SO answer, I wrapped the body of my Mapper.map() method in a try-catch block, but it did not help.
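Here is roughly what that attempt looked like (a minimal sketch; SkippingMapper and the elided per-document logic are placeholders, not my real job):

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper sketching the try-catch attempt.
public class SkippingMapper extends Mapper<Text, MapWritable, Text, MapWritable> {
    @Override
    protected void map(Text key, MapWritable value, Context context)
            throws IOException, InterruptedException {
        try {
            // ... real per-document logic elided ...
            context.write(key, value);
        } catch (NumberFormatException e) {
            // Intended to skip the malformed document and keep going.
        }
    }
}

Looking at the stack trace again, I suspect this can never work: the exception is raised inside EsInputFormat$ShardRecordReader.nextKeyValue(), that is, while the framework is fetching the next record, before map() is even invoked, so the catch block is unreachable for this failure.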
Thanks.
Answer 0 (score: 0)
The author of Elasticsearch Hadoop says:
ES-Hadoop is not a mapper; rather, it is usable within M/R as an Input/OutputFormat. The problem is not the mapper but the data sent to ES. ES-Hadoop currently has no option to ignore errors, as it is fail-fast: if something is wrong, it fails right away. However, you can filter the bad data before it reaches ES.
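One way to apply that advice here is to repair the damaged document before the job scrolls over it again. A minimal sketch, assuming the value the botched increment was meant to produce is 30 (20 + 10):

$ curl -XPOST http://hostname:9200/index-name/type-name/doc-id/_update -d '
{
  "script": "ctx._source.f1 = 30;"
}'

Every document whose f1 no longer parses as a byte needs the same treatment, because ES-Hadoop fails fast on the first bad value it reads.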