outputSchema for a Python UDF in Apache Pig

Asked: 2019-02-04 02:44:03

Tags: apache-pig

My Python UDF returns a list of tuples, like this:

[(0.01, 12), (0.02, 6), (0.03, 12), (0.04, 19), (0.05, 29), (0.06, 42)]

The above was printed to the mapper's stdout and copied from there.

The two values in each tuple are cast to float and int, respectively. I also printed their types, and the casts are indeed applied correctly:

(<type 'float'>, <type 'int'>)

Here is the decorator: @outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
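For reference, the setup described above can be sketched as follows. The function name improvement_stats is hypothetical; the pig_util import is what Pig's streaming-Python (CPython) runtime provides, while in a Jython UDF the outputSchema decorator is injected automatically, so a no-op stand-in is defined here to let the sketch run outside Pig:

```python
# Minimal sketch of the UDF described in the question: it returns a bag
# (a list of tuples), each tuple holding a float and an int.
try:
    # Available inside Pig's streaming_python runtime.
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the sketch can run/test outside Pig; Pig itself
    # supplies the real decorator.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema("stats:bag{improvement:tuple(percent:float,entityCount:int)}")
def improvement_stats(pairs):
    # Hypothetical UDF body: cast each pair to (float, int), matching
    # the tuple(percent:float, entityCount:int) element of the bag.
    return [(float(p), int(c)) for p, c in pairs]
```

Printing the element types of the returned list, as the question does, confirms each tuple is a (float, int) pair before Pig hands it to the storage function.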

Here is the error message:

Error: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:479)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:442)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:422)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:269)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:646)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:477)
    ... 11 more
Caused by: java.lang.RuntimeException: datum (0.01,12) is not in union ["null",{"type":"record","name":"TUPLE_1","fields":[{"name":"percent","type":["null","float"],"doc":"autogenerated from Pig Field Schema","default":null},{"name":"entityCount","type":["null","int"],"doc":"autogenerated from Pig Field Schema","default":null}]}]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnion(PigAvroDatumWriter.java:132)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:131)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:113)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:378)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ...

Does anyone know what I am doing wrong in the schema?

0 Answers:

There are no answers yet.