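Predicate pushdown not working for Parquet

Time: 2016-02-29 08:18:10

Tags: parquet

First of all, I am new to this forum and new to Parquet, and I am trying to understand its details. I wrote a Java MapReduce job to verify predicate pushdown and column projection. Below is the problem I am facing, and I would appreciate your expert help.

Predicate pushdown code: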

    public static class PushDown implements UnboundRecordFilter {
        private final UnboundRecordFilter filter;

        public PushDown() {
            filter = ColumnRecordFilter.column("Age", ColumnPredicates.equalTo(35));
        }

        @Override
        public RecordFilter bind(Iterable<ColumnReader> readers) {
            return filter.bind(readers);
        }
    }

Registering the predicate pushdown on the job:

  

AvroParquetInputFormat.setUnboundRecordFilter(job,PushDown.class);

This fails with the following error:

Error: parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://quickstart.cloudera:8020/parq/customer/wocomp/part-m-00000.parquet
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:241)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.UnsupportedOperationException
    at parquet.column.impl.ColumnReaderImpl$Binding.getBinary(ColumnReaderImpl.java:118)
    at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:417)
    at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:67)
    at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:72)
    at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:80)
    at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:60)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:216)
    ... 12 more
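
One possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the `Caused by` frames show a predicate calling `getBinary()` on a column reader that does not hold binary data, and the projection schema in this post declares `Age` as `LONG`. The sketch below models that kind of mismatch in plain Java. `ColumnValue`, `int64` and both `equalTo` overloads are hypothetical stand-ins written for illustration; they are not Parquet classes.

```java
import java.util.function.Predicate;

// Hypothetical model of a typed column reader: each accessor works only
// when it matches the column's physical type, otherwise it throws.
class TypedAccessorDemo {

    interface ColumnValue {
        default long getLong() {
            throw new UnsupportedOperationException("not an INT64 column");
        }
        default String getString() {
            throw new UnsupportedOperationException("not a BINARY column");
        }
    }

    // A column whose physical type is a 64-bit integer.
    static ColumnValue int64(long value) {
        return new ColumnValue() {
            @Override
            public long getLong() {
                return value;
            }
        };
    }

    // Predicate that reads the column through the long accessor.
    static Predicate<ColumnValue> equalTo(long target) {
        return column -> column.getLong() == target;
    }

    // Predicate that reads the column through the string accessor.
    static Predicate<ColumnValue> equalTo(String target) {
        return column -> target.equals(column.getString());
    }

    public static void main(String[] args) {
        ColumnValue age = int64(35L);

        // Accessor type matches the column type: the comparison runs.
        System.out.println(equalTo(35L).test(age)); // prints true

        // Accessor type does not match: the reader rejects the call.
        try {
            equalTo("35").test(age);
        } catch (UnsupportedOperationException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```

If this is what is happening here, the first thing to try would be building the predicate from a value whose Java type matches the column, for example a `long` literal such as `35L`, assuming the `ColumnPredicates` version in use offers an overload for that type. This has not been verified against the file in the question.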

Additional information:

I was able to verify column projection using the code below, and I can see results like this. This is what it looked like without column projection.

Schema projection = Schema.createRecord("AvgAge", "", "", false);
List<Schema.Field> fields = new ArrayList<Schema.Field>();
fields.add(new Schema.Field("Age", Schema.create(Schema.Type.LONG), "", ""));
projection.setFields(fields);

AvroParquetInputFormat.setRequestedProjection(job, projection);

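Questions:

Q1. What needs to be corrected in the code above for the filter pushdown to work?

Q2. I don't see much information in the logs that would let me verify the projection. Could you help me verify the filter pushdown and column projection?

Thank you for your time and for trying to help.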

0 answers:

No answers yet