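First off, I am new to this forum and new to Parquet, and I am trying to understand its details. I wrote a Java MapReduce job to verify predicate pushdown and column projection. Below is the problem I am facing, and I would appreciate your expert help.
Predicate pushdown code: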
public static class PushDown implements UnboundRecordFilter {
    private final UnboundRecordFilter filter;

    public PushDown() {
        filter = ColumnRecordFilter.column("Age", ColumnPredicates.equalTo(35));
    }

    @Override
    public RecordFilter bind(Iterable<ColumnReader> readers) {
        return filter.bind(readers);
    }
}
Invoking the predicate pushdown:
AvroParquetInputFormat.setUnboundRecordFilter(job,PushDown.class);
This fails with the following error:
Error: parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://quickstart.cloudera:8020/parq/customer/wocomp/part-m-00000.parquet
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:241)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.UnsupportedOperationException
at parquet.column.impl.ColumnReaderImpl$Binding.getBinary(ColumnReaderImpl.java:118)
at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:417)
at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:67)
at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:72)
at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:80)
at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:60)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:216)
... 12 more
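A possible reading of the trace above, offered as a guess rather than a confirmed diagnosis: the innermost frame is getBinary, so the predicate that ran tried to read Age as a binary/string value, and the reader bound to that column does not support that accessor. The projection code later in this post declares Age as LONG, so the predicate's target type may simply not match the column's type. The sketch below models that behaviour in plain Java. Every name in it (TypedReaderSketch, Column, LongColumn, the two equalTo overloads) is a hypothetical stand-in written for illustration, not Parquet's actual classes.

```java
import java.util.function.Predicate;

// Hypothetical model of a typed column reader; NOT Parquet's API.
final class TypedReaderSketch {

    /** Stand-in for a column reader: every typed getter is unsupported by default. */
    static abstract class Column {
        long getLong() { throw new UnsupportedOperationException("not a long column"); }
        String getBinary() { throw new UnsupportedOperationException("not a binary column"); }
    }

    /** A column whose stored type is long overrides getLong() only. */
    static final class LongColumn extends Column {
        private final long value;
        LongColumn(long value) { this.value = value; }
        @Override long getLong() { return value; }
    }

    /** A predicate built from a String target reads the column as binary. */
    static Predicate<Column> equalTo(String target) {
        return col -> target.equals(col.getBinary());
    }

    /** A predicate built from a long target reads the column as long. */
    static Predicate<Column> equalTo(long target) {
        return col -> col.getLong() == target;
    }

    public static void main(String[] args) {
        Column age = new LongColumn(35L);

        // Accessor matches the column type: the predicate evaluates normally.
        System.out.println(equalTo(35L).test(age)); // prints "true"

        // Accessor does not match the column type: the getter throws.
        try {
            equalTo("35").test(age);
        } catch (UnsupportedOperationException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```

If the same mechanism is at work in the real job, passing a long literal to the predicate (for example 35L instead of 35) would be the first thing to try. That is an assumption still to be tested against the Parquet version actually in use.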
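Additional information:
I am able to verify column projection with the code below. With the projection I see results like this, and without column projection they look like this.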
Schema projection = Schema.createRecord("AvgAge", "", "", false);
List<Schema.Field> fields = new ArrayList<Schema.Field>();
fields.add(new Schema.Field("Age", Schema.create(Schema.Type.LONG), "",""));
projection.setFields(fields);
AvroParquetInputFormat.setRequestedProjection(job, projection);
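Questions:
Q1. What needs to be corrected in the code above for the filter pushdown to work?
Q2. I do not see much information in the logs that would let me verify the projection. Could you help me verify the filter pushdown and the column projection?
Thank you for your time and for trying to help.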