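Still getting java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>() even after creating a TextArrayWritable subclass
I have a dataset in which each entry has the form ((a,b,c),(d,e,f,g)). The entries are tuples in PySpark, and I save them to a sequence file like this: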
output.saveAsSequenceFile(
    path=os.path.join(output_path, 'date=%s' % date),
    compressionCodecClass='org.apache.hadoop.io.compress.SnappyCodec'
)
Now I want to load them in Java with:
JavaPairRDD<TextArrayWritable,TextArrayWritable> distFile = sc.sequenceFile(s3inputPath.toString(), TextArrayWritable.class, TextArrayWritable.class);
where TextArrayWritable extends ArrayWritable, as shown below:
public static class TextArrayWritable extends ArrayWritable {
    // No-arg constructor, so the class can be instantiated via reflection.
    public TextArrayWritable() {
        super(Text.class);
    }

    // Convenience constructor: wraps each String in a Text element.
    public TextArrayWritable(String[] strings) {
        super(Text.class);
        Text[] texts = new Text[strings.length];
        for (int i = 0; i < strings.length; i++) {
            texts[i] = new Text(strings[i]);
        }
        set(texts);
    }
}
Unfortunately, I get the following error: java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()
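One thing I notice is that the exception names ArrayWritable itself, not TextArrayWritable. My guess (an assumption, not something I have confirmed) is that the reader is instantiating the class recorded in the sequence file through reflection, rather than the class I pass to sequenceFile. The sketch below uses no Hadoop code at all. BaseArray and StringArray are hypothetical stand-ins for ArrayWritable and TextArrayWritable, and it only shows that a reflective no-arg lookup fails on a base class like this while succeeding on the subclass:

```java
public class NoArgConstructorDemo {
    // Hypothetical stand-in for ArrayWritable: it only has a constructor
    // that takes the element class, so there is no no-arg constructor.
    public static class BaseArray {
        private final Class<?> valueClass;

        public BaseArray(Class<?> valueClass) {
            this.valueClass = valueClass;
        }

        public Class<?> getValueClass() {
            return valueClass;
        }
    }

    // Hypothetical stand-in for TextArrayWritable: adds a no-arg constructor.
    public static class StringArray extends BaseArray {
        public StringArray() {
            super(String.class);
        }
    }

    // Returns true if cls declares a no-arg constructor, false if the
    // lookup throws NoSuchMethodException.
    public static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("BaseArray: " + hasNoArgConstructor(BaseArray.class));     // prints "BaseArray: false"
        System.out.println("StringArray: " + hasNoArgConstructor(StringArray.class)); // prints "StringArray: true"
    }
}
```

If that guess is right, the subclass is never looked at when the file is read, which would explain why adding the no-arg constructor to TextArrayWritable made no difference.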
Can anyone help me with this?
Thanks!