Loading sequence file data in Spark - java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()

Asked: 2019-04-28 00:07:35

Tags: apache-spark java-pair-rdd

Even after creating a TextArrayWritable subclass to work around it, I still get the error java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()

I have a dataset in which each entry has the form ((a,b,c),(d,e,f,g)). The entries are tuples in PySpark, and they are saved to a sequence file like this:

output.saveAsSequenceFile(
    path=os.path.join(output_path, 'date=%s' % date),
    compressionCodecClass='org.apache.hadoop.io.compress.SnappyCodec'
)

Now I want to load them in Java with:

JavaPairRDD<TextArrayWritable,TextArrayWritable> distFile = sc.sequenceFile(s3inputPath.toString(), TextArrayWritable.class, TextArrayWritable.class);

where TextArrayWritable extends ArrayWritable, as shown below:

public static class TextArrayWritable extends ArrayWritable {
    // No-argument constructor, so the class can be instantiated reflectively.
    public TextArrayWritable() {
        super(Text.class);
    }

    // Convenience constructor: wraps each String in a Text element.
    public TextArrayWritable(String[] strings) {
        super(Text.class);
        Text[] texts = new Text[strings.length];
        for (int i = 0; i < strings.length; i++) {
            texts[i] = new Text(strings[i]);
        }
        set(texts);
    }
}

Unfortunately, I still get an error saying java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()
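For context on the message itself: a NoSuchMethodException that names X.<init>() is what Java reflection throws when a no-argument constructor is looked up on class X and none exists. The sketch below uses stand-in classes (BaseArray, TextArray and tryInstantiate are hypothetical names, not Hadoop types) to show that the exception names whichever class was actually handed to the reflective lookup. If the same mechanism applies here, the message would suggest that something is instantiating ArrayWritable itself rather than TextArrayWritable, although the post does not confirm why that would happen.

```java
import java.lang.reflect.Constructor;

public class NoArgCtorDemo {
    // Hypothetical stand-in for a base class whose only constructor takes an argument.
    public static class BaseArray {
        private final Class<?> valueClass;

        public BaseArray(Class<?> valueClass) {
            this.valueClass = valueClass;
        }

        public Class<?> getValueClass() {
            return valueClass;
        }
    }

    // Hypothetical subclass that adds a no-argument constructor.
    public static class TextArray extends BaseArray {
        public TextArray() {
            super(String.class);
        }
    }

    // Tries to build an instance the way a reflective framework would:
    // look up the no-argument constructor, then call it.
    public static String tryInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.newInstance();
            return "ok";
        } catch (NoSuchMethodException e) {
            // The message names the class that was looked up.
            return "NoSuchMethodException: " + e.getMessage();
        } catch (ReflectiveOperationException e) {
            return "other failure: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(BaseArray.class));
        System.out.println(tryInstantiate(TextArray.class));
    }
}
```

Running main shows the lookup failing for the base class and succeeding for the subclass, so an error that still names the base class points at which class is being instantiated, not at a missing constructor in the subclass.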

Can anyone help me with this?

Thanks!

0 answers:

No answers yet.