Question

I have sequence files whose keys are Text and whose values are a custom data type.

However, Spark Streaming cannot read data from these sequence files.

JavaPairInputDStream<Text, CustomDataType> myRDD =
        jssc.fileStream(path, Text.class, CustomDataType.class, SequenceFileInputFormat.class,
            new Function<Path, Boolean>() {
          @Override
          public Boolean call(Path v1) throws Exception {
            return Boolean.TRUE;
          }
        }, false);

The IDE reports the following compile error.

Bound mismatch: The generic method fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path,Boolean>, boolean) of type JavaStreamingContext is not applicable for the arguments (String, Class<Text>, Class<DeltaCounter>, Class<SequenceFileInputFormat>, new Function<Path,Boolean>(){}, boolean). The inferred type SequenceFileInputFormat is not a valid substitute for the bounded parameter <F extends InputFormat<K,V>>

How can I read sequence files in Spark Streaming?

Answer 1

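You need to import SequenceFileInputFormat from the correct package. You are probably importing the one from the old org.apache.hadoop.mapred package, which does not extend the org.apache.hadoop.mapreduce InputFormat that fileStream requires. Use this import instead: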

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
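To see why the import matters, here is a minimal sketch that does not use Spark or Hadoop at all. Every class in it is a hypothetical stand-in: NewApiInputFormat and OldApiInputFormat play the roles of the mapreduce and mapred InputFormat types, the two SequenceFileInputFormat stand-ins share the role of the two same-named Hadoop classes, and the local fileStream only copies the shape of the bound `F extends InputFormat<K, V>`. The stand-ins use fixed key and value types (String and Long) to keep the sketch free of raw-type warnings.

```java
class BoundMismatchSketch {
    // Hypothetical stand-in for the new-API base type (mapreduce InputFormat).
    static abstract class NewApiInputFormat<K, V> {}

    // Hypothetical stand-in for the old-API base type (mapred InputFormat).
    interface OldApiInputFormat<K, V> {}

    // Stand-in for the new-API sequence file format: satisfies the bound below.
    static class NewSequenceFileInputFormat extends NewApiInputFormat<String, Long> {}

    // Stand-in for the old-API class: similar role, unrelated type hierarchy.
    static class OldSequenceFileInputFormat implements OldApiInputFormat<String, Long> {}

    // Same bound shape as in the error message: F must extend the new-API type.
    static <K, V, F extends NewApiInputFormat<K, V>> String fileStream(
            Class<K> keyClass, Class<V> valueClass, Class<F> formatClass) {
        return formatClass.getSimpleName();
    }

    public static void main(String[] args) {
        // Accepted: the format class extends NewApiInputFormat<String, Long>.
        System.out.println(
            fileStream(String.class, Long.class, NewSequenceFileInputFormat.class));

        // Rejected at compile time with a bound mismatch, because the old-API
        // stand-in is not a subtype of NewApiInputFormat<String, Long>:
        // fileStream(String.class, Long.class, OldSequenceFileInputFormat.class);
    }
}
```

Uncommenting the last call should produce a bound mismatch similar to the one in the question, which is the situation the import change resolves.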
