I have sequence files whose keys are Text and whose values are a custom data type, but Spark Streaming fails to read them.
JavaPairInputDStream<Text, CustomDataType> myRDD =
    jssc.fileStream(path, Text.class, CustomDataType.class, SequenceFileInputFormat.class,
        new Function<Path, Boolean>() {
            @Override
            public Boolean call(Path v1) throws Exception {
                return Boolean.TRUE;
            }
        }, false);
The IDE reports the following compile error:
Bound mismatch: The generic method fileStream(String, Class<K>, Class<V>, Class<F>, Function<Path,Boolean>, boolean) of type JavaStreamingContext is not applicable for the arguments (String, Class<Text>, Class<DeltaCounter>, Class<SequenceFileInputFormat>, new Function<Path,Boolean>(){}, boolean). The inferred type SequenceFileInputFormat is not a valid substitute for the bounded parameter <F extends InputFormat<K,V>>
How can I read sequence files in Spark Streaming?
Answer 0 (score: 0)
You need to import the class from the correct package. You are probably importing the old mapred-API class, org.apache.hadoop.mapred.SequenceFileInputFormat, but fileStream's type bound, F extends InputFormat<K,V>, refers to the new mapreduce API. Use this import instead:
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
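With the new-API import in place, the original call type-checks, because this SequenceFileInputFormat extends org.apache.hadoop.mapreduce.InputFormat, which satisfies fileStream's bound. A minimal sketch of the corrected setup, assuming jssc is an existing JavaStreamingContext, CustomDataType is your own value class implementing org.apache.hadoop.io.Writable, and the directory path is a placeholder:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
// New mapreduce API, not the old org.apache.hadoop.mapred package:
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

// Assumes jssc (JavaStreamingContext) and CustomDataType (a Writable) already exist.
JavaPairInputDStream<Text, CustomDataType> stream =
    jssc.fileStream("hdfs:///path/to/seqfiles",   // placeholder directory
        Text.class, CustomDataType.class,
        SequenceFileInputFormat.class,
        new Function<Path, Boolean>() {
            @Override
            public Boolean call(Path p) throws Exception {
                return Boolean.TRUE;               // accept every file in the directory
            }
        },
        false);                                    // newFilesOnly = false: also process pre-existing files
```

Note that SequenceFile deserialization requires the value class to implement Writable (with a no-argument constructor), so check CustomDataType for that as well.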