I am trying to create an RDD from a Dataset, but I cannot find a way to map over each row of the Dataset.
Dataset<POJO> df1 = session.read().parquet(tableName).as(Encoders.bean(POJO.class));
I am using the following approach:
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempDatas1 = df1.map(r -> new MapFunction<POJO, List<Tuple3<Long, Integer, Double>>>(){
    //@Override
    public List<Tuple3<Long, Integer, Double>> call(POJO row) throws Exception
    {
        // Get the samples property, trim leading and trailing spaces, and split it
        // by comma to get each sample individually
        List<Tuple2<String, Integer>> samples = zipWithIndex((row.getSamples().trim().split(",")));
        // Gets the unique identifier (position) for that SNP.
        Long snp = row.getPos();
        // Calculates the hamming distance.
        return samples.stream().map(t -> {
            String alleles = t._1();
            Integer patient = t._2();
            List<String> values = Arrays.asList(alleles.split("\\|"));
            Double firstAllele = Double.parseDouble(values.get(0));
            Double secondAllele = Double.parseDouble(values.get(1));
            // Returns the SNP id, the patient id and the distance as a Tuple3.
            return new Tuple3<>(snp, patient, firstAllele + secondAllele);
        }).collect(Collectors.toList());
    }
});
The map in df1.map(r -> is flagged with this error:
cannot resolve method map(<lambda expression>)
Answer 0 (score: 0)
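Use df1.toJavaRDD() or df1.rdd() instead of calling map directly on the Dataset (toJavaRDD() yields a JavaRDD<POJO>, while rdd() yields a Scala RDD<POJO>). Convert the Dataset to an RDD first, then map over that RDD and store the output in an RDD again. A map on a Dataset returns another Dataset, not a JavaRDD or JavaPairRDD, so the result cannot be assigned to a JavaRDD unless the Dataset is converted to an RDD first.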
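One way to make the conversion easier to apply is to pull the per-row logic out of the anonymous class into a plain static method, which can then be handed to the RDD's map. The block below is only a minimal sketch under stated assumptions: the names AlleleSums, PatientSum and perPatientSums are hypothetical, PatientSum stands in for Tuple3<Long, Integer, Double> so the example runs without Spark or Scala on the classpath, and patient indices are assumed to start at 0 (the question's zipWithIndex helper is not shown).

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class AlleleSums {

    /**
     * Hypothetical stand-in for Tuple3<Long, Integer, Double>:
     * SNP position, patient index, and the sum of the two alleles.
     * Serializable so that instances could be shipped between Spark executors.
     */
    public static final class PatientSum implements Serializable {
        public final long pos;
        public final int patient;
        public final double sum;

        public PatientSum(long pos, int patient, double sum) {
            this.pos = pos;
            this.patient = patient;
            this.sum = sum;
        }
    }

    /**
     * Splits a comma-separated samples string such as "0|1,1|1" and returns,
     * for each sample, its index (assumed to start at 0) and the sum of its
     * two '|'-separated alleles.
     */
    public static List<PatientSum> perPatientSums(long pos, String samples) {
        String[] parts = samples.trim().split(",");
        List<PatientSum> result = new ArrayList<>(parts.length);
        for (int patient = 0; patient < parts.length; patient++) {
            String[] alleles = parts[patient].trim().split("\\|");
            double first = Double.parseDouble(alleles[0]);
            double second = Double.parseDouble(alleles[1]);
            result.add(new PatientSum(pos, patient, first + second));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one line per patient: position, patient index, allele sum.
        for (PatientSum p : perPatientSums(42L, "0|1,1|1,0|0")) {
            System.out.println(p.pos + " " + p.patient + " " + p.sum);
        }
    }
}
```

With Spark on the classpath, the same method could then be applied per row along the lines of df1.toJavaRDD().map(row -> AlleleSums.perPatientSums(row.getPos(), row.getSamples())), which would give a JavaRDD<List<AlleleSums.PatientSum>>. Keeping the parsing in a static method also means it can be unit tested without starting a Spark session.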