I'm trying to map over the rows of a Dataset but I'm running into problems. Eclipse gives me the error "The target type of this expression must be a functional interface" at `map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>()`. The code is as follows:
```
Dataset<Object> df1 = session.read().parquet(tableName).as(Encoders.bean(Object.class));

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>() {
    // to get each sample individually
    List<Tuple2<String, Integer>> samples = zipWithIndex((r.getString(9).trim().split(",")));
    // Gets the unique identifier for that pos.
    Long snp = r.getString(1);
    // Calculates the distance for this pos for each sample.
    // i.e. 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2
    return samples.stream().map(t -> {
        String alleles = t._1();
        Integer patient = t._2();
        List<String> values = Arrays.asList(alleles.split("\\|"));
        Double firstAllele = Double.parseDouble(values.get(0));
        Double secondAllele = Double.parseDouble(values.get(1));
        // Returns the initial SNP id, patient id and the distance in form of Tuple.
        return new Tuple3<>(snp, patient, firstAllele + secondAllele);
    }).collect(Collectors.toList());
});
```
Any help would be greatly appreciated.
Answer 0 (score: 0)
Since the question lacks some detail, I can only offer rough suggestions. I'm assuming you are referring to `org.apache.spark.api.java.function.MapFunction`.
I see a couple of problems: you are using the lambda parameter `r` as a type argument in `MapFunction<r, ...>`, and you are mixing a lambda expression with an anonymous class, which is what triggers the functional-interface error. Either use a lambda expression on its own,
```
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> {
    // your definition
});
```
or, implement `MapFunction` explicitly as an anonymous class and move your code into its `call` method (note there is no `r ->` in front of `new MapFunction`):

```
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(new MapFunction<theRespectiveType, List<Tuple3<Long, Integer, Double>>>() {
    public List<Tuple3<Long, Integer, Double>> call(theRespectiveType variable) {
        // your implementation here
    }
});
```
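Whichever shape you choose, the per-row allele-distance logic from the question can be written and tested as plain Java, independent of Spark. A minimal sketch, assuming the genotype column holds comma-separated values like `"0|0,0|1,1|1"`; the class name `AlleleDistances` and the helper names are hypothetical, not part of the question's code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AlleleDistances {

    // Parses one genotype such as "0|1" and returns the sum of both alleles:
    // "0|0" -> 0.0, "0|1" -> 1.0, "1|0" -> 1.0, "1|1" -> 2.0
    static double alleleDistance(String genotype) {
        String[] values = genotype.split("\\|");
        return Double.parseDouble(values[0]) + Double.parseDouble(values[1]);
    }

    // Computes the distance for every comma-separated sample in the column;
    // the position in the returned list corresponds to the patient index.
    static List<Double> distances(String sampleColumn) {
        return Arrays.stream(sampleColumn.trim().split(","))
                     .map(AlleleDistances::alleleDistance)
                     .collect(Collectors.toList());
    }
}
```

Once this logic compiles and behaves as expected on its own, dropping it into the body of the lambda (or the `call` method) is straightforward, and any remaining errors are about the Spark API shape rather than the computation.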