I'm trying to map over the rows of a Dataset but I'm running into problems. Eclipse gives me the error "The target type of this expression must be a functional interface" at `map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>()`. The code is as follows:
```
Dataset<Object> df1 = session.read().parquet(tableName).as(Encoders.bean(Object.class));

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>() {
    // to get each sample individually
    List<Tuple2<String, Integer>> samples = zipWithIndex((r.getString(9).trim().split(",")));
    // Gets the unique identifier for that pos.
    Long snp = r.getString(1);
    // Calculates the distance for this pos for each sample.
    // i.e. 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2
    return samples.stream().map(t -> {
        String alleles = t._1();
        Integer patient = t._2();
        List<String> values = Arrays.asList(alleles.split("\\|"));
        Double firstAllele = Double.parseDouble(values.get(0));
        Double secondAllele = Double.parseDouble(values.get(1));
        // Returns the initial SNP id, patient id and the distance in form of Tuple.
        return new Tuple3<>(snp, patient, firstAllele + secondAllele);
    }).collect(Collectors.toList());
});
```
Any help would be greatly appreciated.
Answer 0 (score: 0)
Since the question lacks some detail, I can only offer rough suggestions. I'm assuming you are referring to `org.apache.spark.api.java.function.MapFunction`.
I see a couple of problems: you are using the lambda parameter `r` as a type argument in `MapFunction<r, ...>`, and you are mixing a lambda expression with an anonymous class, which is what triggers the functional-interface error. Either use a lambda expression on its own,
```
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> {
    // your definition
});
```
or, implement `MapFunction` explicitly as an anonymous class and move your code into its `call` method (note there is no `r ->` in front of `new MapFunction`):

```
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(new MapFunction<theRespectiveType, List<Tuple3<Long, Integer, Double>>>() {
    public List<Tuple3<Long, Integer, Double>> call(theRespectiveType variable) {
        // your implementation here
    }
});
```
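Whichever shape you choose, the per-row allele-distance logic from the question can be written and tested as plain Java, independent of Spark. A minimal sketch, assuming the genotype column holds comma-separated values like `"0|0,0|1,1|1"`; the class name `AlleleDistances` and the helper names are hypothetical, not part of the question's code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AlleleDistances {

    // Parses one genotype such as "0|1" and returns the sum of both alleles:
    // "0|0" -> 0.0, "0|1" -> 1.0, "1|0" -> 1.0, "1|1" -> 2.0
    static double alleleDistance(String genotype) {
        String[] values = genotype.split("\\|");
        return Double.parseDouble(values[0]) + Double.parseDouble(values[1]);
    }

    // Computes the distance for every comma-separated sample in the column;
    // the position in the returned list corresponds to the patient index.
    static List<Double> distances(String sampleColumn) {
        return Arrays.stream(sampleColumn.trim().split(","))
                     .map(AlleleDistances::alleleDistance)
                     .collect(Collectors.toList());
    }
}
```

Once this logic compiles and behaves as expected on its own, dropping it into the body of the lambda (or the `call` method) is straightforward, and any remaining errors are about the Spark API shape rather than the computation.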