Spark Dataset: mapping rows fails with "cannot resolve method map(<lambda expression>)"

Asked: 2018-04-26 15:37:35

Tags: java apache-spark apache-spark-sql parquet apache-spark-dataset

I am trying to create an RDD from a Dataset by mapping over each row of the Dataset, but the call to map in df1.map(r -> ... fails with a cannot resolve method map(<lambda expression>) error:

Dataset<Object> df1 = session.read().parquet(tableName).as(Encoders.bean(Object.class));

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempDatas1 = df1.map(r -> new MapFunction<Object, List<Tuple3<Long, Integer, Double>>>() {
    //@Override
    public List<Tuple3<Long, Integer, Double>> call(Object row) throws Exception
    {
        // Get the sample property, remove leading and ending spaces and split it by comma
        // to get each sample individually
        List<Tuple2<String, Integer>> samples = zipWithIndex((row.getSamples().trim().split(",")));

        // Gets the unique identifier for that s.
        Long snp = row.getPos();

        // Calculates the hamming distance.
        return samples.stream().map(t -> {
            String alleles = t._1();
            Integer patient = t._2();

            List<String> values = Arrays.asList(alleles.split("\\|"));

            Double firstAllele = Double.parseDouble(values.get(0));
            Double secondAllele = Double.parseDouble(values.get(1));

            // Returns the initial s id, p id and the distance in form of Tuple.
            return new Tuple3<>(snp, patient, firstAllele + secondAllele);
        }).collect(Collectors.toList());
    }
});

Any help would be much appreciated.
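For reference, here is a minimal sketch of one way this is usually written, under a couple of assumptions: the bean class is a hypothetical VariantRow exposing getSamples() and getPos() (Encoders.bean(Object.class) cannot provide those getters), zipWithIndex is the question's own helper, and session/tableName are as in the snippet above. In the Java API, Dataset.map has no single-argument overload (both overloads take an Encoder as the second parameter), which is why the bare lambda cannot be resolved. Since the target type here is a JavaRDD anyway, converting with toJavaRDD() first avoids the Encoder requirement:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import scala.Tuple2;
import scala.Tuple3;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// VariantRow is a hypothetical bean standing in for Object in the question.
Dataset<VariantRow> df1 = session.read().parquet(tableName).as(Encoders.bean(VariantRow.class));

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempDatas1 = df1.toJavaRDD().map(row -> {
    // Split the comma-separated samples and pair each one with its index (the question's zipWithIndex helper).
    List<Tuple2<String, Integer>> samples = zipWithIndex(row.getSamples().trim().split(","));

    // Unique identifier for this row.
    Long snp = row.getPos();

    // Sum the two alleles of each sample and tag the result with the snp id and the sample index.
    return samples.stream().map(t -> {
        List<String> values = Arrays.asList(t._1().split("\\|"));
        Double first = Double.parseDouble(values.get(0));
        Double second = Double.parseDouble(values.get(1));
        return new Tuple3<Long, Integer, Double>(snp, t._2(), first + second);
    }).collect(Collectors.toList());
});

Staying in the Dataset API is also possible by casting the lambda to MapFunction and passing an explicit Encoder (for example Encoders.kryo) as the second argument, but for a List<Tuple3<...>> result the RDD route is usually simpler.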

0 Answers:

No answers yet