Spark SQL RowFactory returns empty rows

Time: 2018-01-12 13:16:56

Tags: java apache-spark apache-spark-sql

I have a dataset with a schema like this:

{"user":"A10T7BS07XCWQ1","recommendations":[{"iID":34142,"rating":22.998692},{"iID":24963,"rating":22.678337},{"iID":47761,"rating":22.31455},{"iID":28694,"rating":21.269365},{"iID":36890,"rating":21.143366},{"iID":48522,"rating":20.678747},{"iID":20032,"rating":20.330639},{"iID":57315,"rating":20.099955},{"iID":18148,"rating":20.07064},{"iID":7321,"rating":19.754635}]}

I am trying to flatMap my dataset like this:

    StructType struc = new StructType();
    struc.add("user", DataTypes.StringType, false);
    struc.add("item", DataTypes.IntegerType, false);
    struc.add("relevance", DataTypes.DoubleType, false);
    ExpressionEncoder<Row> encoder = RowEncoder.apply(struc);

    Dataset<Row> recomenderResult = userRecs.flatMap((FlatMapFunction<Row, Row>) row -> {
        String user = row.getString(0);
        List<Row> recsWithIntItemID = row.getList(1);
        Integer item;
        Double relevance;
        List<Row> rows = new ArrayList<>();

        for (Row rec : recsWithIntItemID) {

            item = rec.getInt(0);
            relevance = (double) rec.getFloat(1);
            System.out.println(user + " : " + item + " : " + relevance);

            Row newRow = RowFactory.create(user, item, relevance);
            rows.add(newRow);
        }
        System.out.println("++++++++++++++++++++++++++++++++");
        return rows.iterator();
    }, encoder);

    recomenderResult.write().json("temp2");
    recomenderResult.show();

The console output looks like this:

...

A1049B0RS95K7B : 24708 : 17.146669387817383
A1049B0RS95K7B : 2825 : 16.809375762939453
A1049B0RS95K7B : 36503 : 16.758258819580078
++++++++++++++++++++++++++++++++

...

However, the Row instances are empty, and show() produces this output:

++
||
++
||
||

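I don't know why my resulting dataset is empty. I have read all the related topics on this site and searched Google, but I have not found a solution to my problem. Can anyone help me?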

1 Answer:

Answer 0: (score: 2)

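It was a very silly mistake :( Short answer: StructType is immutable, so add() returns a new StructType instead of modifying the one it is called on. The code in the question discarded those return values, which left the schema passed to the encoder empty, so every row came out with no columns. The fix is to reassign the result of each call: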

    StructType struc = new StructType();
    struc = struc.add("user", DataTypes.StringType, false);
    struc = struc.add("item", DataTypes.IntegerType, false);
    struc = struc.add("relevance", DataTypes.DoubleType, false);
    ExpressionEncoder<Row> encoder = RowEncoder.apply(struc);

It cost me two days and a night...
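
To make the pitfall easier to see in isolation, here is a minimal, Spark-free sketch. The Schema class below is a hypothetical stand-in written only for this illustration; it is not Spark's StructType, and it only mimics the one relevant property, namely that add() returns a new object and leaves the receiver unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableSchemaDemo {

    // Hypothetical stand-in for an immutable schema type (NOT Spark's StructType).
    public static final class Schema {
        private final List<String> fields;

        public Schema() {
            this(Collections.emptyList());
        }

        private Schema(List<String> fields) {
            this.fields = Collections.unmodifiableList(fields);
        }

        // Returns a NEW Schema containing the extra field; 'this' is not modified.
        public Schema add(String name) {
            List<String> copy = new ArrayList<>(fields);
            copy.add(name);
            return new Schema(copy);
        }

        public int size() {
            return fields.size();
        }
    }

    // Mirrors the question: the result of every add() call is thrown away.
    public static Schema buildDiscardingResults() {
        Schema s = new Schema();
        s.add("user");
        s.add("item");
        s.add("relevance");
        return s; // still the original, empty schema
    }

    // Mirrors the answer: each result is assigned back to the variable.
    public static Schema buildReassigning() {
        Schema s = new Schema();
        s = s.add("user");
        s = s.add("item");
        s = s.add("relevance");
        return s;
    }

    public static void main(String[] args) {
        System.out.println("discarded:  " + buildDiscardingResults().size() + " fields");
        System.out.println("reassigned: " + buildReassigning().size() + " fields");
    }
}
```

The first builder reports 0 fields and the second reports 3, which matches the empty show() output in the question. With Spark itself, the same fix can also be written as a single chain, new StructType().add(...).add(...).add(...), because each add() call returns the extended schema.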