Cannot map Dataset<Row> to Dataset<MyRow>

Asked: 2017-11-09 18:02:20

Tags: java apache-spark apache-spark-sql

I am trying to map a Dataset<Row> to a Dataset<MyRow> in Apache Spark using Java. Does anyone know where the problem is?

Dataset<Row> flightsDF = spark.read().format("csv").option("header", "true").load( ... );
Dataset<Row> DSOfRows = flightsDF
  .filter( ... );
DSOfRows.show(5);
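
A side note (not in the original post): if no other options were passed in the elided load arguments above, Spark's CSV reader loads every column as a string, so ArrDelay would be a StringType column at this point. A minimal sketch for checking and, optionally, inferring the types; the "flights.csv" path is a placeholder:

// Inspect the column types; with header=true alone,
// all columns (including ArrDelay) are strings.
flightsDF.printSchema();

// Optional: let Spark infer numeric types while reading.
Dataset<Row> typedFlightsDF = spark.read()
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")  // ArrDelay becomes a numeric column
    .load("flights.csv");           // placeholder path
typedFlightsDF.printSchema();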

The code prints the first 5 rows as expected, and then I try to map each Row to a MyRow:

Dataset<MyRow> DSOfMyRows = DSOfRows.map(
  (MapFunction<Row, MyRow>) row -> new MyRow(row.getAs("Carrier"),
      row.getAs("ArrDelay")), Encoders.bean(MyRow.class));
DSOfMyRows.show(5);

And here is the problem: it only prints empty rows.

++
||
++
||
||
||
||
||
++
only showing top 5 rows
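
A quick diagnostic (not in the original post): printing the schema of the mapped Dataset shows what the bean encoder actually derived; an empty struct, i.e. a bare root with no fields, would explain the blank columns above.

// Presumably prints "root" with no fields, matching the empty output.
DSOfMyRows.printSchema();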

The MyRow class looks like this:

public static class MyRow implements Serializable {
  public String carrier;
  public Double arrDelay;

  MyRow(String carrier, Double delay) {
      this.carrier = carrier;
      this.arrDelay = delay;
  }

  String getCarrier() {
      return carrier;
  }

  public void setCarrier(String c) {
      this.carrier = c;
  }

  Double getArrDelay() {
      return arrDelay;
  }

  public void setArrDelay(Double delay) {
      this.arrDelay = delay;
  }
}

0 Answers:

No answers yet.
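
Although no answer was posted, a likely explanation (an editorial reading, not confirmed in the thread) is that Encoders.bean follows JavaBean conventions: it only discovers properties exposed through public getter/setter pairs, and getCarrier/getArrDelay above are package-private, so the encoder derives an empty schema. A second possible issue is that, if schema inference was not enabled, ArrDelay is read from the CSV as a string and cannot be assigned to a Double directly. A sketch under those assumptions:

// JavaBean-compliant version of MyRow: public getters so that
// Encoders.bean can discover both properties, plus a no-arg
// constructor for the encoder to use when deserializing.
public static class MyRow implements Serializable {
  private String carrier;
  private Double arrDelay;

  public MyRow() { }

  public MyRow(String carrier, Double arrDelay) {
      this.carrier = carrier;
      this.arrDelay = arrDelay;
  }

  public String getCarrier() { return carrier; }
  public void setCarrier(String c) { this.carrier = c; }

  public Double getArrDelay() { return arrDelay; }
  public void setArrDelay(Double d) { this.arrDelay = d; }
}

// If ArrDelay is a string column (no inferSchema), parse it
// explicitly before constructing the bean.
Dataset<MyRow> DSOfMyRows = DSOfRows.map(
    (MapFunction<Row, MyRow>) row -> new MyRow(
        row.getAs("Carrier"),
        Double.valueOf(row.<String>getAs("ArrDelay"))),
    Encoders.bean(MyRow.class));
DSOfMyRows.show(5);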