Apache Flink: using writeAsCsv() to write a stream of objects as tuples

Time: 2017-12-11 23:02:09

Tags: java stream bigdata apache-flink

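I am following an Apache Flink tutorial that cleans a stream of TaxiRide events. The resulting stream is printed to the console. Now I would like to write it to a CSV file instead.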

        // configure event-time processing
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // get the taxi ride data stream
        DataStream<TaxiRide> rides = env.addSource(
                new TaxiRideSource(path, maxEventDelay, servingSpeedFactor));

        DataStream<TaxiRide> filteredRides = rides
                // filter out rides that do not start or stop in NYC
                .filter(new RideCleansing.NYCFilter());

        filteredRides.print();

I tried the following, but got this error: java.lang.IllegalArgumentException: The writeAsCsv() method can only be used on data streams of tuples.

DataStreamSink<TaxiRide> rides = filteredRides.writeAsCsv("/resources").setParallelism(1);

When I instead declare DataSet<Tuple1<TaxiRide>> rides1 = filteredRides.writeAsCsv("/resources").setParallelism(1); I get a compile error.

How can I write the resulting stream of cleaned TaxiRide objects to a CSV file?

1 answer:

Answer 0: (score: 1)

DataStream and DataSet belong to separate APIs that cannot be mixed. That is why you get the compile error.

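The error message "The writeAsCsv() method can only be used on data streams of tuples." means that you have to convert the DataStream<TaxiRide> into a DataStream of tuples before you can write it as a CSV file. This can be done with a simple MapFunction:
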
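DataStream<Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short>> rideTuples = filteredRides
   .map(new TupleConverter());

TupleConverter is defined as:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple9;

// Converts a TaxiRide into a flat Tuple9 so that writeAsCsv() accepts the stream.
// Tuple9 takes exactly nine type parameters.
class TupleConverter implements MapFunction<TaxiRide, Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short>> {

  @Override
  public Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short> map(TaxiRide ride) {
     // Pass the remaining seven fields in the order of the tuple's type parameters.
     return Tuple9.of(ride.rideId, ride.isStart, ...);
  }
}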

Once you have the rideTuples DataStream, you can write it to a CSV file with writeAsCsv().
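
To make the conversion step more concrete, here is a minimal plain-Java sketch of what the map-then-write pipeline amounts to for a single record: flatten the object into an ordered list of field values, then join them with a delimiter. It deliberately does not use Flink, so it runs without Flink on the classpath. SimpleRide and RideCsvSketch are hypothetical names made up for this illustration, and SimpleRide carries only a few of TaxiRide's fields. In the real job the flattening would happen inside TupleConverter and the joining inside writeAsCsv().

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the tutorial's TaxiRide (not the real class).
class SimpleRide {
    final long rideId;
    final boolean isStart;
    final float startLon;
    final float startLat;
    final short passengerCnt;

    SimpleRide(long rideId, boolean isStart, float startLon, float startLat, short passengerCnt) {
        this.rideId = rideId;
        this.isStart = isStart;
        this.startLon = startLon;
        this.startLat = startLat;
        this.passengerCnt = passengerCnt;
    }
}

class RideCsvSketch {
    // Flatten one ride into an ordered list of field values.
    // This mirrors what a MapFunction does when it builds a tuple from an object.
    static List<Object> toFields(SimpleRide ride) {
        return Arrays.asList(ride.rideId, ride.isStart, ride.startLon, ride.startLat, ride.passengerCnt);
    }

    // Join the field values into one delimited row.
    static String toCsvLine(List<Object> fields, String delimiter) {
        return fields.stream().map(String::valueOf).collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        SimpleRide ride = new SimpleRide(42L, true, -73.5f, 40.75f, (short) 2);
        System.out.println(toCsvLine(toFields(ride), ",")); // prints 42,true,-73.5,40.75,2
    }
}
```

The sketch also shows why the sink insists on tuples: a tuple has a fixed number of fields in a fixed order, so the sink can write one column per field, whereas an arbitrary object such as TaxiRide gives it no such layout.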