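I am following the Apache Flink tutorial to cleanse a stream of TaxiRide events. The resulting stream is printed to the console. Now I want to write it to a CSV file instead.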
// configure event-time processing
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// get the taxi ride data stream
DataStream<TaxiRide> rides = env.addSource(
        new TaxiRideSource(path, maxEventDelay, servingSpeedFactor));
DataStream<TaxiRide> filteredRides = rides
        // filter out rides that do not start or stop in NYC
        .filter(new RideCleansing.NYCFilter());
filteredRides.print();
I tried the following, but got the error: java.lang.IllegalArgumentException: The writeAsCsv() method can only be used on data streams of tuples.
DataStreamSink<TaxiRide> rides = filteredRides.writeAsCsv("/resources").setParallelism(1);
When I change the declaration to
DataSet<Tuple1<TaxiRide>> rides1 = filteredRides.writeAsCsv("/resources").setParallelism(1);
I get a compile error.
How can I write the resulting stream of cleansed TaxiRide objects to a CSV file?
Answer 0 (score: 1)
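DataStream and DataSet belong to separate APIs that cannot be mixed. Assigning the result of a DataStream operation to a DataSet variable is therefore a compile error.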
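The error message "The writeAsCsv() method can only be used on data streams of tuples." means that you have to convert the DataStream<TaxiRide> into a DataStream of tuples before you can write it as a CSV file.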
This can be done with a simple MapFunction:
DataStream<Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short>> rideTuples = filteredRides
        .map(new TupleConverter());
with TupleConverter defined as
// Converts a TaxiRide into a flat tuple so that writeAsCsv() can serialize it.
// Tuple9 takes exactly nine type parameters, one per TaxiRide field.
class TupleConverter implements MapFunction<TaxiRide, Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short>> {
    @Override
    public Tuple9<Long, Boolean, DateTime, DateTime, Float, Float, Float, Float, Short> map(TaxiRide ride) {
        return Tuple9.of(ride.rideId, ride.isStart, ...);
    }
}
Once you have the DataStream rideTuples, you can write it to a CSV file with writeAsCsv().
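
If the tuple type gets unwieldy, another option is to format each ride as one CSV line yourself and write plain strings. The sketch below shows only the formatting step in plain Java, without any Flink dependency. The Ride class is a hypothetical stand-in for the tutorial's TaxiRide: rideId and isStart come from the answer above, while startLon is assumed here for illustration. In a Flink job, a method like toCsvLine could presumably be called from a MapFunction<TaxiRide, String>, and the resulting DataStream<String> written with writeAsText().

```java
import java.util.StringJoiner;

class RideCsvSketch {

    // Hypothetical stand-in for TaxiRide; models only three fields.
    static final class Ride {
        final long rideId;
        final boolean isStart;
        final float startLon; // assumed field, used only for illustration

        Ride(long rideId, boolean isStart, float startLon) {
            this.rideId = rideId;
            this.isStart = isStart;
            this.startLon = startLon;
        }
    }

    // Joins the fields of one ride into a single comma-separated line.
    // No quoting or escaping is done, which is fine for numeric and boolean fields.
    static String toCsvLine(Ride ride) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(Long.toString(ride.rideId));
        joiner.add(Boolean.toString(ride.isStart));
        joiner.add(Float.toString(ride.startLon));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Ride ride = new Ride(42L, true, -73.5f);
        System.out.println(toCsvLine(ride));
    }
}
```

The trade-off is that you own the line format (field order, separators, escaping) instead of relying on writeAsCsv(), which may or may not be what you want.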