本地flink执行具有令人难以置信的低性能

时间:2017-09-21 10:40:28

标签: apache-flink flink-streaming

我目前正在将我公司的一些算法移植到flink应用程序中,以便将来作为流运行。要测试这些算法,我正在使用现有数据,我正在从CSV文件中读取数据,然后使用flink-spector创建流。这些数据集通常包含大约10,000个基准,而每个数据包含时间戳和整数值。

我现在的问题是flink应用程序需要花费极长的时间(大约半小时)来处理这些数据,这些数据应该在几秒钟内很容易完成,我无法弄清楚原因。

这是我的代码的外观:

public class MyAlgorithmTest extends DataStreamTestBase {

    @Test
    public void testMyAlgorithm() {

        DataStreamSource<MyData> myDataStream = 
            createTestStream(getEventTimeInputBuilder("MyData.csv"));

        DataStream<MyData> avgDataStream = myDataStream
            .keyBy(value -> value.getUniqueId())
            .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(30)))
            // aggregate data over windows of one minute
            .apply(new MyDataAggregator())
            .keyBy(value -> value.getUniqueId())
            .window(SlidingEventTimeWindows.of(Time.minutes(5), Time.seconds(150)))
            // calculate the moving average over windows of five minutes
            .apply(new MovingAvgWindowFunction<>());
        }
    }

作业已在本地成功部署(不幸的是,无法在此处发布完整输出)。这是前几秒我输出的一部分:

11:38:22,138 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - Job cd666d998a392d0907d5522babc80342 was successfully submitted to the JobManager akka://flink/deadLetters.
11:38:22,139 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Scheduling job cd666d998a392d0907d5522babc80342 (Flink Streaming Job).
11:38:22,139 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink Streaming Job (cd666d998a392d0907d5522babc80342) switched from state CREATED to RUNNING.
11:38:22,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) switched from CREATED to SCHEDULED.
11:38:22,146 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  Job execution switched to status RUNNING.
11:38:22,147 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  Source: Collection Source(1/1) switched to SCHEDULED 
11:38:22,155 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from CREATED to SCHEDULED.
11:38:22,156 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to SCHEDULED 
11:38:22,157 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from CREATED to SCHEDULED.
11:38:22,158 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to SCHEDULED 
11:38:22,164 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) switched from SCHEDULED to DEPLOYING.
11:38:22,165 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: Collection Source (1/1) (attempt #0) to 
11:38:22,178 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  Source: Collection Source(1/1) switched to DEPLOYING 
11:38:22,187 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from SCHEDULED to DEPLOYING.
11:38:22,189 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to DEPLOYING 
11:38:22,189 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (attempt #0) to 
11:38:22,265 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from SCHEDULED to DEPLOYING.
11:38:22,268 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (attempt #0) to 
11:38:22,269 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to DEPLOYING 
11:38:22,312 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Received task Source: Collection Source (1/1)
11:38:22,313 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) switched from CREATED to DEPLOYING.
11:38:22,314 INFO  org.apache.flink.runtime.taskmanager.Task                     - Creating FileSystem stream leak safety net for task Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) [DEPLOYING]
11:38:22,321 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) [DEPLOYING].
11:38:22,328 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Received task TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1)
11:38:22,331 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from CREATED to DEPLOYING.
11:38:22,331 INFO  org.apache.flink.runtime.taskmanager.Task                     - Creating FileSystem stream leak safety net for task TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) [DEPLOYING]
11:38:22,332 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) [DEPLOYING].
11:38:22,333 INFO  org.apache.flink.runtime.taskmanager.Task                     - Registering task at network: Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) [DEPLOYING].
11:38:22,337 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Received task TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1)
11:38:22,336 INFO  org.apache.flink.runtime.taskmanager.Task                     - Registering task at network: TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) [DEPLOYING].
11:38:22,346 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from CREATED to DEPLOYING.
11:38:22,349 INFO  org.apache.flink.runtime.taskmanager.Task                     - Creating FileSystem stream leak safety net for task TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) [DEPLOYING]
11:38:22,349 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) [DEPLOYING].
11:38:22,351 INFO  org.apache.flink.runtime.taskmanager.Task                     - Registering task at network: TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) [DEPLOYING].
11:38:22,364 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from DEPLOYING to RUNNING.
11:38:22,366 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) switched from DEPLOYING to RUNNING.
11:38:22,366 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from DEPLOYING to RUNNING.
11:38:22,371 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - No state backend has been configured, using default state backend (Memory / JobManager)
11:38:22,386 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from DEPLOYING to RUNNING.
11:38:22,386 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to RUNNING 
11:38:22,377 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - No state backend has been configured, using default state backend (Memory / JobManager)
11:38:22,389 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from DEPLOYING to RUNNING.
11:38:22,390 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to RUNNING 
11:38:22,395 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Collection Source (1/1) (712aa1b16f98f6a44ec52c60ed920a1c) switched from DEPLOYING to RUNNING.
11:38:22,395 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 11:38:22  Source: Collection Source(1/1) switched to RUNNING 
11:38:22,377 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - No state backend has been configured, using default state backend (Memory / JobManager)
11:38:22,458 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing heap keyed state backend with stream factory.
11:38:22,469 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing heap keyed state backend with stream factory.

此后,在我的CPU充分利用的情况下,接下来的半小时没有输出。我正在记录每次拨打MyDataAggregatorMovingAvgWindowFunction的电话,看看这些日志进入半小时后他们需要多长时间:

12:09:06,106 INFO  com.myapplication      - MyDataAggregator
12:09:06,106 INFO  com.myapplication      - MyDataAggregator
12:09:06,106 INFO  com.myapplication      - MyDataAggregator
12:09:06,106 INFO  com.myapplication      - MovingAvgWindowFunction
12:09:06,107 INFO  com.myapplication      - MovingAvgWindowFunction
12:09:06,107 INFO  com.myapplication      - MyDataAggregator
...

然后就完成了工作:

12:09:07,739 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from RUNNING to FINISHED.
12:09:07,739 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0).
12:09:07,740 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) [FINISHED]
12:09:07,741 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Un-registering task and sending final execution state FINISHED to JobManager for task TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (f168bd42f08204cc049673ab2985b4f0)
12:09:07,741 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (f168bd42f08204cc049673ab2985b4f0) switched from RUNNING to FINISHED.
12:09:07,741 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 12:09:07  TriggerWindow(SlidingEventTimeWindows(300000, 150000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@21ca5b67}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to FINISHED 
12:09:07,744 INFO  org.apache.flink.runtime.taskmanager.Task                     - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from RUNNING to FINISHED.
12:09:07,744 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d).
12:09:07,745 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) [FINISHED]
12:09:07,756 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Un-registering task and sending final execution state FINISHED to JobManager for task TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (debbe699a15106b9ef91ad4916a6ab5d)
12:09:07,759 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124)) (1/1) (debbe699a15106b9ef91ad4916a6ab5d) switched from RUNNING to FINISHED.
12:09:07,759 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink Streaming Job (cd666d998a392d0907d5522babc80342) switched from state RUNNING to FINISHED.
12:09:07,759 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Stopping checkpoint coordinator for job cd666d998a392d0907d5522babc80342
12:09:07,759 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 12:09:07  TriggerWindow(SlidingEventTimeWindows(60000, 30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@44ebb3d8}, EventTimeTrigger(), WindowedStream.apply(WindowedStream.java:1124))(1/1) switched to FINISHED 
12:09:07,759 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - 09/21/2017 12:09:07  Job execution switched to status FINISHED.
12:09:07,760 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - Shutting down
12:09:07,771 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - Terminate JobClientActor.
12:09:07,771 INFO  org.apache.flink.runtime.client.JobClient                     - Job execution complete
12:09:07,772 INFO  org.apache.flink.runtime.client.JobSubmissionClientActor      - Disconnect from JobManager Actor[akka://flink/user/jobmanager_1#-1402375298].

这真的很奇怪,因为半小时内根本没有输出。有人知道flink在那段时间可能在做什么吗?

我知道本地执行环境没有优化,但即使使用10k值,窗口,键控和简单聚合+平均计算都不应该花费那么长时间。瓶颈肯定似乎是一直充分利用的CPU。我已经给了应用程序2GB的RAM和i / o似乎没有问题,我的磁盘根本没用。

编辑:我对这些数据集玩了一点点。如果我将10k数据集减少到仅5k,则执行时间从半小时降至4分钟。这实际上是人们所期望的,因为人们会期望线性增长最大化。

1 个答案:

答案 0 :(得分:0)

我会将一个分析器附加到Flink,以了解30分钟内所有CPU周期的消耗情况。

Flink应该能够在几乎所有时间内处理这些数据。