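I tried to use the ExecutionConfig.setMaxParallelism() method to set the maximum parallelism of a Flink job, but it does not seem to have any effect.
I also modified the standard WordCount example to run some tests, and setMaxParallelism() appears to have no effect in either a local environment or a standalone cluster.
How does setMaxParallelism() work?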
Answer 0 (score: 0)
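Flink provides two settings:
setParallelism(x) sets the parallelism of a job or operator to x, i.e., the number of parallel tasks for an operator.
setMaxParallelism(y) controls the maximum number of tasks to which keyed state can be distributed, i.e., the maximum effective parallelism of an operator. Keyed state is distributed in units called key groups, and there are y of them, so at most y tasks can be assigned keyed state. An operator's parallelism therefore must not exceed y; Flink rejects a job that violates this, as the error message in the other answer shows. The documentation explains these concepts in more detail.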
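The relationship between keys, key groups, and parallel tasks can be sketched in plain Java. This is a simplified illustration, not Flink's code: the class and method names are made up for the example, and the hash step is an assumption (Flink, as far as I know, additionally applies a murmur hash to key.hashCode() before taking the modulo).

```java
class KeyGroupSketch {

    // Simplified stand-in for key-group assignment: every key falls into one
    // of maxParallelism key groups. (Assumption: Flink also murmur-hashes
    // key.hashCode(); that step is omitted here.)
    public static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel subtask that owns it.
    public static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 4; // number of key groups
        int parallelism = 2;    // number of parallel subtasks
        for (String word : new String[] {"to", "be", "or", "not"}) {
            int keyGroup = assignToKeyGroup(word, maxParallelism);
            int subtask = operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
            System.out.println(word + " -> key group " + keyGroup + " -> subtask " + subtask);
        }
    }
}
```

With 4 key groups and a parallelism of 8, this mapping sends the key groups to subtasks 0, 2, 4 and 6 only, so the remaining subtasks would never receive keyed state. That is why a parallelism above the max parallelism makes no sense for keyed operators.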
Answer 1 (score: 0)
I ran more tests today, this time using streams (DataStream) instead of DataSet. This time I did see the effect of setMaxParallelism().
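    // Tokenizer and WORDS come from the standard WordCount example.
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Limit the max parallelism (number of key groups) to 4; this call has a visible effect.
        env.getConfig().setMaxParallelism(4);
        DataStream<String> text = env.fromElements(WORDS);
        // Split lines into (word, 1) pairs, key by the word (field 0), and sum the counts (field 1).
        DataStream<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).keyBy(0).sum(1);
        counts.writeAsCsv("test.dat");
        env.execute("WordCount Example");
    }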
The interesting error seen on the client:
Caused by: org.apache.flink.runtime.JobException: Vertex Flat Map's parallelism (8) is higher than the max parallelism (4). Please lower the parallelism or increase the max parallelism.
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:188)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:830)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:232)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:100)
at org.apache.flink.runtime.jobmaster.JobMaster.createExecutionGraph(JobMaster.java:1152)
at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1132)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:294)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
... 10 more
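The check behind this error can be sketched without Flink. The class and helper names below are hypothetical and are not Flink's implementation; only the shape of the message follows the exception above. The parallelism of 8 presumably came from the default parallelism of the environment, since the job never sets one.

```java
class ParallelismCheckSketch {

    // Hypothetical helper mirroring the validation reported in the stack trace:
    // an operator's parallelism must not exceed its max parallelism.
    public static void validate(String vertexName, int parallelism, int maxParallelism) {
        if (parallelism > maxParallelism) {
            throw new IllegalStateException(
                "Vertex " + vertexName + "'s parallelism (" + parallelism
                    + ") is higher than the max parallelism (" + maxParallelism + ").");
        }
    }

    // Convenience wrapper that reports the outcome as a boolean.
    public static boolean isValid(int parallelism, int maxParallelism) {
        try {
            validate("Flat Map", parallelism, maxParallelism);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(8, 4)); // the failing combination from the error
        System.out.println(isValid(4, 4)); // parallelism lowered to the max parallelism
        System.out.println(isValid(8, 8)); // max parallelism raised instead
    }
}
```

In the actual job, either remedy named in the message should work: lower the parallelism, for example with env.setParallelism(4), or pass a larger value to setMaxParallelism().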
Thanks.