Flink: Flink Shell throws NullPointerException

Time: 2015-10-27 16:12:47

Tags: apache-flink

  1. I am using the Flink interactive shell to run WordCount. It works with a 10 MB file, but with a 100 MB file the shell throws a NullPointerException:
    java.lang.NullPointerException
    at org.apache.flink.api.common.accumulators.SerializedListAccumulator.deserializeList(SerializedListAccumulator.java:93)
    at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:549)
    at .<init>(<console>:22)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
    at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
    at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
    at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ILoop.scala:601)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ILoop.scala:598)
    at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
    at scala.reflect.io.File.applyReader(File.scala:82)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ILoop.scala:598)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(ILoop.scala:598)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(ILoop.scala:598)
    at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:130)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:597)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:597)
    at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:135)
    at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:596)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$loadCommand$1.apply(ILoop.scala:660)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$loadCommand$1.apply(ILoop.scala:659)
    at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:653)
    at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:659)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$7.apply(ILoop.scala:262)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$7.apply(ILoop.scala:262)
    at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:712)
    at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
    at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
    at org.apache.flink.api.scala.FlinkShell$.startShell(FlinkShell.scala:84)
    at org.apache.flink.api.scala.FlinkShell$.main(FlinkShell.scala:54)
    at org.apache.flink.api.scala.FlinkShell.main(FlinkShell.scala)
    

    I am working on a Linux system (16MB RAM). What could be the problem?

    My code (adapted from https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/scala_api_quickstart.html):

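     // `env` is the ExecutionEnvironment provided by the Flink Scala shell
     val filename = "<myFileName>"
     val text = env.readTextFile(filename)
     val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
     // collect() ships the entire result back to the shell client
     val result = counts.collect()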
    
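    2. I also noticed that Flink executes the program on only one core. After setting the parallelism with env.getConfig.setParallelism(4) and running the program again, another exception occurred:
    Part 1: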

          org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
          at org.apache.flink.client.program.Client.run(Client.java:413)
          at org.apache.flink.client.program.Client.run(Client.java:356)
          at org.apache.flink.client.program.Client.run(Client.java:349)
          at org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:89)
          at org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:82)
          at org.apache.flink.api.java.ScalaShellRemoteEnvironment.execute(ScalaShellRemoteEnvironment.java:68)
          at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
          at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:576)
          at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:544)
          at .<init>(<console>:28)
          at .<clinit>(<console>)
          at .<init>(<console>:7)
          at .<clinit>(<console>)
          at $print(<console>)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
          at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
          at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
          at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
          at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
          at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
          at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
          at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
          at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
          at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
          at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ILoop.scala:601)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ILoop.scala:598)
          at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
          at scala.reflect.io.File.applyReader(File.scala:82)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ILoop.scala:598)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(ILoop.scala:598)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(ILoop.scala:598)
          at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:130)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:597)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:597)
          at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:135)
          at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:596)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$loadCommand$1.apply(ILoop.scala:660)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$loadCommand$1.apply(ILoop.scala:659)
          at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:653)
          at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:659)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$7.apply(ILoop.scala:262)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$7.apply(ILoop.scala:262)
          at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
          at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:712)
          at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
          at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
          at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
          at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
          at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
          at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
          at org.apache.flink.api.scala.FlinkShell$.startShell(FlinkShell.scala:84)
          at org.apache.flink.api.scala.FlinkShell$.main(FlinkShell.scala:54)
          at org.apache.flink.api.scala.FlinkShell.main(FlinkShell.scala)
      

      Part 2:

      Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
          at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
          at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
          at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
          at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
          at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:43)
          at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
          at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
          at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
          at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
          at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
          at akka.actor.ActorCell.invoke(ActorCell.scala:487)
          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
          at akka.dispatch.Mailbox.run(Mailbox.scala:221)
          at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
          at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
          at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
          at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
          at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at .<init>(<console>:26) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at .<init>(<console>:27)) -> Map (Map at .<init>(<console>:27)) -> Combine(SUM(1)) (2/4)) @ (unassigned) - [SCHEDULED] > with groupID < fc507fbb50fea681c726ca1d824c7577 > in sharing group < SlotSharingGroup [fc507fbb50fea681c726ca1d824c7577, fb90f780c9d5a4a9dbf983cb06bec946, 52b8abe5a21ed808f0473a599d89f046] >. Resources available to scheduler: Number of instances=1, total number of slots=1, available slots=0
          at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:250)
          at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:126)
          at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:271)
          at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:430)
          at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:307)
          at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:508)
          at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:606)
          at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190)
          ... 18 more
      

      Does this refer to taskmanager.numberOfTaskSlots? I set this key to 4 in my flink-conf.yaml, but how do I set it in the shell?

1 answer:

Answer 0 (score: 3)

You asked two questions:

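  1. Why does print() not work for large DataSets?

    When you call collect() or print() on a DataSet, all data that has been partitioned across the task managers must be transferred through the job manager to the client. It is best to use these methods only for testing or for materializing small DataSets. For large data, use one of the sinks provided by Apache Flink, e.g. writeAsText(..). One output file is created for each parallel task.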

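    If you still want to transfer all data to the client, you can do so by increasing Akka's frame size. Akka is the message-passing library that Flink uses under the hood. To do this, set akka.framesize in flink-conf.yaml. The default value is 10485760 bytes (10 MB); akka.framesize: 100mb increases it to 100 MB.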

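    For Apache Flink 1.0, some committers have considered removing this limitation, and there is already a pull request that uses a different transfer mechanism for large materialized data sets.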

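  2. What are task slots and how do they relate to parallelism?

    Flink's default configuration starts one task slot per task manager. When you start the Scala shell in local mode, it starts only one task manager, so the total number of task slots is one. If you change the parallelism to N, you need at least N task slots to execute the operators in parallel. So either increase the number of task slots in flink-conf.yaml or start additional task managers. If you are only running locally, I recommend simply increasing the number of task slots. For details, see the Flink documentation at http://flink.apache.org.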

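      Edit: When you run the Scala shell in local mode, it starts an embedded Flink cluster with only one task manager. You can start a local cluster with ./bin/start-local.sh and then connect the Scala shell to it using its host and port parameters (host: localhost, port: 6123).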
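The size limit behind the first point can be pictured as a small decision helper. This is a plain-Scala sketch, not a Flink API: `ResultRetrieval`, `choose`, and the 10% headroom reserved for message overhead are hypothetical. Only the 10485760-byte default comes from the answer, and "100 MB" is assumed here to mean 100 × 1024 × 1024 bytes, the same way the 10 MB default is computed.

```scala
// Hypothetical helper (not part of Flink): models the rule of thumb that a
// collect()/print() result must fit into one Akka message of akka.framesize bytes.
object ResultRetrieval {
  // Default akka.framesize quoted in the answer: 10485760 bytes (10 MB).
  val DefaultFrameSizeBytes: Long = 10L * 1024 * 1024

  sealed trait Strategy
  case object CollectToClient extends Strategy // small result: collect()/print() is fine
  case object WriteToSink extends Strategy     // large result: use a sink such as writeAsText

  /** Picks a strategy from a rough estimate of the serialized result size.
    * `headroom` is an assumed fraction of the frame usable for payload. */
  def choose(
      estimatedResultBytes: Long,
      frameSizeBytes: Long = DefaultFrameSizeBytes,
      headroom: Double = 0.9
  ): Strategy = {
    require(estimatedResultBytes >= 0, "size must be non-negative")
    require(headroom > 0 && headroom <= 1, "headroom must be in (0, 1]")
    val budget = (frameSizeBytes * headroom).toLong
    if (estimatedResultBytes <= budget) CollectToClient else WriteToSink
  }
}
```

In the shell itself, the sink-based alternative would replace `counts.collect()` with `counts.writeAsText("<outputPath>")` (a placeholder path) followed by `env.execute()`, because sinks only run when the job is executed.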
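The slot arithmetic from the second point can be written down as a simplified model of the scheduler check that raised NoResourceAvailableException. It assumes Flink's default slot sharing, under which a job needs as many slots as its highest operator parallelism. `SlotCheck`, `Cluster`, and `canSchedule` are hypothetical names for illustration, not Flink classes.

```scala
// Simplified model (not Flink code) of whether a job's operators fit into the cluster's slots.
object SlotCheck {
  // A cluster is described by its task managers and the slots each one offers
  // (taskmanager.numberOfTaskSlots in flink-conf.yaml).
  final case class Cluster(taskManagers: Int, slotsPerTaskManager: Int) {
    def totalSlots: Int = taskManagers * slotsPerTaskManager
  }

  // With default slot sharing, one slot can hold one parallel instance of every
  // operator, so the job needs as many slots as its highest operator parallelism.
  def requiredSlots(operatorParallelisms: Seq[Int]): Int =
    if (operatorParallelisms.isEmpty) 0 else operatorParallelisms.max

  def canSchedule(cluster: Cluster, operatorParallelisms: Seq[Int]): Boolean =
    requiredSlots(operatorParallelisms) <= cluster.totalSlots
}
```

With the shell's embedded cluster (one task manager, one slot) and parallelism 4, `canSchedule` returns false, matching "total number of slots=1" in the trace above; a cluster with four slots, or a parallelism of 1, makes it true.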