I'm hitting a warning while running a stateful computation. The state consists of a BloomFilter (from stream-lib) as the value, keyed by Integer.
The program runs smoothly for a few minutes, after which I get the following warning, the streaming application becomes unstable (processing times grow exponentially), and the job eventually fails:
WARN TaskSetManager: Lost task 0.0 in stage 144.0 (TID 326, mesos-slave-02): scala.NotImplementedError: put() should not be called on an EmptyStateMap
at org.apache.spark.streaming.util.EmptyStateMap.put(StateMap.scala:73)
at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:62)
at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
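For context, here is a minimal sketch of the kind of mapWithState pipeline involved (the function and variable names are illustrative, not my actual code):

```scala
import com.clearspring.analytics.stream.membership.BloomFilter
import org.apache.spark.streaming.{State, StateSpec}

// Illustrative update function: the state for each Integer key is a
// BloomFilter from stream-lib; each batch folds the new value into it.
// Capacity and false-positive rate below are placeholder numbers.
def updateBloom(key: Int, value: Option[String], state: State[BloomFilter]): Unit = {
  val bf = state.getOption().getOrElse(new BloomFilter(100000, 0.01))
  value.foreach(bf.add)
  state.update(bf)
}

// kafkaStream is a DStream[(Int, String)] built from the Kafka input:
// val stateStream = kafkaStream.mapWithState(StateSpec.function(updateBloom _))
```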
I'm using Kryo serialization. From somewhere on the internet I picked up a hint that this may be caused by a Kryo serialization bug in OpenHashMapBasedStateMap. However, I don't know how to work around it.
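One thing I have considered trying (a guess based on that hint, not a confirmed fix) is explicitly registering the BloomFilter and the internal state-map class with Kryo, so that Kryo does not fall back to its default handling for them:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("stateful-bloom") // app name is illustrative
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(
    classOf[com.clearspring.analytics.stream.membership.BloomFilter],
    // Internal Spark streaming state map; registering it is a shot in
    // the dark based on the OpenHashMapBasedStateMap hint above.
    Class.forName("org.apache.spark.streaming.util.OpenHashMapBasedStateMap")
  ))
```

I have no evidence yet that registration alone changes the behavior, but it at least rules out Kryo's generic fallback path for these classes.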
Environment: the Spark cluster runs in standalone mode with 1 master and 5 slaves, each with 4 vCPUs and 8 GB RAM. Data is streamed from a 3-node Kafka cluster (managed by a 3-node ZooKeeper cluster).
Checkpointing goes to a Hadoop cluster. We also persist the state to HBase (on top of the same Hadoop cluster) and restore it when the streaming application starts.
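The checkpoint-and-restore setup follows the usual pattern of recovering the context from the checkpoint directory and seeding mapWithState with an initial RDD; roughly this (updateBloom is the state update function and loadStateFromHBase stands in for our HBase read, both placeholders; the checkpoint path is illustrative):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StateSpec, StreamingContext}
import com.clearspring.analytics.stream.membership.BloomFilter

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10)) // batch interval illustrative
  ssc.checkpoint("hdfs://hadoop-cluster/checkpoints/stateful-bloom")

  // Seed mapWithState with the state previously persisted to HBase.
  // loadStateFromHBase is a placeholder, not a real API.
  val initial: RDD[(Int, BloomFilter)] = loadStateFromHBase(ssc.sparkContext)
  val spec = StateSpec.function(updateBloom _).initialState(initial)
  // kafkaStream.mapWithState(spec) ...
  ssc
}

// Recover from the checkpoint if one exists, otherwise build a fresh context.
val ssc = StreamingContext.getOrCreate(
  "hdfs://hadoop-cluster/checkpoints/stateful-bloom", createContext _)
```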
This issue was originally raised in this spark mailing list post, but I had not received any answer there before posting it here.