I have a word2vec model in Spark and I'd like to save it to a file so I can load it later.
When I run the save method, I get the following:
Caused by: org.apache.spark.SparkException: Job aborted due to stage
failure: Serialized task 130:0 was 413893420 bytes, which exceeds max
allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800
bytes). Consider increasing spark.akka.frameSize or using broadcast
variables for large values.
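I did some research, and it looks like frameSize still has an upper limit, so I'd like to explore alternatives in case I run into a similar problem with larger models in the future.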
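For scale, the numbers in the error can be checked with a little arithmetic. The sketch below (plain Python; the helper names are hypothetical) computes the largest serialized task a given frameSize admits and the smallest whole-MB setting this 413,893,420-byte task would need. The 2047 MB cap is an assumption about Spark 1.x (the setting is an int in MB whose byte value must fit in a Java Int), not something the error message states.

```python
# Figures taken from the error message above.
DEFAULT_FRAME_SIZE_MB = 128   # spark.akka.frameSize default: 134217728 bytes
RESERVED_BYTES = 204800       # "reserved (204800 bytes)"
BYTES_PER_MB = 1024 * 1024

# Assumption: in Spark 1.x the cap is Int.MaxValue // 2**20 = 2047 MB.
# Check this against the Spark version you actually run.
ASSUMED_MAX_FRAME_SIZE_MB = (2**31 - 1) // BYTES_PER_MB


def max_task_bytes(frame_size_mb: int) -> int:
    """Largest serialized task that fits in a frame of the given size (in MB)."""
    return frame_size_mb * BYTES_PER_MB - RESERVED_BYTES


def required_frame_size_mb(task_bytes: int) -> int:
    """Smallest whole-MB frameSize that lets a task of task_bytes through."""
    needed = task_bytes + RESERVED_BYTES
    return -(-needed // BYTES_PER_MB)  # integer ceiling division


if __name__ == "__main__":
    task = 413893420  # "Serialized task 130:0 was 413893420 bytes"
    print(max_task_bytes(DEFAULT_FRAME_SIZE_MB))   # limit at the default setting
    mb = required_frame_size_mb(task)
    print(mb, mb <= ASSUMED_MAX_FRAME_SIZE_MB)
```

So raising the setting to roughly 395 MB would let this particular model through, but the serialized size grows with the model (roughly vocabulary size times vector size), which is why a larger model could still run into the cap.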
I tried simply creating a broadcast variable from the model, but that failed with the following exception:
brd = sc.broadcast(model)
Exception: It appears that you are attempting to reference
SparkContext from a broadcast variable, action, or transformation.
SparkContext can only be used on the driver, not in code that it run
on workers. For more information, see SPARK-5063.
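I figure I could run getVectors and export the data somehow, but then I couldn't easily load it back as a model. Does anyone know of an alternative approach I could take?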
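One way to read the getVectors idea: pull the word-to-vector map to the driver, write it in the plain-text word2vec format (a "count dim" header, then one "word v1 v2 ..." line per word), and reload it later without Spark. This sketch assumes getVectors() returns a mapping of words to float sequences that can be copied into an ordinary dict, and that words contain no whitespace. save_vectors, load_vectors and most_similar are hypothetical helpers; most_similar is only a cosine-similarity lookup similar in spirit to findSynonyms, not a Spark API.

```python
import math
from typing import Dict, List, Tuple

# Assumed shape of the exported data: word -> list of floats.
Vectors = Dict[str, List[float]]


def save_vectors(vectors: Vectors, path: str) -> None:
    """Write vectors as text: 'count dim' header, then 'word v1 v2 ...' per line."""
    dim = len(next(iter(vectors.values()))) if vectors else 0
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            # repr() of a Python float round-trips exactly through float().
            f.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def load_vectors(path: str) -> Vectors:
    """Read a file written by save_vectors back into a dict, checking its header."""
    with open(path, encoding="utf-8") as f:
        count, dim = map(int, f.readline().split())
        vectors: Vectors = {}
        for line in f:
            word, *values = line.rstrip("\n").split(" ")
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            vectors[word] = [float(v) for v in values]
    if len(vectors) != count:
        raise ValueError(f"header says {count} words, file has {len(vectors)}")
    return vectors


def most_similar(vectors: Vectors, word: str, num: int) -> List[Tuple[str, float]]:
    """Return up to num (word, cosine similarity) pairs, best first, excluding `word`."""
    query = vectors[word]
    q_norm = math.sqrt(sum(x * x for x in query))
    scored = []
    for other, vec in vectors.items():
        if other == word:
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        if q_norm == 0.0 or norm == 0.0:
            continue  # cosine similarity is undefined for zero vectors
        dot = sum(a * b for a, b in zip(query, vec))
        scored.append((other, dot / (q_norm * norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Assumed driver-side usage (not executed here):
#   vectors = {w: [float(x) for x in v] for w, v in model.getVectors().items()}
#   save_vectors(vectors, "vectors.txt")
#   bc = sc.broadcast(vectors)  # a plain dict pickles; the model wrapper does not
```

Broadcasting a plain dict avoids the SPARK-5063 error because nothing in it refers to SparkContext, though a very large vocabulary is still expensive to ship to every executor.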
Thanks