Saving a large model in pySpark

Time: 2016-07-06 19:02:08

Tags: python apache-spark word2vec

I have a word2vec model in Spark, and I would like to save it to a file so I can load it back later.
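For context, the model is built and saved roughly like this (a minimal sketch, assuming the pyspark.mllib Word2Vec API and a hypothetical corpus path):

from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext(appName="word2vec-save")

# Hypothetical corpus: one sentence per line, whitespace-tokenized
corpus = sc.textFile("/data/corpus.txt").map(lambda line: line.split(" "))

model = Word2Vec().setVectorSize(300).fit(corpus)

# This save call is what triggers the error below
model.save(sc, "/models/word2vec")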

When I run the save method, I get the following:

Caused by: org.apache.spark.SparkException: Job aborted due to stage
failure: Serialized task 130:0 was 413893420 bytes, which exceeds max 
allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 
bytes). Consider increasing spark.akka.frameSize or using broadcast 
variables for large values.

I did some research, and it looks like frameSize is still capped, so I would like to explore alternatives in case I run into a similar problem with an even larger model in the future.
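For reference, raising the limit looks roughly like this (a sketch assuming Spark 1.x, where spark.akka.frameSize is specified in MB and, as far as I can tell, hard-capped at 2047 MB):

from pyspark import SparkConf, SparkContext

# spark.akka.frameSize is given in MB (the error above reports it in bytes);
# values much above 2 GB are rejected, hence the hard upper limit
conf = SparkConf().set("spark.akka.frameSize", "1024")
sc = SparkContext(conf=conf, appName="word2vec-save")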

I also tried simply creating a broadcast variable from the model, but that failed with the following exception.

brd = sc.broadcast(model)

Exception: It appears that you are attempting to reference 
SparkContext from a broadcast variable, action, or transformation. 
SparkContext can only be used on the driver, not in code that it run 
on workers. For more information, see SPARK-5063. 
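My guess is that the model object wraps a JVM handle that references the SparkContext, which is why it cannot be shipped to workers; broadcasting plain data extracted from the model should sidestep that (an untested sketch):

# Broadcast only the raw word -> vector map rather than the model object;
# a plain dict can be pickled since it carries no SparkContext reference
word_vectors = {word: list(vec) for word, vec in model.getVectors().items()}
brd = sc.broadcast(word_vectors)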

I suppose I could run getVectors and export the data somehow (roughly sketched below), but then I couldn't easily load it back as a model. Does anyone know of any alternatives I could try?
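The export itself would look something like this (a sketch with a hypothetical output path; the drawback is that it restores only raw vectors, not a Word2VecModel, so methods like findSynonyms are lost):

import json

# Dump the word -> vector map to plain JSON on the driver
vectors = {word: list(vec) for word, vec in model.getVectors().items()}
with open("/tmp/word2vec_vectors.json", "w") as f:
    json.dump(vectors, f)

# Later: reload without Spark; this gives back the vectors only,
# so lookups and similarity have to be reimplemented by hand
with open("/tmp/word2vec_vectors.json") as f:
    vectors = json.load(f)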

Thanks

0 Answers:

No answers