Saving a large model in pySpark

Time: 2016-07-06 19:02:08

Tags: python apache-spark word2vec

I have a word2vec model in Spark, and I would like to save it to a file so I can load it back later.
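For context, the model is built and saved roughly like this (a minimal sketch, assuming the pyspark.mllib Word2Vec API and a hypothetical corpus path):

from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext(appName="word2vec-save")

# Hypothetical corpus: one sentence per line, whitespace-tokenized
corpus = sc.textFile("/data/corpus.txt").map(lambda line: line.split(" "))

model = Word2Vec().setVectorSize(300).fit(corpus)

# This save call is what triggers the error below
model.save(sc, "/models/word2vec")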

When I run the save method, I get the following:

Caused by: org.apache.spark.SparkException: Job aborted due to stage
failure: Serialized task 130:0 was 413893420 bytes, which exceeds max 
allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 
bytes). Consider increasing spark.akka.frameSize or using broadcast 
variables for large values.

I did some research, and it looks like frameSize is still capped, so I would like to explore alternatives in case I run into a similar problem with an even larger model in the future.
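For reference, raising the limit looks roughly like this (a sketch assuming Spark 1.x, where spark.akka.frameSize is specified in MB and, as far as I can tell, hard-capped at 2047 MB):

from pyspark import SparkConf, SparkContext

# spark.akka.frameSize is given in MB (the error above reports it in bytes);
# values much above 2 GB are rejected, hence the hard upper limit
conf = SparkConf().set("spark.akka.frameSize", "1024")
sc = SparkContext(conf=conf, appName="word2vec-save")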

I also tried simply creating a broadcast variable from the model, but that failed with the following exception.

brd = sc.broadcast(model)

Exception: It appears that you are attempting to reference 
SparkContext from a broadcast variable, action, or transformation. 
SparkContext can only be used on the driver, not in code that it run 
on workers. For more information, see SPARK-5063. 
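My guess is that the model object wraps a JVM handle that references the SparkContext, which is why it cannot be shipped to workers; broadcasting plain data extracted from the model should sidestep that (an untested sketch):

# Broadcast only the raw word -> vector map rather than the model object;
# a plain dict can be pickled since it carries no SparkContext reference
word_vectors = {word: list(vec) for word, vec in model.getVectors().items()}
brd = sc.broadcast(word_vectors)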

I suppose I could run getVectors and export the data somehow (roughly sketched below), but then I couldn't easily load it back as a model. Does anyone know of any alternatives I could try?
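The export itself would look something like this (a sketch with a hypothetical output path; the drawback is that it restores only raw vectors, not a Word2VecModel, so methods like findSynonyms are lost):

import json

# Dump the word -> vector map to plain JSON on the driver
vectors = {word: list(vec) for word, vec in model.getVectors().items()}
with open("/tmp/word2vec_vectors.json", "w") as f:
    json.dump(vectors, f)

# Later: reload without Spark; this gives back the vectors only,
# so lookups and similarity have to be reimplemented by hand
with open("/tmp/word2vec_vectors.json") as f:
    vectors = json.load(f)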

Thanks

0 Answers:

No answers