PySpark - FPGrowth - association rules - StackOverflowError

Asked: 2017-10-16 09:52:55

Tags: apache-spark pyspark stack-overflow pyspark-sql

I have a large dataframe (5 million rows) where each row is a basket of items, and I am trying to mine the frequent itemsets and association rules. But it gives me StackOverflowErrors. I tried setting a checkpoint directory, but that did not solve the problem. Any idea how to fix this? Many thanks in advance.

from pyspark.ml.fpm import FPGrowth

fpGrowth = FPGrowth(itemsCol="ARFeatures", minSupport=0.8, minConfidence=0.9)

model = fpGrowth.fit(completeDf)
  

java.lang.StackOverflowError
    at java.lang.reflect.InvocationTargetException.<init>(InvocationTargetException.java:72)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
    at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
    at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
    at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)

1 Answer:

Answer 0 (score: 0)

Increase the driver stack size. How to pass the driver JVM options correctly depends on how you launch the application.

With spark-submit, you can add it as a command-line argument:

--conf "spark.driver.extraJavaOptions=-Xss10m"
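Put together, a full spark-submit invocation might look like the following. This is a hypothetical sketch: the script name and master URL are placeholders, and 10 MB is just one plausible stack size:

```shell
# Raise the driver's JVM thread stack size to 10 MB via -Xss.
# "fpgrowth_job.py" and the master URL are placeholders.
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-Xss10m" \
  fpgrowth_job.py
```

If the overflow happens on the executors rather than the driver, the analogous setting is `spark.executor.extraJavaOptions`.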

Please check these for more details: