java.lang.OutOfMemoryError related to Spark GraphFrames bfs

Asked: 2016-07-11 02:26:24

Tags: java apache-spark out-of-memory graphframes

After calling bfs about 20 times in the way shown below, an OutOfMemoryError occurs:


From the logs, I can see that bfs creates a large number of broadcast variables and attempts to clear them. I am wondering whether the cleanup of these broadcast variables never fully completes? I have attached the latest error message below. Thanks!

list_locals = []
# g is the GraphFrame with > 3 million nodes and > 15 million edges.

def fn(row):
    arg1 = "id = '%s'" % row.arg1
    arg2 = "id = '%s'" % row.arg2
    results = g.bfs(arg1, arg2, maxPathLength=4)
    list_locals.append(results.rdd.collect())
    results = None

# t is a list of Row objects
for i in range(101):
    fn(t[i])
    print i
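Note that `collect()` pulls every result set into the driver process, and `list_locals` keeps all of them alive, so driver heap usage grows with each bfs call. A minimal sketch of a gentler variant (assumptions: a running PySpark session with graphframes installed, the same GraphFrame `g` and Row list `t` as above; the output path and the helper name `fn_to_disk` are hypothetical) writes each result to storage instead of accumulating it on the driver:

```python
# Sketch only: requires a live Spark cluster; `g`, `t`, and the output
# path below are assumptions carried over from the question.

def fn_to_disk(row, i):
    arg1 = "id = '%s'" % row.arg1
    arg2 = "id = '%s'" % row.arg2
    results = g.bfs(arg1, arg2, maxPathLength=4)
    # Write to disk instead of collect()-ing into a driver-side list;
    # the path is a hypothetical example.
    results.write.mode("overwrite").parquet("/tmp/bfs_results/run_%d" % i)
    results.unpersist()  # drop any cached blocks for this DataFrame

for i in range(101):
    fn_to_disk(t[i], i)
```

This does not change how many broadcast variables bfs itself creates, but it removes the unbounded driver-side accumulation from the loop.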
16/07/11 09:44:28 INFO storage.BlockManagerInfo: Removed broadcast_922_piece0 on dsg-cluster-server-s06.xxx:40047 in memory (size: 8.1 KB, free: 3.0 GB)

1 Answer:

Answer 0 (score: 1)

Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

This exception occurred in the driver process, so you should increase your driver memory.
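One standard way to do that is via the `--driver-memory` flag of `spark-submit` (a config sketch; `8g` and the script name are illustrative values, not taken from the question):

```shell
# Increase the driver JVM heap; tune the value for your workload.
spark-submit \
  --driver-memory 8g \
  your_bfs_job.py
```

The same setting can also be supplied as `--conf spark.driver.memory=8g`, but it must be set before the driver JVM starts, so setting it inside an already-running application has no effect.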