My Spark application hangs when performing a join operation on two DataFrames

Asked: 2015-10-14 13:37:53

Tags: java scala apache-spark apache-spark-sql spark-dataframe

I am new to Spark, and I am joining two DataFrames on a "not equal" condition. At some point my program stops and never makes further progress, and it does not throw any exception.

I am using simple text files containing 100,000 records.

My program calls the non-lazy method 'collectAsList', which is what triggers execution of the join.
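For reference, here is a minimal sketch of the setup described above (assuming a Spark 1.5-era API; the file paths and the column name `id` are illustrative placeholders, not from my actual program). A "not equal" condition cannot be planned as an equi-join, so Spark falls back to a Cartesian-style nested-loop comparison of roughly 100,000 × 100,000 row pairs, which can look like a hang:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object NotEqualJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("not-equal-join"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Two DataFrames of ~100,000 rows each, built from simple text files.
    // (Paths and column name are placeholders.)
    val df1 = sc.textFile("file1.txt").map(_.trim).toDF("id")
    val df2 = sc.textFile("file2.txt").map(_.trim).toDF("id")

    // The "not equal" join condition: no equi-join strategy applies here,
    // so the plan degenerates into comparing every row pair.
    val joined = df1.join(df2, df1("id") !== df2("id"))

    // collectAsList is an action, so this is the point where the join
    // actually executes (everything before it is lazy).
    val rows = joined.collectAsList()
    println(rows.size())
  }
}
```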

Here is the log output (it simply stops after these lines):

15/10/14 09:25:36 INFO TaskSchedulerImpl: Adding task set 25.0 with 2 tasks
15/10/14 09:25:38 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.1.38:50065 (size: 4.7 KB, free: 5.2 GB)
15/10/14 09:25:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.38:50065 (size: 4.8 KB, free: 5.2 GB)
15/10/14 09:25:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.37:48062 (size: 4.8 KB, free: 5.2 GB)
15/10/14 09:25:38 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.1.37:48062 (size: 4.7 KB, free: 5.2 GB)
15/10/14 09:25:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.38:50065 (size: 13.9 KB, free: 5.2 GB)
15/10/14 09:25:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.37:48062 (size: 13.9 KB, free: 5.2 GB)

0 Answers