Apache Spark: hanging on broadcast

Time: 2017-04-01 03:16:56

Tags: hadoop apache-spark mapreduce apache-spark-sql spark-dataframe

I am having a hard time debugging my Spark 1.6.2 application on YARN. It runs in client mode. Essentially it locks up without crashing, and when it hangs the console log looks like this:

    17/03/31 20:12:02 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh007.prod.phx3.gdg:47579 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on p3plcdsh011.prod.phx3.gdg:63228 (size: 5.4 KB, free: 511.1 MB)
    17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh015.prod.phx3.gdg:9377 (size: 5.4 KB, free: 511.1 MB)
    17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh015.prod.phx3.gdg:61897 (size: 5.4 KB, free: 511.1 MB)
    17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh002.prod.phx3.gdg:23170 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:03 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on p3plcdsh016.prod.phx3.gdg:16649 (size: 5.4 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:55147 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on p3plcdsh008.prod.phx3.gdg:7619 (size: 5.4 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:40830 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh011.prod.phx3.gdg:20056 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:47385 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh003.prod.phx3.gdg:2063 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh011.prod.phx3.gdg:63228 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:64036 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh016.prod.phx3.gdg:16649 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh013.prod.phx3.gdg:31979 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh013.prod.phx3.gdg:18407 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh004.prod.phx3.gdg:45536 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:50826 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:36247 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:22848 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:9377 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh015.prod.phx3.gdg:61897 (size: 26.7 KB, free: 511.1 MB)
    17/03/31 20:12:07 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on p3plcdsh008.prod.phx3.gdg:7619 (size: 26.7 KB, free: 511.1 MB)

According to the Spark UI, the hang happens inside a map or filter function.

Has anyone seen this before, or does anyone know how to debug it?

It looks like it could be a memory or disk-space problem, but there is no clear indication that it is. I can try bumping up the memory to see if that helps, but does anyone have debugging tips?
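For reference, one way to try that (a sketch only; the memory sizes, class name, and jar are placeholders) is to raise the driver and executor memory at submit time:

    spark-submit \
      --master yarn \
      --deploy-mode client \
      --driver-memory 4g \
      --executor-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      --class com.example.MyApp \
      my-app.jar

In client mode the driver JVM is already running by the time the application code executes, so driver memory has to be set on the command line rather than in SparkConf.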

Thanks

1 Answer:

Answer 0 (score: 0)

Merely being serializable is not enough. The problem could be any of several things: your serialization mechanism (Java serialization performs poorly; Kryo is much better), the amount of memory on your machines, making sure you reference the broadcast value rather than the wrapped value inside your closures, and so on.
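As a hedged sketch of those points (Spark 1.6-era Scala API; LookupRecord and the lookup map are placeholder stand-ins for whatever you broadcast):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical record type, registered so Kryo does not write
    // full class names with every object.
    case class LookupRecord(key: String, value: Int)

    val conf = new SparkConf()
      .setAppName("BroadcastExample")
      // Swap the slow Java default for Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[LookupRecord]))

    val sc = new SparkContext(conf)

    // Placeholder for the table being broadcast.
    val bcast = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Go through bcast.value inside the closure; capturing the plain map
    // would ship a full copy of it with every task instead.
    val hits = sc.parallelize(Seq("a", "b", "c"))
      .filter(k => bcast.value.contains(k))
      .count()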

There is also the Spark configuration spark.sql.autoBroadcastJoinThreshold:

" 配置在执行连接时将广播到所有工作节点的表的最大大小(以字节为单位)。通过将此值设置为-1,可以禁用广播。请注意,目前只有运行命令ANALYZE TABLE COMPUTE STATISTICS noscan的Hive Metastore表支持统计信息。"

It defaults to 10MB serialized.
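A minimal sketch of adjusting it from Scala (sqlContext is the usual Spark 1.6 entry point; the 100MB figure is just an example):

    // Raise the threshold, in bytes, to roughly 100MB...
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
      (100L * 1024 * 1024).toString)

    // ...or set it to -1 to disable automatic broadcast joins while debugging.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")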

Lastly, if you remove the default limit and have plenty of memory, you still want the broadcast to be smaller than your largest RDD/DataFrame, which you can check with SizeEstimator:
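A minimal sketch (df is a placeholder DataFrame; SizeEstimator lives in org.apache.spark.util and estimates the in-memory footprint of a local object):

    import org.apache.spark.util.SizeEstimator

    // What actually gets broadcast is the collected local copy,
    // so estimate that rather than the distributed DataFrame.
    val localCopy = df.collect()
    println(s"Estimated broadcast size: ${SizeEstimator.estimate(localCopy)} bytes")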

Finally, if worse comes to worst, I would consider doing lookups against a lightning-fast cached data store inside your transformations instead of broadcasting this file.
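One common shape for that (a sketch; FastStoreClient is a hypothetical client for whatever store you use, such as Redis or Memcached, and rdd is a placeholder RDD[String]):

    // Hypothetical client for a fast external cache; stubbed so this compiles.
    class FastStoreClient {
      def lookup(key: String): Option[Int] = Some(key.length) // stand-in lookup
      def close(): Unit = ()
    }

    val enriched = rdd.mapPartitions { iter =>
      // One client per partition rather than one per record.
      val client = new FastStoreClient()
      // In real code, arrange to close the client once the iterator is exhausted.
      iter.map(key => (key, client.lookup(key)))
    }

This trades the one-time broadcast cost for per-record lookups, which only pays off when the store is very fast and the lookup table is too large to broadcast comfortably.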