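Is there a way to diagnose a Spark job while it loads data from an Elasticsearch index? So far I can't tell what is happening beyond what the executor log shows: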
15/09/18 11:15:50 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 325.4 KB, free 25.9 GB)
15/09/18 11:15:51 INFO CacheManager: Partition rdd_2_0 not found, computing it
15/09/18 11:15:51 INFO CacheManager: Partition rdd_2_4 not found, computing it
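The ES index is roughly 114.4 GB and contains 22,951,540 documents. I am trying to cache it in memory so that I can run some analysis on the dataset.
The job runs on Spark standalone and keeps running until it eventually times out or runs out of memory. Cluster status from the master UI: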
Workers: 4
Cores: 128 Total, 128 Used
Memory: 499.5 GB Total, 200.0 GB Used
Applications: 1 Running, 3 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
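
A rough back-of-envelope check of whether the cache can fit at all, using only the figures quoted above. This is a sketch under stated assumptions, not a measurement: it assumes one executor per worker, that the "free 25.9 GB" in the MemoryStore log line is the storage memory available to a single executor, and it uses a hypothetical `EXPANSION_FACTOR` for how much larger the documents become as deserialized objects compared with the on-disk index size (the real ratio is unknown and depends on the mapping and the data).

```python
# Figures taken from the post.
INDEX_SIZE_GB = 114.4                  # on-disk size of the ES index
DOC_COUNT = 22_951_540                 # documents in the index
WORKERS = 4                            # "Workers: 4" on the master page
STORAGE_FREE_PER_EXECUTOR_GB = 25.9    # "free 25.9 GB" in the MemoryStore log line

# Assumptions, NOT from the post.
EXECUTORS = WORKERS        # assumed: one executor per worker
EXPANSION_FACTOR = 2.0     # hypothetical in-memory blow-up versus on-disk size


def avg_doc_bytes(index_gb, docs):
    """Average on-disk size of one document, in bytes."""
    return index_gb * 1024 ** 3 / docs


def cluster_storage_gb(per_executor_gb, executors):
    """Total cache space across the cluster, in GB."""
    return per_executor_gb * executors


def fits_in_cache(index_gb, expansion, storage_gb):
    """True if the expanded dataset fits in the available cache space."""
    return index_gb * expansion <= storage_gb


if __name__ == "__main__":
    storage = cluster_storage_gb(STORAGE_FREE_PER_EXECUTOR_GB, EXECUTORS)
    print("average document size (bytes):", round(avg_doc_bytes(INDEX_SIZE_GB, DOC_COUNT)))
    print("estimated cluster cache space (GB):", round(storage, 1))
    print("fits with no expansion:", fits_in_cache(INDEX_SIZE_GB, 1.0, storage))
    print("fits with assumed expansion:", fits_in_cache(INDEX_SIZE_GB, EXPANSION_FACTOR, storage))
```

If these assumptions hold, the cache space across the four executors is already smaller than the index on disk, so a plain in-memory cache may never complete regardless of how long the job runs. The Storage tab of the application UI shows how much of each RDD is actually cached, which is one way to confirm or refute this estimate.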