A simple wordcount program in Spark does not spill to disk and causes an OOM error. In short:
Environment:
Spark: 2.3.0, Scala 2.11.8
3 x Executor, each: 1 core + 512 MB RAM
Text file: 341 MB
Other configurations are default (spark.memory.fraction = 0.6)
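These settings leave only a small unified memory pool per executor. A rough sketch of the arithmetic, assuming Spark's unified memory model with its documented 300 MB reserved-memory default (the object name MemoryEstimate is just for illustration):

object MemoryEstimate {
  def main(args: Array[String]): Unit = {
    val heapMb = 512.0       // per-executor heap (1 core + 512 MB RAM above)
    val reservedMb = 300.0   // Spark's fixed reserved system memory
    val fraction = 0.6       // spark.memory.fraction (default)

    // unified pool shared by execution (shuffle) and storage (cache)
    val unifiedMb = (heapMb - reservedMb) * fraction
    println(f"Unified memory per executor: $unifiedMb%.0f MB")  // about 127 MB
  }
}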
Code:
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inPath = args(0)
    val sc = new SparkContext("spark://master:7077", "Word Count ver3")

    // lowercase each line, then split into words
    val words = sc.textFile(inPath, minPartitions = 20)
      .map(line => line.toLowerCase())
      .flatMap(text => text.split(' '))

    // collect all occurrences of each word into one group, then count the groups
    val wc = words.groupBy(word => word)
      .map({ case (groupName, groupList) => (groupName, groupList.size) })
      .count()
  }
}
Error:

The executors fail with an OutOfMemoryError.

Heapdump:

(heap dump screenshot)
The question is: why doesn't Spark spill the intermediate data to disk instead of failing with an OOM error?
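For comparison, a minimal sketch of the same count written with reduceByKey (the object name WordCountReduce is my own; it reuses the master URL and input argument from the code above). With map-side combining, each executor keeps only one running count per word instead of a full group of occurrences:

import org.apache.spark.SparkContext

object WordCountReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("spark://master:7077", "Word Count reduceByKey")

    // partial counts are merged within each partition before the shuffle,
    // so no per-word Iterable is ever materialized
    val counts = sc.textFile(args(0), minPartitions = 20)
      .flatMap(line => line.toLowerCase.split(' '))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    println(counts.count())
  }
}

This variant is only for contrast; the question above is why the groupBy version does not simply spill.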
Thank you very much for your time.