Application error collection

I have a simple Apache Spark script:

import os.path as op
import re

def get_file(line):
    # Extract the filename passed as the first argument of an open("...", call.
    fn = re.search(r'open\(\"(.*)\"\,', line)

    if fn is None:
        return ''

    return fn.group(1)

# `sc` is assumed to be an existing SparkContext (e.g. the one the PySpark shell provides).
data = sc.textFile(op.expanduser('files_folder/*'))
lineSet = data.map(get_file)
print(lineSet.count())
distinctLines = lineSet.distinct()
print(distinctLines.count())
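
As a sanity check that does not involve Spark at all, the same regex can be exercised on a few lines in plain Python. This is only a minimal sketch: the sample lines are hypothetical and not taken from my real input files, and the plain list and set stand in for the RDD `map` and `distinct` steps.

```python
import re

def get_file(line):
    # Same regex as in the script: capture the first argument of open("...",
    match = re.search(r'open\(\"(.*)\"\,', line)
    return '' if match is None else match.group(1)

# Hypothetical sample lines, not taken from the real input files.
samples = [
    'f = open("a.txt", "r")',
    'g = open("a.txt", "w")',
    'print("no open call here")',
]

names = [get_file(s) for s in samples]   # local stand-in for data.map(get_file)
distinct_names = set(names)              # local stand-in for lineSet.distinct()

print(len(names))
print(len(distinct_names))
```

Lines without a matching `open(` call all map to the empty string, so they collapse into a single entry once duplicates are removed.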

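All the script should do is return the set of distinct lines across all files (more precisely, the distinct filenames extracted from those lines). Unfortunately, it fails with this error: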

14/11/26 16:50:15 ERROR Executor: Exception in task 102.0 in stage 2.0 (TID 257)
java.lang.OutOfMemoryError: unable to create new native thread

Does anyone know what the problem is? Googling this error only turned up a rather unhelpful email exchange about thread limits.
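
In case it is relevant to the thread-limit angle, here is a minimal sketch for inspecting the per-user process limit from Python. It assumes a Linux or other Unix machine where the standard `resource` module is available. I am not claiming this limit is the cause; it only reports the limit for the Python process that runs it, which may differ from the limit the Spark executor JVM runs under.

```python
import resource

# RLIMIT_NPROC caps the number of processes for the current user;
# on Linux, threads count against this limit as well.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

def describe(limit):
    # RLIM_INFINITY is the sentinel the OS uses for "no limit".
    return 'unlimited' if limit == resource.RLIM_INFINITY else str(limit)

print('soft limit: %s' % describe(soft))
print('hard limit: %s' % describe(hard))
```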

Apache Spark fails on distinct call

0 answers: