Py4JJavaError in PySpark

Date: 2018-07-20 10:40:12

Tags: apache-spark pyspark rdd

I created an RDD, and the following error occurs whenever I try to use it. I have tried print and collect() on the RDD, but I still get the same error.

print('ALS Model')

# Imports assumed by this snippet: json for parsing the input lines,
# SparkContext, and the RDD-based ALS from pyspark.mllib.
import json
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext()

def parse_raw_string(raw_string):
    # Each input line is a JSON object holding a single {user_id: inventory} pair.
    user_inventory = json.loads(raw_string)
    return list(user_inventory.items())[0]

def id_index(x):
    # x is ((user_id, lst_inventory), index) after zipWithIndex().
    ((user_id, lst_inventory), index) = x
    return (index, user_id)

def create_tuple(x):
    ((user_id, lst_inventory), index) = x
    if lst_inventory is not None:
        # Keep only games whose appid appears in set_valid_game_id (defined elsewhere).
        return (index, [(i.get('appid'), 1) for i in lst_inventory
                        if str(i.get('appid')) in set_valid_game_id])
    else:
        return (index, [])

def reshape(x):
    # After flatMapValues, each element is (index, (appid, 1)).
    (index, (appid, time)) = x
    return (index, appid, 1)

# path_user_inventory is defined elsewhere in the original program.
user_inventory_rdd = sc.textFile(path_user_inventory).map(parse_raw_string).zipWithIndex()
dic_id_index = user_inventory_rdd.map(id_index).collectAsMap()
training_rdd = user_inventory_rdd.map(create_tuple).flatMapValues(lambda x: x).map(reshape)
model = ALS.train(training_rdd, 5)
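For reference, the RDD-based ALS.train in pyspark.mllib.recommendation expects an RDD of Rating objects or of (user, product, rating) tuples with integer-valued user and product IDs, so each element emitted by reshape should look like the minimal sketch below (the concrete IDs are made up for illustration):

from pyspark.mllib.recommendation import Rating

# A well-formed training element: integer user index, integer appid,
# implicit-feedback rating of 1. The values 0 and 440 are hypothetical.
example = Rating(user=0, product=440, rating=1.0)
# Equivalent plain-tuple form, matching what reshape returns:
example_tuple = (0, 440, 1)

If the appid values parsed from the JSON come back as strings rather than ints, they would need to be cast before training.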

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 21, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
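The traceback above is truncated before the Python-side exception that actually failed. One way to narrow it down (a minimal sketch, assuming the same SparkContext and variables as the code above) is to force a small action after each transformation and see which call first reproduces the error:

# Force each stage of the pipeline separately; the first take(1) that fails
# points at the transformation (and its closure variables) responsible.
sc.textFile(path_user_inventory).take(1)        # raw input readable?
user_inventory_rdd.take(1)                      # parse_raw_string / JSON OK?
user_inventory_rdd.map(create_tuple).take(1)    # set_valid_game_id resolvable on workers?
training_rdd.take(1)                            # reshape / final triples OK?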

0 Answers:

No answers yet