I created an RDD, and when I try to use it the error below occurs. I have tried print and collect() on this RDD, but the error still appears.
import json
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

print('ALS Model')
sc = SparkContext()

def parse_raw_string(raw_string):
    # each line is a JSON object {user_id: inventory_list}
    user_inventory = json.loads(raw_string)
    return list(user_inventory.items())[0]

def id_index(x):
    ((user_id, lst_inventory), index) = x
    return (index, user_id)

def create_tuple(x):
    ((user_id, lst_inventory), index) = x
    if lst_inventory is not None:
        # keep only games whose appid is in the valid set
        return (index, [(i.get('appid'), 1) for i in lst_inventory if str(i.get('appid')) in set_valid_game_id])
    else:
        return (index, [])

def reshape(x):
    (index, (appid, time)) = x
    return (index, appid, 1)

user_inventory_rdd = sc.textFile(path_user_inventory).map(parse_raw_string).zipWithIndex()
dic_id_index = user_inventory_rdd.map(id_index).collectAsMap()
training_rdd = user_inventory_rdd.map(create_tuple).flatMapValues(lambda x: x).map(reshape)
model = ALS.train(training_rdd, 5)
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 21, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
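
Since the traceback is cut off before the Python-side exception, one way to narrow it down is to force each stage to evaluate on its own. This is only a minimal debugging sketch, assuming the snippet above has already run and that set_valid_game_id and path_user_inventory are defined in the driver; the explicit cast to Rating is an assumption about the intended training input, not part of the original code.

# Materialize each intermediate RDD separately; because RDDs are lazy,
# take(1) will surface the Python exception at the first broken stage.
print(user_inventory_rdd.take(1))                      # ((user_id, lst_inventory), index)
print(user_inventory_rdd.map(create_tuple).take(1))    # (index, [(appid, 1), ...])
print(training_rdd.take(1))                            # (index, appid, 1)

# pyspark.mllib's ALS.train expects an RDD of Rating objects or
# (user, product, rating) tuples with integer user/product ids,
# so casting explicitly before training can rule out type issues.
from pyspark.mllib.recommendation import ALS, Rating

ratings = training_rdd.map(lambda t: Rating(int(t[0]), int(t[1]), float(t[2])))
model = ALS.train(ratings, rank=5)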