Why do I get this error when I use rdd = rdd.map(lambda i: Recommendation.TransfertRecurrentConfigKey(i)):
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/worker.py", line 161, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
    command = serializer._read_with_length(file)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
ImportError: No module named Recommendation.Recommendation
whereas when I use rdd = rdd.map(Recommendation.TransfertRecurrentConfigKey), the code runs fine?
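To isolate the two cases, here is a minimal sketch of the contrast (the SparkContext named sc and the sample row are made up for the repro, and the import path is my guess from the error message; everything else is the real code from my class below):

from pyspark import SparkContext
from Recommendation.Recommendation import Recommendation  # importable on the driver

sc = SparkContext()
rdd = sc.parallelize([{'Sexe': 'F', 'Age': 30, 'Profession': 'Cadre',
                       'Revenus': 2000, 'TransfertRecurrent': 1}])

# Fails on the workers with: ImportError: No module named Recommendation.Recommendation
print rdd.map(lambda i: Recommendation.TransfertRecurrentConfigKey(i)).collect()

# Works
print rdd.map(Recommendation.TransfertRecurrentConfigKey).collect()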
I'm asking because I want to be able to pass the list of key columns ["Sexe", "Age", "Profession", "Revenus"] in as a parameter.
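In other words, I would like something like the following to work (a sketch only; the generalized ConfigKey and its keys parameter are hypothetical, not code I have running):

from functools import partial

class Recommendation:

    @staticmethod
    def ConfigKey(keys, user):
        # Build the grouping key from a caller-supplied column list.
        return tuple(user[k] for k in keys), user['TransfertRecurrent']

# partial pre-binds the column list, so map() receives a plain function
# instead of a lambda; inside TransfertRecurrentRecommendation the call
# would become:
keys = ["Sexe", "Age", "Profession", "Revenus"]
rdd = rdd.map(partial(Recommendation.ConfigKey, keys))

Whether a partial like this pickles cleanly to the workers presumably runs into the same question as above.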
class Recommendation:

    @staticmethod
    def TransfertRecurrentRecommendation(dataFrame):
        rdd = dataFrame.rdd.map(Recommendation.TransfertRecurrentCleanData)
        rdd = rdd.filter(lambda x: x is not None)
        #rdd = rdd.map(lambda user: ((user['Sexe'], user['Age'], user['Profession'], user['Revenus']), user['TransfertRecurrent']))
        rdd = rdd.map(lambda i: Recommendation.TransfertRecurrentConfigKey(i))
        print rdd.collect()

    @staticmethod
    def TransfertRecurrentConfigKey(user):
        tmp = []
        for k in ["Sexe", "Age", "Profession", "Revenus"]:
            tmp.append(user[k])
        return tuple(tmp), user['TransfertRecurrent']
Edit: I fixed the error, but I still don't understand why the direct function reference works while the lambda doesn't. (See the second answer to "ImportError: No module named numpy on spark workers".)
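For reference, a common way to resolve this kind of ImportError is to make the module importable on the executors and not just on the driver (the zip and script names below are placeholders for however the package gets bundled):

# Ship the package to every executor so pickle.loads can import
# Recommendation.Recommendation on the worker side:
sc.addPyFile("Recommendation.zip")

# or, equivalently, at submit time:
# spark-submit --py-files Recommendation.zip my_driver.py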