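I'm trying to add a new column to a DataFrame from the result of a function that builds a sorted dictionary, but I can't get it to work.
I'm using Python 3.6 and running this in PyCharm with a local Spark session. I tried declaring the UDF's return type as ArrayType, but that doesn't help: the output column is null.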
import collections

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType

def getDictionary(data):
    hour = ['8', '9', '10']
    score = [data[0], data[1], data[2]]
    res = dict(zip(hour, score))
    # Sort the (hour, score) pairs by score, highest first
    sorted_x = sorted(res.items(), key=lambda kv: kv[1], reverse=True)
    sorted_dict = collections.OrderedDict(sorted_x)
    first3pairs = {k: sorted_dict[k] for k in list(sorted_dict)[:3]}
    return first3pairs

get_res_udf = F.udf(getDictionary, ArrayType(StringType()))
data = data.withColumn('result', get_res_udf(data['probability']))
data.show(10, False)
Actual output:
+----------+-------------------------+------+
|loannumber|scoring_ts_utc           |result|
+----------+-------------------------+------+
|  11111111|2019-08-01 19:33:18.98721|null  |
+----------+-------------------------+------+
Expected output:
+----------+-------------------------+-----------------------------------------------------------------------------------------------------------+
|loannumber|scoring_ts_utc           |result                                                                                                     |
+----------+-------------------------+-----------------------------------------------------------------------------------------------------------+
|  11111111|2019-08-01 19:33:18.98721|{'8': 0.15553969938314824, '10': 0.1135606782079484, '12': 0.10158022312738095, '14': 0.08433517313467825} |
+----------+-------------------------+-----------------------------------------------------------------------------------------------------------+
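One likely cause, sketched here under assumptions rather than confirmed: the function returns a dict, while the UDF is declared as ArrayType(StringType()), and Spark tends to produce null when the returned value does not match the declared type. The sketch below declares a MapType(StringType(), DoubleType()) instead. The names get_dictionary, HOURS, top_n and make_udf are hypothetical, and the float() conversion assumes the probability column may hold numpy scalars (for example from an ML Vector), which a Python UDF may not be able to return as DoubleType.

```python
from collections import OrderedDict

HOURS = ['8', '9', '10']  # labels taken from the question's code


def get_dictionary(probability, top_n=3):
    """Return the top_n (hour, score) pairs, highest score first."""
    # float() turns numpy scalars into plain Python floats (assumption: the
    # input may be an ML Vector; plain lists of floats work as well).
    pairs = zip(HOURS, (float(p) for p in probability))
    ranked = sorted(pairs, key=lambda kv: kv[1], reverse=True)
    return OrderedDict(ranked[:top_n])


def make_udf():
    """Wrap get_dictionary as a UDF whose declared type matches the dict it returns."""
    # Imported lazily so the pure-Python part above runs without pyspark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, MapType, StringType
    return F.udf(get_dictionary, MapType(StringType(), DoubleType()))


print(dict(get_dictionary([0.2, 0.5, 0.3])))
# prints {'9': 0.5, '10': 0.3, '8': 0.2}
```

With a DataFrame like the one in the question, this would be applied as data.withColumn('result', make_udf()(data['probability'])). A map column is not documented to preserve key order, so if the ranking itself matters, an array of (hour, score) structs may be a safer return type.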