I have a text file containing keys and values, but some of the values are missing:

key1 12 13 na
key1 na 11 11
key1 12 13 11
key2 11 12 10
key3 10 11 10
key3 na na na

I want to fill in the missing values, so I did the following (data is my RDD):

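from pandas import DataFrame as Df  # assumption: Df in the post is pandas.DataFrame

def fill_na(x):
    # x holds every row of values that belongs to one key
    df_with_na = Df(list(x))
    # fill each column's missing entries with that column's most frequent value
    df_with_mode = df_with_na.fillna(df_with_na.mode().iloc[0])
    return df_with_mode.values.tolist()

# f is not shown in the post; presumably an identity function such as lambda rows: rows
data1 = data.mapValues(fill_na).flatMapValues(f)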
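
For reference, the per-key mode fill can be sketched without Spark or pandas. This is only an illustration under assumptions: the helper name fill_na_with_mode is hypothetical, the values are assumed to arrive as strings with the literal "na" marking a gap, and tie-breaking may differ from what pandas' mode() would pick. Note that pandas' fillna only fills real NaN/None values, so a literal "na" string would first have to be converted to NaN.

```python
from collections import Counter

def fill_na_with_mode(rows, missing="na"):
    """Replace `missing` in each column with that column's most common value."""
    rows = [list(r) for r in rows]
    if not rows:
        return rows
    modes = []
    for col in range(len(rows[0])):
        # Only values that are actually present take part in the vote.
        present = [r[col] for r in rows if r[col] != missing]
        # If a column has no present value at all, the marker is left in place.
        modes.append(Counter(present).most_common(1)[0][0] if present else missing)
    return [[modes[c] if v == missing else v for c, v in enumerate(r)] for r in rows]

# The three key1 rows from the sample file above.
key1_rows = [["12", "13", "na"], ["na", "11", "11"], ["12", "13", "11"]]
filled = fill_na_with_mode(key1_rows)
```

On an RDD grouped by key, the same function could presumably be passed to mapValues in place of fill_na.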

Now data1 looks like this:

data1.collect() 

(key1 ,[12 13 11])
(key1 ,[12 11 11])
(key1 ,[12 13 11])
(key2 ,[11 12 10])
(key3 ,[10 11 10])
(key3 ,[10 11 10])

Now I want to write data1 above to a DataFrame / table. I tried:

data1.toDF().toPandas()

But I got this error:

TypeError: StringType can not accept object in type <type 'float'>
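
This error suggests that Spark inferred a string column and then met a float in a later row, which can happen when a filled column mixes str and float values. That diagnosis is an assumption, since the post does not show the raw value types. A minimal sketch of one possible workaround is to coerce every value to a single type before calling toDF(); the helper name to_uniform_floats and the sample pairs are hypothetical.

```python
def to_uniform_floats(values):
    """Convert every item to float so a column never mixes str and float."""
    return [float(v) for v in values]

# Hypothetical rows that mix strings, ints, and floats, as may happen after filling.
pairs = [("key1", ["12", 13.0, "11"]), ("key2", [11, "12", 10.0])]
normalized = [(key, to_uniform_floats(values)) for key, values in pairs]
```

On the RDD from the post this would presumably be applied as data1.mapValues(to_uniform_floats).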

1) How do I write this to a DataFrame? 2) How do I convert a key and its list into a single tuple like the following,

(key1 ,11,12,13)

so that I can write it directly to a DataFrame?
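
One possible way to get that flat shape is to prepend the key to the tuple of values. The sketch below runs in plain Python; the function name flatten_pair and the sample data are illustrative, not part of the original code.

```python
def flatten_pair(pair):
    """Turn (key, [v1, v2, v3]) into the flat tuple (key, v1, v2, v3)."""
    key, values = pair
    return (key,) + tuple(values)

# Sample pairs shaped like the elements of data1.
sample = [("key1", [12, 13, 11]), ("key2", [11, 12, 10])]
flat_rows = [flatten_pair(p) for p in sample]
```

With an active Spark session, the same function could be applied as data1.map(flatten_pair).toDF(["key", "c1", "c2", "c3"]), where the column names are placeholders and every column is assumed to hold values of a single type.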

Thanks in advance :)

How to convert a key and a list of values to a DataFrame in PySpark?

0 answers: