Need NaN key:value pairs when writing a PySpark DF to JSON

Asked: 2019-02-26 13:20:03

Tags: json apache-spark dataframe pyspark

I am trying to write out a PySpark DataFrame (DF) in JSON format. The DF has some rows with NaN values. I am writing the DF out with the following:

DF.coalesce(1).write.format('json').mode('overwrite').save('myDest/' + ext) 
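For context, a minimal sketch that reproduces the behavior, with hypothetical column names mirroring the sample below. It assumes the missing values are stored as null in the Spark DF, since Spark's JSON writer silently drops null fields by default:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The second row's 'key' is missing (null); the JSON writer drops that field
data = [("890226", "2018-01-14T17:15:00.000Z", 2.94226, 3),
        ("890226", "2018-01-14T17:20:00.000Z", None, 1)]
DF = spark.createDataFrame(data, ["id", "dt", "key", "anotherkey"])

DF.coalesce(1).write.format('json').mode('overwrite').save('myDest/repro')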

The output JSON omits keys that have no value.

Here is a sample:

{"id":"890226","dt":"2018-01 14T17:05:00.000Z","key":2.9427571,"anotherkey":3}
{"id":"890226","dt":"2018-01-14T17:10:00.000Z","key":2.9815376,"anotherkey":3}
{"id":"890226","dt":"2018-01-14T17:15:00.000Z","key":2.94226,"anotherkey":3}
{"id":"890226","dt":"2018-01-14T17:20:00.000Z","anotherkey":1}
{"id":"890226","dt":"2018-01-14T17:25:00.000Z","anotherkey":1}
{"id":"890226","dt":"2018-01-14T17:30:00.000Z","anotherkey":1}
{"id":"890226","dt":"2018-01-14T17:35:00.000Z","anotherkey":1} 

As the last 4 records show, the generated JSON skips the 'key' attribute because its value in the DF is NaN.

In a pandas DataFrame there is an option to preserve NaN as key = None.
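For comparison, a minimal pandas sketch; to_json serializes NaN as JSON null rather than dropping the key:

import numpy as np
import pandas as pd

pdf = pd.DataFrame({"id": ["890226"], "key": [np.nan], "anotherkey": [1]})
print(pdf.to_json(orient="records", lines=True))
# {"id":"890226","key":null,"anotherkey":1}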

Is there a way to preserve NaN in a PySpark DF?
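One approach that may help, sketched under stated assumptions. On Spark 3.0+ the JSON writer accepts an ignoreNullFields option that keeps null fields in the output instead of dropping them:

# Spark 3.0+ only: emit "key": null instead of dropping the field
DF.coalesce(1).write.format('json') \
    .option('ignoreNullFields', 'false') \
    .mode('overwrite').save('myDest/' + ext)

On older Spark versions, a possible workaround is to serialize each Row by hand, assuming every column value is a JSON-serializable type (json.dumps writes None as null, but emits a non-standard NaN token for float NaN):

import json

# Serialize each row ourselves so null-valued keys survive; note that
# saveAsTextFile has no overwrite mode, so the destination must not exist
(DF.rdd
   .map(lambda row: json.dumps(row.asDict()))
   .coalesce(1)
   .saveAsTextFile('myDest/' + ext))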

0 Answers:

No answers yet