Question

I'm having trouble preserving the key order of my JSON and pretty-printing it in pyspark.

Here is the sample code:

import json

# info is my ordered dictionary
json_out = sqlContext.jsonRDD(sc.parallelize([json.dumps(info)]))

json_out.toJSON().saveAsTextFile("file:///home//XXX//samplejson")

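I would also like the output written as a single file rather than as a partitioned dataset.

Can anyone help me pretty-print the output JSON and preserve its key order in my case?

Sample info:

Note: TypeA, TypeB, etc. are lists, meaning there can be multiple products in TypeA or TypeB.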

{
  "score": {
    "right": ,
    "wrong": 
  },
  "articles": {
    "TypeA": [{
      "ID": 333,
      "Name": "",
      "S1": "",
      "S2": "",
      "S3": "",
      "S4": ""
    }],
    "TypeB": [{
      "ID": 123,
      "Name": "",
      "T1": "",
      "T2": "",
      "T3": "",
      "T4": "",
      "T5": "",
      "T6": ""
    }]    
  }
}

(I tried json.dumps(info, indent=2), but it did not help.)
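For reference, here is a minimal sketch of what json.dumps does on its own, outside Spark. It assumes that info is small enough to sit on the driver and that a local file is acceptable. The dictionary contents, the helper name dump_pretty, and the output path are hypothetical placeholders, not the real data. One possible explanation for the attempt above not helping, offered as an assumption rather than a confirmed diagnosis: toJSON() emits each row as one compact line, so indentation is lost once the string has passed through a DataFrame, and schema inference may reorder the keys.

```python
import json
import os
import tempfile
from collections import OrderedDict

# Hypothetical stand-in for `info`; the values are placeholders, not real data.
info = OrderedDict([
    ("score", OrderedDict([("right", 0), ("wrong", 0)])),
    ("articles", OrderedDict([
        ("TypeA", [OrderedDict([("ID", 333), ("Name", ""), ("S1", "")])]),
        ("TypeB", [OrderedDict([("ID", 123), ("Name", ""), ("T1", "")])]),
    ])),
])


def dump_pretty(data, path):
    """Write `data` as indented JSON to one local file and return the text."""
    # sort_keys defaults to False, so the OrderedDict insertion order is kept.
    text = json.dumps(data, indent=2)
    with open(path, "w") as f:
        f.write(text + "\n")
    return text


# Hypothetical output location: a single file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "samplejson.json")
pretty = dump_pretty(info, out_path)
print(pretty)
```

Because this writes with plain Python on the driver, the result is one file rather than a directory of part files. It does not cover the case where the data must stay distributed.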

How do I format JSON output in pyspark?

0 Answers: