Question

I have the following structure:

/folder_1
    file_1.json
        json_1
        json_2
    file_2.json
        json_1
        ...
/folder_2
    file_1.json
        json_1
    file_2.json
        json_1
    file_3.json
        json_1
        json_2
...

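I have tried several ways of reading these files and building a DataFrame in which each row is one line of JSON. The JSON is also not flat, so the records look like this: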

 {
     "a": "1",
     "b": {
        "b_1": "val",
        "b_2": []
        ...
      },
      ...
 }
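To illustrate what flattening a record of this shape looks like, here is a minimal sketch using `pd.json_normalize` (available at the top level in pandas 1.0 and later). The sample values only mirror the fields shown above; the elided fields are left out rather than guessed. Nested keys become dotted column names such as `b.b_1`, while list values like `b_2` are kept as-is in a single cell.

```python
import pandas as pd

# One nested record shaped like the example above (elided fields omitted).
records = [
    {"a": "1", "b": {"b_1": "val", "b_2": []}},
]

# json_normalize expands nested dicts into columns joined by "." by default.
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat)
```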

I tried the following approach:

For loop

import json
import os

import pandas as pd
from pandas import json_normalize

for dirpath, dirnames, filenames in os.walk("/path/to/folders"):
    for json_f in filenames:
        print(json_f)
        json_file_path = os.path.join(dirpath, json_f)
        # Parse each line of the file as one JSON object, then flatten it
        training_df = json_normalize(
            pd.Series(open(json_file_path).readlines()).apply(json.loads)
        )

print(training_df.count())
training_df.to_csv('training_data.csv')

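In one scenario I have 5 folders containing 11 .json files, and those files contain 19 JSON objects. The program reads all 11 files, but the resulting CSV contains only 2 rows instead of 19.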

The JSON in the other folders does not get processed.
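A possible explanation for the short CSV, offered as a sketch rather than a confirmed diagnosis: in the loop above, `training_df` is reassigned on every iteration, so only the rows from the last file read are still there when `to_csv` runs. The version below collects one frame per file and concatenates them at the end. It assumes every line of each .json file is one complete JSON object (which is what the `readlines()` plus `json.loads` approach implies); `load_json_lines_tree` is a hypothetical helper name, not something from pandas.

```python
import json
import os

import pandas as pd


def load_json_lines_tree(root):
    """Flatten every JSON line in every .json file under root into one DataFrame."""
    frames = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue  # skip anything that is not a .json file
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as fh:
                # Assumption: one JSON object per non-empty line.
                records = [json.loads(line) for line in fh if line.strip()]
            if records:
                frames.append(pd.json_normalize(records))
    if not frames:
        return pd.DataFrame()
    # Keep the rows from every file instead of only the last one.
    return pd.concat(frames, ignore_index=True)
```

It could then be used as `load_json_lines_tree("/path/to/folders").to_csv("training_data.csv", index=False)`. Columns that appear in only some files are filled with NaN by `pd.concat`.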

0 answers: