Question

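On a Windows server, I have many .csv files in folders organized by subdirectory. Their structure, size, and number differ from one directory to the next. I need to read each of them into a DataFrame, name the DataFrames accordingly, and dump each one to JSON: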

import glob
import pandas as pd

# raw strings keep the backslashes in Windows paths from being read as escapes
singlefile = [pd.read_csv(filename) for filename in glob.glob(r"C:\data\*.csv")]

# this combines them all into a single DataFrame
df = pd.concat(singlefile, axis=0)
...
# and finally dump it into the predefined singlefile.json
df.to_json(r"C:\data\singlefile.json")

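How can I modify this so that the files are parsed into separate DataFrames and then dumped to separate JSON files?

Select the file names in all directories, until !eof();
Iterate over the list of files to import into DataFrames and assign each a unique name, without overwriting data in the same df;
Export each of them to a separate JSON file.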

Answer 1

Try something like this:

singlefile is your list of pandas DataFrames

[df.to_json("json_file_{}".format(i)) for i,df in enumerate(singlefile)]
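
The one-liner above builds a throwaway list purely for its side effects and writes files without a .json extension. A minimal sketch of the same idea as a plain loop follows; the function name dump_frames and the out_dir parameter are hypothetical, introduced only for illustration:

```python
import os
import pandas as pd


def dump_frames(frames, out_dir):
    """Write each DataFrame in `frames` to its own numbered JSON file.

    `dump_frames` and `out_dir` are illustrative names, not part of pandas.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, df in enumerate(frames):
        # One file per DataFrame, numbered by position in the list
        out_path = os.path.join(out_dir, "json_file_{}.json".format(i))
        df.to_json(out_path)
        written.append(out_path)
    return written
```

Called as dump_frames(singlefile, some_directory), this returns the list of paths it wrote, which makes it easy to check afterwards that nothing was skipped.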

Answer 2

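Depending on whether you need to keep the data in memory or only need the JSON files, I'd suggest one of the following approaches:

If you only need the JSON files: do everything sequentially (the DataFrame is overwritten after it has been written to JSON)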

import glob
import os  # needed for os.path below
import pandas as pd

filenames = glob.glob(r"C:\data\*.csv")

for fname in filenames:
    df = pd.read_csv(fname)
    # base name of the CSV without directory or extension
    out_fname = os.path.splitext(os.path.basename(fname))[0]
    ...
    # and finally dump it into a JSON file named after the source CSV
    df.to_json(r"C:\data\df_{}.json".format(out_fname))

If you need to keep all the DataFrames in memory: use a dictionary

import glob
import os  # needed for os.path below
import pandas as pd


filenames = glob.glob(r"C:\data\*.csv")

df_dict = {}

for fname in filenames:
    # key each DataFrame by the path glob returned for it
    df_dict[fname] = pd.read_csv(fname)
    out_fname = os.path.splitext(os.path.basename(fname))[0]
    ...
    # and finally dump it into a JSON file named after the source CSV
    df_dict[fname].to_json(r"C:\data\df_{}.json".format(out_fname))

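Now you can access each DataFrame using its file name (the path returned by glob) as the key, and each JSON file is named after the CSV it was derived from. For example, if the CSV is called "data_foo.csv", the code above writes the JSON as "df_data_foo.json".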
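
The question mentions files spread across subdirectories, which the flat C:\data\*.csv pattern does not reach. A minimal sketch of one way to cover them, assuming Python 3.5+ so that glob accepts recursive=True, is shown here; the function name csv_tree_to_json and the choice to write each JSON next to its CSV are assumptions made for illustration:

```python
import glob
import os
import pandas as pd


def csv_tree_to_json(root):
    """Read every .csv under `root`, including subdirectories, into its own
    DataFrame and write a .json file next to each source file.

    `csv_tree_to_json` is an illustrative name, not a pandas function.
    """
    frames = {}
    # "**" with recursive=True matches zero or more nested directories
    pattern = os.path.join(root, "**", "*.csv")
    for csv_path in glob.glob(pattern, recursive=True):
        df = pd.read_csv(csv_path)
        # Key by path relative to root so equal file names in different
        # subdirectories do not overwrite each other
        frames[os.path.relpath(csv_path, root)] = df
        json_path = os.path.splitext(csv_path)[0] + ".json"
        df.to_json(json_path)
    return frames
```

Keying the dictionary by relative path rather than bare file name is a deliberate choice here: two directories can each contain a file with the same name, and a bare name would silently replace one DataFrame with the other.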

Import files from a specified directory into separate pandas DataFrames (and name them accordingly)
