How to pass labels into the concatenated DataFrame while reading multiple CSV files with glob, so that each row shows which CSV file it came from

Time: 2019-04-24 01:42:50

Tags: python pandas csv

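I have 27 CSV files containing GDP data for various states. I am reading these CSV files with glob and then concatenating them into a single DataFrame. The problem is that I want to attach labels so that, in the concatenated DataFrame, I can tell which rows belong to which state.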

I have tried passing the list of states as the keys argument of pd.concat(), which is supposed to do the required labeling, but in my case it does not work.

import glob

import pandas as pd

path = r'C:\folder A'  # use your path; raw string so that \f is not read as a form feed
all_files = glob.glob(path + "/*.csv")

df_from_each_file = (pd.read_csv(f, encoding="ISO-8859-1", index_col=None, header=0, sep=",") for f in all_files)

concatenated_df = pd.concat(df_from_each_file, ignore_index=True, keys=['Andhra_Pradesh','Arunachal_Pradesh','Assam','Bihar','Chhattisgarh','Goa','Gujarat','Haryana','Himachal_Pradesh','Jharkhand','Karnataka','Kerala','Madhya_Pradesh','Maharashtra','Manipur','Meghalaya','Mizoram','Nagaland','Odisha','Punjab','Rajasthan','Sikkim','Tamil_Nadu','Telangana','Tripura','Uttar_Pradesh','Uttarakhand'], sort=True)

1 Answer:

Answer 0: (score: 0)

I think you need to add the state to each DataFrame manually first:

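keys = ['Andhra_Pradesh','Arunachal_Pradesh','Assam','Bihar','Chhattisgarh','Goa','Gujarat','Haryana','Himachal_Pradesh','Jharkhand','Karnataka','Kerala','Madhya_Pradesh','Maharashtra','Manipur','Meghalaya','Mizoram','Nagaland','Odisha','Punjab','Rajasthan','Sikkim','Tamil_Nadu','Telangana','Tripura','Uttar_Pradesh','Uttarakhand']

# df_from_each_file is a generator and can be consumed only once,
# so materialize it into a list before looping over it and concatenating.
dfs = list(df_from_each_file)

# zip pairs by position, so the file order must match the order of keys
# (glob.glob returns files in arbitrary order; sort all_files if needed).
for df, state in zip(dfs, keys):
    df['state'] = state

concatenated_df = pd.concat(dfs, sort=True, ignore_index=True)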
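
For comparison, here is a minimal self-contained sketch of two ways to carry the label through the concatenation. The small in-memory DataFrames and the column names year and gdp are made up for illustration and stand in for the files read with pd.read_csv. The second variant assumes that the keys attempt in the question failed because ignore_index=True discards the index that keys builds; that seems the likely cause, but it has not been confirmed against the original data.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames read from each CSV file.
frames = {
    "Assam": pd.DataFrame({"year": [2015, 2016], "gdp": [1.0, 1.1]}),
    "Bihar": pd.DataFrame({"year": [2015, 2016], "gdp": [2.0, 2.2]}),
}

# Option 1: add the label as a regular column, then concatenate.
labeled = pd.concat(
    [df.assign(state=state) for state, df in frames.items()],
    ignore_index=True,
)

# Option 2: pass keys WITHOUT ignore_index=True, so the labels survive as the
# outer index level, then move that level into a column and renumber the rows.
keyed = (
    pd.concat(list(frames.values()), keys=list(frames.keys()), names=["state", "row"])
    .reset_index(level="state")
    .reset_index(drop=True)
)

print(labeled)
print(keyed)
```

If the CSV files happen to be named after the states (an assumption, the question does not say), the label could also be taken from each file name, for example with os.path.splitext(os.path.basename(f))[0], which avoids relying on the order in which glob returns the files.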