python - How to append n CSV files while preserving column order

Time: 2017-05-28 02:52:43

Tags: python csv pandas

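How can I append n CSV files while preserving the column order?

The column headers may differ from file to file, so some columns may be missing and new columns may appear in later files.

For example: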

File1
Column2 Column1 Column4
1       1       Text1
1       1       Text1

File2
Column2 Column1 Column3 Column4 Column5
2       2       2       Text2   xxx
2       2       2       Text2   xxx

File3
Column2 Column1 Column3 Column4
3       3       3       Text3
3       3       3       Text3

Desired output:

Column2 Column1 Column4 Column3 Column5
1       1       Text1
1       1       Text1
2       2       Text2   2       xxx
2       2       Text2   2       xxx
3       3       Text3   3
3       3       Text3   3

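I am trying to use pandas, but the columns in the final output come out sorted alphabetically. Is there a way to avoid the alphabetical sort and control the column order?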

import pandas as pd
import glob

files = glob.glob(r"C:\CSVs\*.csv")  # raw string so the backslashes are literal
df_list = []

for filename in sorted(files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list, ignore_index=True)

full_df.to_csv('output3.csv')

I am getting:

Column1 Column2 Column3 Column4 Column5
...
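
A likely cause of the alphabetical order: pandas releases from around the time of this question sorted the column union whenever the frames passed to `pd.concat` did not share the same columns. A `sort` parameter was added in pandas 0.23, and not sorting became the default in 1.0. The sketch below assumes a recent pandas and uses hypothetical in-memory strings (`csv_texts`) in place of the real files; it is an illustration, not a drop-in fix for every version.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for File1, File2 and File3.
csv_texts = [
    "Column2,Column1,Column4\n1,1,Text1\n1,1,Text1\n",
    "Column2,Column1,Column3,Column4,Column5\n2,2,2,Text2,xxx\n2,2,2,Text2,xxx\n",
    "Column2,Column1,Column3,Column4\n3,3,3,Text3\n3,3,3,Text3\n",
]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# sort=False asks concat to keep columns in order of first appearance
# instead of sorting the union of column names.
full_df = pd.concat(frames, ignore_index=True, sort=False)
column_order = list(full_df.columns)
print(column_order)
```

On older pandas versions without the `sort` parameter, reordering the columns explicitly after the concat, as the answers do, is the safer route.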

4 Answers:

Answer 0: (score: 1)

Try it this way; it builds the column order dynamically from the headers of the files:

import pandas as pd
import glob

files = glob.glob("./CSVs/*.csv")
df_list = []
col_list = []  # column names in order of first appearance
for filename in sorted(files):
    df = pd.read_csv(filename)
    df_list.append(df)
    for e in df.columns:
        if e not in col_list:  # skip names that were already recorded
            col_list.append(e)
full_df = pd.concat(df_list)
full_df = full_df[col_list]  # reorder to the first-seen column order
full_df

Output:

    Column2 Column1 Column4 Column3 Column5
0   1       1       Text1   NaN     NaN
1   1       1       Text1   NaN     NaN
0   2       2       Text2   2       xxx
1   2       2       Text2   2       xxx
0   3       3       Text3   3       NaN
1   3       3       Text3   3       NaN
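
The header-driven ordering above can also be computed without loading any data rows. The following is a minimal sketch, assuming that `pd.read_csv(..., nrows=0)` is used to read only the header line of each file; `csv_texts` is a hypothetical in-memory stand-in for the three files.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for File1, File2 and File3.
csv_texts = [
    "Column2,Column1,Column4\n1,1,Text1\n1,1,Text1\n",
    "Column2,Column1,Column3,Column4,Column5\n2,2,2,Text2,xxx\n2,2,2,Text2,xxx\n",
    "Column2,Column1,Column3,Column4\n3,3,3,Text3\n3,3,3,Text3\n",
]

col_list = []
for text in csv_texts:
    # nrows=0 parses the header only, so no data rows are read here.
    header = pd.read_csv(io.StringIO(text), nrows=0).columns
    for name in header:
        if name not in col_list:
            col_list.append(name)

print(col_list)
```

Knowing the order up front can help with large inputs, since the final column list is available before any file is fully read.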

Answer 1: (score: 0)

import pandas as pd
import glob
import itertools

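files = glob.glob(r"C:\CSVs\*.csv")  # raw string so the backslashes are literal
df_list = []
new_col_list = []
for filename in sorted(files):
    x = pd.read_csv(filename)
    df_list.append(x)
    new_col_list.append(x.columns)
full_df = pd.concat(df_list, ignore_index=True)
# Flatten the per-file headers and drop repeats, keeping first-seen order
# (dict preserves insertion order in Python 3.7+).
new_col_list_merged = list(dict.fromkeys(itertools.chain.from_iterable(new_col_list)))
full_df_updated = full_df[new_col_list_merged]
full_df_updated.to_csv('output3.csv')  # write the reordered frame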

Hope this helps.
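
One detail to watch with this approach: chaining the per-file headers yields a list in which shared column names repeat, and selecting with repeated labels returns repeated columns. The sketch below illustrates that effect and one way to de-duplicate while keeping first-seen order; the tiny frame and `per_file_headers` are hypothetical.

```python
import itertools
import pandas as pd

# Hypothetical frame and headers, as if two files shared the same columns.
df = pd.DataFrame({"Column2": [1], "Column1": [1]})
per_file_headers = [["Column2", "Column1"], ["Column2", "Column1"]]

chained = list(itertools.chain.from_iterable(per_file_headers))
duplicated = df[chained]  # repeated labels produce repeated columns

# dict.fromkeys keeps the first occurrence of each name, in order.
unique_cols = list(dict.fromkeys(chained))
deduplicated = df[unique_cols]

print(duplicated.shape, deduplicated.shape)
```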

Answer 2: (score: 0)

If you only want to set the final column order, you can try:

# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
full_df = full_df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])
full_df.to_csv('output3.csv')
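
A hard-coded order like this also tolerates columns that are absent from the frame: `reindex` creates them filled with NaN, whereas plain bracket selection raises `KeyError` for a missing label. A minimal sketch with a hypothetical frame that has only File1's columns:

```python
import pandas as pd

# Hypothetical frame holding only the columns found in File1.
df = pd.DataFrame(
    {"Column1": [1, 1], "Column2": [1, 1], "Column4": ["Text1", "Text1"]}
)
wanted = ["Column2", "Column1", "Column4", "Column3", "Column5"]

# reindex adds the missing Column3 and Column5 as all-NaN columns.
reordered = df.reindex(columns=wanted)

# Bracket selection does not tolerate the missing labels.
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

print(list(reordered.columns), raised)
```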

Answer 3: (score: 0)

>>> # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
>>> df = pd.concat([file1, file2, file3], ignore_index=True)

>>> df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])
   Column2  Column1 Column4  Column3 Column5
0        1        1   Text1      NaN     NaN
1        1        1   Text1      NaN     NaN
2        2        2   Text2      2.0     xxx
3        2        2   Text2      2.0     xxx
4        3        3   Text3      3.0     NaN
5        3        3   Text3      3.0     NaN
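
For completeness, the same merge can be done without pandas. This is not taken from any of the answers; it is a sketch of an alternative using the standard library's `csv.DictReader` and `csv.DictWriter`, with a hypothetical helper `merge_csvs` and temporary sample files standing in for the real ones.

```python
import csv
import glob
import os
import tempfile


def merge_csvs(paths, out_path):
    """Append CSV files, keeping columns in order of first appearance."""
    # First pass: read only the header row of each file.
    fieldnames = []
    for path in paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in fieldnames:
                fieldnames.append(name)

    # Second pass: stream the rows; restval fills columns a file lacks.
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames


# Hypothetical sample files mirroring File1, File2 and File3.
samples = {
    "file1.csv": "Column2,Column1,Column4\n1,1,Text1\n1,1,Text1\n",
    "file2.csv": "Column2,Column1,Column3,Column4,Column5\n2,2,2,Text2,xxx\n2,2,2,Text2,xxx\n",
    "file3.csv": "Column2,Column1,Column3,Column4\n3,3,3,Text3\n3,3,3,Text3\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            f.write(text)
    paths = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    # The output name does not match *.csv, so it is never picked up as input.
    out_path = os.path.join(tmp, "merged_output.txt")
    merged_header = merge_csvs(paths, out_path)
    with open(out_path, newline="") as f:
        merged_rows = list(csv.reader(f))

print(merged_header)
```

Because rows are streamed one at a time, this variant keeps memory use low, at the cost of leaving every value as a string.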