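How to append n CSV files while preserving column order?
The column headers may differ from file to file, so some columns may be missing, or new columns may appear in later files.
For example: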
File1
Column2 Column1 Column4
1 1 Text1
1 1 Text1
File2
Column2 Column1 Column3 Column4 Column5
2 2 2 Text2 xxx
2 2 2 Text2 xxx
File3
Column2 Column1 Column3 Column4
3 3 3 Text3
3 3 3 Text3
Desired output:
Column2 Column1 Column4 Column3 Column5
1 1 Text1
1 1 Text1
2 2 Text2 2 xxx
2 2 Text2 2 xxx
3 3 Text3 3
3 3 Text3 3
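I'm trying to use pandas, but the columns in the final output are sorted alphabetically. Is there a way to avoid the alphabetical sorting, or to control the column order?
import pandas as pd
import glob
files = glob.glob(r"C:\CSVs\*.csv")  # raw string, so the backslashes are not treated as escapes
df_list = []
for filename in sorted(files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list, ignore_index=True)
full_df.to_csv('output3.csv')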
Instead I'm getting:
Column1 Column2 Column3 Column4 Column5
...
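A minimal sketch, assuming pandas 0.23 or later (where pd.concat has a sort parameter; since pandas 1.0 it defaults to False): passing sort=False keeps the combined columns in the order they first appear across the inputs instead of sorting them alphabetically. The three DataFrames are built inline from the example data above, so no files are needed.

```python
import io

import pandas as pd

# The three example files from the question, as comma-separated text.
file_texts = [
    "Column2,Column1,Column4\n1,1,Text1\n1,1,Text1\n",
    "Column2,Column1,Column3,Column4,Column5\n2,2,2,Text2,xxx\n2,2,2,Text2,xxx\n",
    "Column2,Column1,Column3,Column4\n3,3,3,Text3\n3,3,3,Text3\n",
]
df_list = [pd.read_csv(io.StringIO(text)) for text in file_texts]

# sort=False: the union of the column labels keeps first-appearance order;
# cells for columns a file does not have are filled with NaN.
full_df = pd.concat(df_list, ignore_index=True, sort=False)
print(list(full_df.columns))
# ['Column2', 'Column1', 'Column4', 'Column3', 'Column5']
```

With the real files, df_list would be built from sorted(glob.glob(...)) as in the question; calling full_df.to_csv('output3.csv', index=False) also avoids writing the row index as an extra first column.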
Answer 0 (score: 1)
Try it this way; it works dynamically, based on the headers of the files:
import pandas as pd
import glob
files = glob.glob("./CSVs/*.csv")
df_list = []
col_list = []
for filename in sorted(files):
    df = pd.read_csv(filename)
    df_list.append(df)
    # record each column name the first time it is seen
    for e in list(df.columns):
        if e not in col_list:
            col_list.append(e)
full_df = pd.concat(df_list)
full_df = full_df[col_list]
print(full_df)
Output:
Column2 Column1 Column4 Column3 Column5
0 1 1 Text1 NaN NaN
1 1 1 Text1 NaN NaN
0 2 2 Text2 2.0 xxx
1 2 2 Text2 2.0 xxx
0 3 3 Text3 3.0 NaN
1 3 3 Text3 3.0 NaN
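Because the File1 rows have no Column3, pandas fills those cells with NaN and upcasts Column3 to float64, so the values are written as 2.0 and 3.0 rather than 2 and 3 as in the desired output. A sketch, assuming pandas 1.0 or later: DataFrame.convert_dtypes() switches to nullable dtypes (integer-valued floats become Int64, text becomes string), and to_csv(index=False) then writes missing values as empty cells. The inline data is the question's example.

```python
import io

import pandas as pd

# The three example files from the question, as comma-separated text.
file_texts = [
    "Column2,Column1,Column4\n1,1,Text1\n1,1,Text1\n",
    "Column2,Column1,Column3,Column4,Column5\n2,2,2,Text2,xxx\n2,2,2,Text2,xxx\n",
    "Column2,Column1,Column3,Column4\n3,3,3,Text3\n3,3,3,Text3\n",
]
df_list = [pd.read_csv(io.StringIO(text)) for text in file_texts]
full_df = pd.concat(df_list, ignore_index=True, sort=False)
print(full_df["Column3"].dtype)  # float64: the NaN cells force an upcast from int

# Nullable dtypes keep integers as integers alongside missing values.
full_df = full_df.convert_dtypes()
print(full_df["Column3"].dtype)  # Int64

# index=False drops pandas' row index; missing values become empty cells.
csv_text = full_df.to_csv(index=False)
print(csv_text)
```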
Answer 1 (score: 0)
import pandas as pd
import glob
import itertools
files = glob.glob(r"C:\CSVs\*.csv")  # raw string, so the backslashes are not treated as escapes
df_list = []
new_col_list = []
for filename in sorted(files):
    x = pd.read_csv(filename)
    df_list.append(x)
    new_col_list.append(x.columns)
full_df = pd.concat(df_list, ignore_index=True)
# flatten the per-file headers, then drop repeats while keeping first-seen order
new_col_list_merged = list(dict.fromkeys(itertools.chain.from_iterable(new_col_list)))
full_df_updated = full_df[new_col_list_merged]
full_df_updated.to_csv('output3.csv')
Hope this helps.
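Answer 2 (score: 0)
If you only want to set the final column order, you can try:
full_df = full_df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])  # reindex_axis was deprecated and later removed; reindex(columns=...) replaces it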
full_df.to_csv('output3.csv')
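The choice of reindex(columns=...) over plain indexing matters when a listed column is absent: reindex reorders the columns, creates any missing ones filled with NaN, and drops columns not in the list, whereas full_df[cols] raises KeyError. A small sketch; the DataFrame below is made up for illustration and simply lacks Column5.

```python
import pandas as pd

# Illustrative frame that is missing 'Column5' from the wanted order.
full_df = pd.DataFrame(
    {"Column1": [1, 2], "Column2": [1, 2], "Column4": ["Text1", "Text2"]}
)
wanted = ["Column2", "Column1", "Column4", "Column5"]

# reindex: reorders the columns and adds any missing ones as all-NaN.
reordered = full_df.reindex(columns=wanted)
print(list(reordered.columns))  # ['Column2', 'Column1', 'Column4', 'Column5']

# Plain indexing refuses labels that are not present.
try:
    full_df[wanted]
except KeyError as exc:
    print("KeyError:", exc)
```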
Answer 3 (score: 0)
>>> df = pd.concat([file1, file2, file3], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
>>> df.reindex(columns=['Column2', 'Column1', 'Column4', 'Column3', 'Column5'])
Column2 Column1 Column4 Column3 Column5
0 1 1 Text1 NaN NaN
1 1 1 Text1 NaN NaN
2 2 2 Text2 2.0 xxx
3 2 2 Text2 2.0 xxx
4 3 3 Text3 3.0 NaN
5 3 3 Text3 3.0 NaN