I have a question about concatenating four DataFrames. For example, I have these four DataFrames:
print(df1)
SET I Violations
Rule 1 1
Rule 2 1
Rule 3 6
print(df2)
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
print(df3)
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
print(df4)
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
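To make the examples reproducible, the four DataFrames can be rebuilt from the printed output above. This is a minimal sketch. It assumes each frame has exactly two columns: a SET column named after the set, and an integer Violations column. The helper name make_frame is hypothetical.

```python
import pandas as pd


def make_frame(set_name, counts):
    # Hypothetical helper: one DataFrame per set, two columns as printed in the question
    return pd.DataFrame({set_name: ['Rule 1', 'Rule 2', 'Rule 3'],
                         'Violations': counts})


df1 = make_frame('SET I', [1, 1, 6])
df2 = make_frame('SET II', [2, 3, 6])
df3 = make_frame('SET III', [2, 4, 8])
df4 = make_frame('SET IV', [2, 5, 8])
print(df1)
```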
My expected output:
SET I Violations
Rule 1 1
Rule 2 1
Rule 3 6
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
The output I am currently getting:
a)
SET I SET II SET III SET IV Violations
Rule 1 1
Rule 2 1
Rule 3 6
Rule 1 2
Rule 2 3
Rule 3 6
Rule 1 2
Rule 2 4
Rule 3 8
Rule 1 2
Rule 2 5
Rule 3 8
Command used:
pandas.concat([df1,df2,df3,df4],axis=0,ignore_index=True)
b)
Rule 1 1 Rule 1 2 Rule 1 2 Rule 1 2
Rule 2 1 Rule 2 3 Rule 2 4 Rule 2 5
Rule 3 6 Rule 3 6 Rule 3 8 Rule 3 8
Command used:
pandas.concat([df1,df2,df3,df4],axis=1,ignore_index=True)
Please help me with the above.
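Both attempts can be reproduced to show what goes wrong. The sketch below runs on data rebuilt from the question, so the frames are assumptions about the real data.

With axis=0, pd.concat aligns rows on column labels. The four differently named SET columns therefore become four separate columns, each padded with NaN for the rows of the other frames. With axis=1 and ignore_index=True, the frames are placed side by side and their labels are replaced by 0 to 7.

```python
import pandas as pd

rules = ['Rule 1', 'Rule 2', 'Rule 3']
frames = [
    pd.DataFrame({'SET I': rules, 'Violations': [1, 1, 6]}),
    pd.DataFrame({'SET II': rules, 'Violations': [2, 3, 6]}),
    pd.DataFrame({'SET III': rules, 'Violations': [2, 4, 8]}),
    pd.DataFrame({'SET IV': rules, 'Violations': [2, 5, 8]}),
]

# Attempt a): rows are stacked, but columns are matched by label, so each
# distinct SET label gets its own column (NaN where a frame lacks it)
stacked = pd.concat(frames, axis=0, ignore_index=True)
print(stacked.columns.tolist())

# Attempt b): frames are joined side by side; ignore_index=True throws away
# the column labels and numbers the columns 0..7 instead
side_by_side = pd.concat(frames, axis=1, ignore_index=True)
print(side_by_side.shape)
```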
Answer 0 (score: 3)
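If you need the column headers as row values in the final DataFrame, plus a blank row between the groups, you can do the following:

1. Transpose with DataFrame.T.
2. Call reset_index.
3. Transpose back.
4. Add a blank row with pd.concat (DataFrame.append was removed in pandas 2.0).

import pandas as pd
blank = pd.DataFrame([[''] * df1.shape[1]])
df_new = pd.concat([pd.concat([df.T.reset_index().T, blank]) for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)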
SET VIOLATIONS
0 SET I Violations
1 Rule 1 1
2 Rule 2 1
3 Rule 3 6
4
5 SET II Violations
6 Rule 1 2
7 Rule 2 3
8 Rule 3 6
9
10 SET III Violations
11 Rule 1 2
12 Rule 2 4
13 Rule 3 8
14
15 SET IV Violations
16 Rule 1 2
17 Rule 2 5
18 Rule 3 8
19
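The transpose trick is easier to follow one step at a time. The sketch below works on a single frame rebuilt from the question's df1, which is an assumption about its exact contents and dtypes.

After the round trip, the old headers sit in a row labelled 'index', and the columns are plain integers 0 and 1. As a result, the blank rows and the other frames line up by position.

```python
import pandas as pd

df1 = pd.DataFrame({'SET I': ['Rule 1', 'Rule 2', 'Rule 3'],
                    'Violations': [1, 1, 6]})

# Step 1: transpose, so the column labels become the index
t = df1.T
# Step 2: reset_index moves that index (the old headers) into a column named 'index'
t = t.reset_index()
# Step 3: transpose back; the old headers are now the first data row
with_header_row = t.T

print(with_header_row)
```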
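Otherwise, if the headers should not appear as rows, give every DataFrame the same column labels with set_axis before concatenating:
blank = pd.DataFrame([[''] * df1.shape[1]])
df_new = pd.concat([pd.concat([df.set_axis(range(df.shape[1]), axis=1), blank])
                    for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)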
SET VIOLATIONS
0 Rule 1 1
1 Rule 2 1
2 Rule 3 6
3
4 Rule 1 2
5 Rule 2 3
6 Rule 3 6
7
8 Rule 1 2
9 Rule 2 4
10 Rule 3 8
11
12 Rule 1 2
13 Rule 2 5
14 Rule 3 8
15
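The two variants differ only in how each frame is prepared before the blank row is added, so they can be combined into one function. The sketch below uses the same two-column assumption as above. The function name stack_with_separators and its keep_headers flag are hypothetical, not part of pandas.

```python
import pandas as pd


def stack_with_separators(frames, keep_headers=True):
    """Hypothetical helper: stack two-column frames vertically with a blank
    row after each one, optionally keeping each frame's header as a row."""
    blank = pd.DataFrame([['', '']])
    pieces = []
    for df in frames:
        if keep_headers:
            # Move the column labels into the first data row
            body = df.T.reset_index().T
        else:
            # Drop the differing labels so every frame shares columns 0 and 1
            body = df.set_axis(range(df.shape[1]), axis=1)
        pieces.append(pd.concat([body, blank]))
    out = pd.concat(pieces, ignore_index=True)
    out.columns = ['SET', 'VIOLATIONS']
    return out


rules = ['Rule 1', 'Rule 2', 'Rule 3']
frames = [
    pd.DataFrame({'SET I': rules, 'Violations': [1, 1, 6]}),
    pd.DataFrame({'SET II': rules, 'Violations': [2, 3, 6]}),
    pd.DataFrame({'SET III': rules, 'Violations': [2, 4, 8]}),
    pd.DataFrame({'SET IV': rules, 'Violations': [2, 5, 8]}),
]

with_headers = stack_with_separators(frames)
without_headers = stack_with_separators(frames, keep_headers=False)
print(with_headers)
print(without_headers)
```

With keep_headers=True each frame contributes five rows (header, three rules, blank). Without headers it contributes four.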
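To write the result to an Excel workbook without the index, use DataFrame.to_excel with index=False. pandas 2.0 can no longer write the legacy .xls format, so the file is saved as .xlsx, which needs the openpyxl package:
df_new.to_excel('./workbookname.xlsx', index=False)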
SET VIOLATIONS
0 1
Rule 1 1
Rule 2 1
Rule 3 6
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
To add the sum of the violations for each set, a list comprehension is no longer a convenient option. Instead, a for loop builds each block and appends it to a list, and pd.concat joins the list at the end:
blocks = []
for df in [df1, df2, df3, df4]:
    # Row with the sum of the second (Violations) column
    total = pd.DataFrame([['Total', df.iloc[:, 1].sum()]])
    blank = pd.DataFrame([['', '']])
    blocks.append(pd.concat([df.T.reset_index().T, total, blank]))
df_new = pd.concat(blocks, ignore_index=True).rename(columns={0: 'SET', 1: 'VIOLATIONS'})
df_new.to_excel('./workbookname.xlsx', index=False)
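To check the totals loop without writing a file, it can be wrapped in a function that returns the combined DataFrame. This is a minimal sketch. The name stack_with_totals is hypothetical, and like the loop above it assumes the violation counts are in the second column of every frame.

```python
import pandas as pd


def stack_with_totals(frames):
    """Hypothetical helper: header row, rule rows, a 'Total' row and a blank
    row for each two-column frame, stacked into one DataFrame."""
    pieces = []
    for df in frames:
        header_and_rows = df.T.reset_index().T          # headers become the first row
        total = pd.DataFrame([['Total', df.iloc[:, 1].sum()]])
        blank = pd.DataFrame([['', '']])
        pieces.append(pd.concat([header_and_rows, total, blank]))
    out = pd.concat(pieces, ignore_index=True)
    return out.rename(columns={0: 'SET', 1: 'VIOLATIONS'})


rules = ['Rule 1', 'Rule 2', 'Rule 3']
frames = [
    pd.DataFrame({'SET I': rules, 'Violations': [1, 1, 6]}),
    pd.DataFrame({'SET II': rules, 'Violations': [2, 3, 6]}),
    pd.DataFrame({'SET III': rules, 'Violations': [2, 4, 8]}),
    pd.DataFrame({'SET IV': rules, 'Violations': [2, 5, 8]}),
]

result = stack_with_totals(frames)
print(result)
```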
Answer 1 (score: 1)
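You get this output because the SET column has a different name in each DataFrame ('SET I', 'SET II', and so on). You have to harmonize these names before calling pd.concat. Also, your second attempt uses the wrong axis: axis=1 places the frames side by side instead of stacking them.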
dfs = [df1, df2, df3, df4]
for d in dfs:
    # 'SET I', 'SET II', ... all become 'SET'; other labels are kept
    d.columns = [col[:3] if 'SET' in col else col for col in d.columns]
df_all = pd.concat(dfs, ignore_index=True)
Which yields:
print(df_all)
SET Violations
0 Rule 1 1
1 Rule 2 1
2 Rule 3 6
3 Rule 1 2
4 Rule 2 3
5 Rule 3 6
6 Rule 1 2
7 Rule 2 4
8 Rule 3 8
9 Rule 1 2
10 Rule 2 5
11 Rule 3 8
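The loop above overwrites the column labels of df1 to df4 themselves. If the original frames should keep their labels, DataFrame.rename with a function returns renamed copies instead. The sketch below uses the same col[:3] rule as the answer, on data rebuilt from the question. The helper name harmonize is hypothetical.

```python
import pandas as pd

rules = ['Rule 1', 'Rule 2', 'Rule 3']
dfs = [
    pd.DataFrame({'SET I': rules, 'Violations': [1, 1, 6]}),
    pd.DataFrame({'SET II': rules, 'Violations': [2, 3, 6]}),
    pd.DataFrame({'SET III': rules, 'Violations': [2, 4, 8]}),
    pd.DataFrame({'SET IV': rules, 'Violations': [2, 5, 8]}),
]


def harmonize(col):
    # Hypothetical helper, same rule as the answer: 'SET I', 'SET II', ... become 'SET'
    return col[:3] if 'SET' in col else col


# rename(columns=func) returns a new DataFrame, so the frames in dfs are left unchanged
df_all = pd.concat([d.rename(columns=harmonize) for d in dfs], ignore_index=True)
print(df_all)
```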