What is the best way to sum the columns of df2 by the columns of df3, given the frames below?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.random.rand(15).reshape((5,3)),index = ['A','B','C','D','E'])
df2 = pd.concat([df,df1],axis=1)
df3 = pd.DataFrame(np.random.rand(25).reshape((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])
The answer should have the shape of df3.
Edit for clarity:
df = pd.DataFrame(np.ones(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.ones(15).reshape((5,3))*2,index = ['A','B','C','D','E'],columns = [1,3,4])
df2 = pd.concat([df,df1],axis=1)
df3 = pd.DataFrame(np.empty((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])
print(df2)
0 1 2 3 4 1 3 4
A 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
B 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
C 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
D 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
E 1.0 1.0 1.0 1.0 1.0 2.0 2.0 2.0
The desired result is:
0 1 2 3 4
A 1.0 3.0 1.0 3.0 3.0
B 1.0 3.0 1.0 3.0 3.0
C 1.0 3.0 1.0 3.0 3.0
D 1.0 3.0 1.0 3.0 3.0
E 1.0 3.0 1.0 3.0 3.0
Answer 0 (score: 6)
You can group the DataFrame by its column labels:
In [57]: df2.groupby(axis=1, by=df2.columns).sum()
Out[57]:
0 1 2 3 4
A 1.0 3.0 1.0 3.0 3.0
B 1.0 3.0 1.0 3.0 3.0
C 1.0 3.0 1.0 3.0 3.0
D 1.0 3.0 1.0 3.0 3.0
E 1.0 3.0 1.0 3.0 3.0
You can also specify the axis by name:
In [58]: df2.groupby(axis='columns', by=df2.columns).sum()
Out[58]:
0 1 2 3 4
A 1.0 3.0 1.0 3.0 3.0
B 1.0 3.0 1.0 3.0 3.0
C 1.0 3.0 1.0 3.0 3.0
D 1.0 3.0 1.0 3.0 3.0
E 1.0 3.0 1.0 3.0 3.0
Or a shorter version from @piRSquared:
df2.groupby(df2.columns, 1).sum()
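To see what the grouping does for a single label, here is a minimal sketch that rebuilds the ones-and-twos df2 from the question, selects every column labelled 1 with a boolean mask, and sums those columns row by row. The groupby calls above do the same for each distinct label. The names mask and label_1_total are only illustrative. Note that recent pandas releases deprecate the axis argument of DataFrame.groupby, so the calls above may warn or fail there.

```python
import numpy as np
import pandas as pd

idx = list("ABCDE")
df = pd.DataFrame(np.ones((5, 5)), index=idx)
df1 = pd.DataFrame(np.ones((5, 3)) * 2, index=idx, columns=[1, 3, 4])
df2 = pd.concat([df, df1], axis=1)

# Boolean mask over the column labels: True for both columns labelled 1.
mask = df2.columns == 1

# Sum the selected columns row by row (hypothetical name, for illustration).
label_1_total = df2.loc[:, mask].sum(axis=1)
print(label_1_total)
```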
Answer 1 (score: 2)
Let's use T to transpose, then groupby and sum:
df2.T.groupby(level=0).sum().T
Original df2:
0 1 2 3 4 0 1 \
A 0.627278 0.008150 0.285077 0.931831 0.683035 0.691318 0.873139
B 0.246861 0.108021 0.903743 0.030373 0.870753 0.143835 0.251623
C 0.367309 0.551530 0.193623 0.704314 0.136061 0.102401 0.287334
D 0.580771 0.592600 0.949666 0.806875 0.288331 0.794173 0.034380
E 0.088984 0.838401 0.988919 0.636134 0.353484 0.584571 0.090235
2
A 0.763687
B 0.735570
C 0.405304
D 0.446789
E 0.542930
new_df2 = df2.T.groupby(level=0).sum().T
print(new_df2)
Output, the new df2:
0 1 2 3 4
A 1.318595 0.881289 1.048764 0.931831 0.683035
B 0.390697 0.359644 1.639314 0.030373 0.870753
C 0.469710 0.838864 0.598927 0.704314 0.136061
D 1.374944 0.626980 1.396455 0.806875 0.288331
E 0.673555 0.928636 1.531849 0.636134 0.353484
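As a self-contained check, the same transpose approach can be run on the ones-and-twos frames from the question. The reindex step is an addition of mine, not part of the answer: it aligns the result to df3's labels so the output has df3's shape, as the question asks. Here df3 is filled with zeros instead of np.empty, since only its labels matter.

```python
import numpy as np
import pandas as pd

idx = list("ABCDE")
df = pd.DataFrame(np.ones((5, 5)), index=idx)
df1 = pd.DataFrame(np.ones((5, 3)) * 2, index=idx, columns=[1, 3, 4])
df2 = pd.concat([df, df1], axis=1)
df3 = pd.DataFrame(np.zeros((5, 5)), columns=np.arange(5), index=idx)

# Transpose so duplicate column labels become index labels,
# sum the rows that share a label, then transpose back.
summed = df2.T.groupby(level=0).sum().T

# Align to df3's labels so the result has df3's shape (my assumption
# about what "by the columns of df3" should mean).
result = summed.reindex(index=df3.index, columns=df3.columns)
print(result)
```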
Answer 2 (score: 1)
Solution 1
numpy.dot + pandas.get_dummies
cols = df2.columns.values
pd.DataFrame(
df2.values.dot(pd.get_dummies(cols).values),
df2.index, pd.unique(df2.columns.values)
)
0 1 2 3 4
A 1 3 1 3 3
B 1 3 1 3 3
C 1 3 1 3 3
D 1 3 1 3 3
E 1 3 1 3 3
Solution 2
numpy.einsum + pandas.get_dummies
cols = df2.columns.values
pd.DataFrame(
np.einsum('ij,jk->ik', df2.values, pd.get_dummies(cols).values),
df2.index, pd.unique(df2.columns.values)
)
0 1 2 3 4
A 1 3 1 3 3
B 1 3 1 3 3
C 1 3 1 3 3
D 1 3 1 3 3
E 1 3 1 3 3
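Both solutions rely on the same idea: get_dummies turns the column labels into an indicator matrix, and multiplying df2's values by it adds up the columns that share a label. The sketch below makes the intermediate matrix visible. The name indicator is only illustrative, and the sample frame assumes the row of ones and twos is repeated for each of the five index labels. Taking the output labels from indicator.columns rather than pd.unique is my own variation; as far as I can tell get_dummies orders its columns by sorted label, so this keeps labels and sums aligned even when the labels do not first appear in sorted order.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    [[1, 1, 1, 1, 1, 2, 2, 2]] * 5,
    index=list("ABCDE"),
    columns=[0, 1, 2, 3, 4, 1, 3, 4],
)

# One row per column of df2, one column per distinct label;
# entry (j, k) is 1 when column j of df2 carries label k.
indicator = pd.get_dummies(df2.columns.to_numpy(), dtype=int)
print(indicator)

# (5 x 8) @ (8 x 5): each output column adds the input columns sharing a label.
result = pd.DataFrame(
    df2.to_numpy() @ indicator.to_numpy(),
    index=df2.index,
    columns=indicator.columns,
)
print(result)
```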
Naive timing
Setup
df2 = pd.DataFrame(
    [[1, 1, 1, 1, 1, 2, 2, 2]] * 5,  # one identical row per index label
    list('ABCDE'),
    [0, 1, 2, 3, 4, 1, 3, 4]
)
Answer 3 (score: 0)
Is this what you mean?
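new_df = pd.DataFrame()
for c in df3.columns:
    try:
        # A duplicated label selects a 2-D block, so sum each of its rows.
        new_df[c] = [sum(x) for x in df2[c].values]
    except TypeError:
        # A unique label selects a 1-D column of scalars, which sum() rejects; copy it as is.
        new_df[c] = df2[c].values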