Given a DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [None, 1, 1, 2, 1, None, 2],
                   'B': [1, None, None, 1, 5, None, 3],
                   'C': [2, 4, 1, None, 5, None, 2],
                   'D': [3, None, 1, None, 5, None, 1],
                   'E': [None, 1, None, None, None, None, 7]})
     A    B    C    D    E
0  NaN  1.0  2.0  3.0  NaN
1  1.0  NaN  4.0  NaN  1.0
2  1.0  NaN  1.0  1.0  NaN
3  2.0  1.0  NaN  NaN  NaN
4  1.0  5.0  5.0  5.0  NaN
5  NaN  NaN  NaN  NaN  NaN
6  2.0  3.0  2.0  1.0  7.0
I want to total the columns using only the rows in which each column, taken in turn, is non-null. This can be done as follows:
ls = []
names = []
for column in df.columns:
    names += [column]
    ls += [df.loc[df[column] > 0, :].sum()]
pd.concat(ls, keys=names, axis=1)
which results in:
      A     B     C     D    E
A   7.0   5.0   5.0   4.0  3.0
B   9.0  10.0   9.0   9.0  3.0
C  12.0   9.0  14.0  10.0  6.0
D   7.0   9.0  10.0  10.0  1.0
E   8.0   7.0   8.0   7.0  8.0
However, I'm sure there is a better, more Pythonic way to do this. Any suggestions?
Answer 0 (score: 1)
Replace NaN with 0, transpose df, and take the matrix product with a mask matrix that holds 1 wherever df is not null:
mask = df.notnull().astype(int)
df.fillna(0).T.dot(mask)
      A     B     C     D    E
A   7.0   5.0   5.0   4.0  3.0
B   9.0  10.0   9.0   9.0  3.0
C  12.0   9.0  14.0  10.0  6.0
D   7.0   9.0  10.0  10.0  1.0
E   8.0   7.0   8.0   7.0  8.0
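
To see why the matrix product matches the loop, note that entry (X, Y) of the result is the sum over rows of `df[X]` (with NaN treated as 0) times the 0/1 indicator that `df[Y]` is not null, which is the sum of X restricted to rows where Y is present. The following is a minimal sketch that checks the two approaches against each other on the same data. The names `loop_result` and `dot_result` are illustrative only, and the loop here filters with `notna()` rather than `> 0`; the two filters agree on this data only because every non-null value is positive.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [None, 1, 1, 2, 1, None, 2],
                   'B': [1, None, None, 1, 5, None, 3],
                   'C': [2, 4, 1, None, 5, None, 2],
                   'D': [3, None, 1, None, 5, None, 1],
                   'E': [None, 1, None, None, None, None, 7]})

# Loop-style version: for each column, keep the rows where it is
# non-null and sum every column over those rows.
loop_result = pd.concat(
    {col: df.loc[df[col].notna()].sum() for col in df.columns},
    axis=1,
)

# Matrix version from the answer: (values with NaN -> 0)^T times the 0/1 mask.
mask = df.notna().astype(int)
dot_result = df.fillna(0).T.dot(mask)

# The two results should agree element by element.
print(np.allclose(loop_result.to_numpy(), dot_result.to_numpy()))
print(dot_result)
```

The matrix form avoids the Python-level loop. Because NaN is replaced by 0 before multiplying, a column/filter pair with no overlapping non-null rows yields 0.0, which is also what `sum()` returns for an all-NaN selection by default.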