Iteratively summing columns based on a condition

Time: 2018-07-05 20:54:41

Tags: python python-3.x pandas

Given a DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [None, 1, 1, 2, 1, None, 2],
                   'B': [1, None, None, 1, 5, None, 3],
                   'C': [2, 4, 1, None, 5, None, 2],
                   'D': [3, None, 1, None, 5, None, 1],
                   'E': [None, 1, None, None, None, None, 7]})

    A   B   C   D   E
0   NaN 1.0 2.0 3.0 NaN
1   1.0 NaN 4.0 NaN 1.0
2   1.0 NaN 1.0 1.0 NaN
3   2.0 1.0 NaN NaN NaN
4   1.0 5.0 5.0 5.0 NaN
5   NaN NaN NaN NaN NaN
6   2.0 3.0 2.0 1.0 7.0

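For each column in turn, I want to sum all of the columns using only the rows where that column is non-null. This can be done as follows: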

ls = []
names = []
for column in df.columns:
    names.append(column)
    # keep only the rows where this column is non-null, then sum every column
    ls.append(df.loc[df[column].notnull(), :].sum())

pd.concat(ls, keys=names, axis=1)

Which results in:

     A    B    C    D    E
A   7.0  5.0  5.0  4.0  3.0
B   9.0  10.0 9.0  9.0  3.0
C   12.0 9.0  14.0 10.0 6.0
D   7.0  9.0  10.0 10.0 1.0
E   8.0  7.0  8.0  7.0  8.0

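However, I'm sure there is a better, more Pythonic way to do this. Any suggestions?

1 answer:

Answer 0 (score: 1)

Replace NaN with 0, transpose df, and multiply it (matrix product) by a mask matrix that holds 1 wherever df is not null and 0 elsewhere: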

mask = df.notnull().astype(int)
df.fillna(0).T.dot(mask)
      A     B     C     D    E
A   7.0   5.0   5.0   4.0  3.0
B   9.0  10.0   9.0   9.0  3.0
C  12.0   9.0  14.0  10.0  6.0
D   7.0   9.0  10.0  10.0  1.0
E   8.0   7.0   8.0   7.0  8.0
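
To see why the dot product works: entry (row i, column j) of the result is the sum of column i over the rows where column j is non-null, because the mask zeroes out every other row. The following is a minimal sketch, reusing the df from the question, that checks the matrix version against a loop version and against one entry computed by hand. The names loop_result, dot_result, manual and same are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [None, 1, 1, 2, 1, None, 2],
                   'B': [1, None, None, 1, 5, None, 3],
                   'C': [2, 4, 1, None, 5, None, 2],
                   'D': [3, None, 1, None, 5, None, 1],
                   'E': [None, 1, None, None, None, None, 7]})

# Loop version: for each column, keep the rows where it is non-null
# and sum every column over those rows.
loop_result = pd.concat(
    {col: df.loc[df[col].notnull()].sum() for col in df.columns}, axis=1
)

# Matrix version from the answer: (df with NaN -> 0) transposed, times the 0/1 mask.
mask = df.notnull().astype(int)
dot_result = df.fillna(0).T.dot(mask)

# One entry by hand: sum of column A over the rows where B is non-null.
manual = df.loc[df['B'].notnull(), 'A'].sum()

# Compare the two results element-wise.
same = np.allclose(loop_result.to_numpy(), dot_result.to_numpy())
print(same, manual, dot_result.loc['A', 'B'])
```

On this data the two approaches should agree. The matrix version avoids the Python-level loop, at the cost of treating NaN as 0 inside the product.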