Pandas: find the cumulative count of non-NaN records per column at each timestamp

Time: 2018-05-20 00:46:03

Tags: python pandas cumsum

I have the following dataframe:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     2.0     NaN        NaN  
1   2016-02-15 00:01:00     1.0     NaN        NaN
2   2016-02-15 00:02:00     4.0     2.0        NaN  
3   2016-02-15 00:03:00     2.0     2.0        NaN  
4   2016-02-15 00:04:00     7.0     4.1        1.0
5   2016-02-15 00:05:00     2.0     5.0        2.0
6   2016-02-15 00:06:00     2.4     2.0        7.5
7   2016-02-15 00:07:00     2.0     6.3        1.2
8   2016-02-15 00:08:00     2.5     7.0        NaN

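For each column, I want the cumulative sum of non-NaN records at each timestamp, i.e. a running count of the non-NaN values seen so far, with NaN left in place wherever the original value is NaN. The expected output dataframe is: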

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     1       NaN        NaN  
1   2016-02-15 00:01:00     2       NaN        NaN
2   2016-02-15 00:02:00     3       1          NaN  
3   2016-02-15 00:03:00     4       2          NaN  
4   2016-02-15 00:04:00     5       3          1
5   2016-02-15 00:05:00     6       4          2
6   2016-02-15 00:06:00     7       5          3
7   2016-02-15 00:07:00     8       6          4
8   2016-02-15 00:08:00     9       7          NaN

Currently I loop over the dataframe and compute the cumulative count record by record. Is there a more elegant way to do this? Thanks!

2 answers:

Answer 0: (score: 4)

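Use notnull + cumsum, then mask the result with the same notnull frame so the NaN positions are restored. Note that np.nan is a float, so the integer counts in any column that contains NaN are upcast to float.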

df.iloc[:,1:]=df.iloc[:,1:].notnull().cumsum()[df.iloc[:,1:].notnull()]
df
Out[33]: 
             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
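For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same notnull + cumsum idea. The dataframe is rebuilt from the question; the names value_cols, mask, counts and result are only illustrative, and selecting columns by name instead of iloc[:, 1:] is my own substitution.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [nan, nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [nan, nan, nan, nan, 1.0, 2.0, 7.5, 1.2, nan],
})

value_cols = ["col_A", "col_B", "col_C"]  # illustrative: every column except timestamp

# True where a value is present, False where it is NaN.
mask = df[value_cols].notna()

# cumsum over booleans gives a running count of present values;
# where(mask) puts NaN back at the positions that were NaN originally.
counts = mask.cumsum().where(mask)

# Build a new frame rather than assigning back into df.
result = pd.concat([df[["timestamp"]], counts], axis=1)
print(result)
```

Building a new frame avoids depending on how iloc assignment handles dtypes, which may differ between pandas versions.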

Answer 1: (score: 1)


Inline, with where (returns a new dataframe):

df.assign(**(lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN

In place, with update (modifies df directly):

df.update((lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))
df

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
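If the float upcast in the columns that contain NaN is unwanted, one possible variation is to convert the counts to the nullable Int64 dtype. This is only a sketch, not part of either answer: it assumes pandas 1.0 or later, and the names mask and counts are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [nan, nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [nan, nan, nan, nan, 1.0, 2.0, 7.5, 1.2, nan],
})

# Presence mask for every column except timestamp.
mask = df.drop(columns="timestamp").notna()

# Running count of present values, NaN restored via where(mask),
# then converted to nullable integers so missing slots become <NA>.
counts = mask.cumsum().where(mask).astype("Int64")

# assign returns a new dataframe; df itself is left unchanged.
result = df.assign(**counts)
print(result)
```

With Int64, the counts print as 1, 2, 3 rather than 1.0, 2.0, 3.0, and the missing positions show as <NA> instead of NaN.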