Pandas Series.groupby().apply(.sum()): .sum() does not sum the values

Time: 2018-12-19 14:22:33

Tags: pandas-groupby

I have the following test code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'MONTH': [1,2,3,1,1,1,1,1,1,2,3,2,2,3,2,1,1,1,1,1,1,1], 
                   'HOUR': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 
                   'CIGFT': [np.nan,12000,2500,73300,73300,np.nan,np.nan,np.nan,np.nan,12000,100,100,15000,2500,np.nan,15000,11000,np.nan,np.nan,np.nan,np.nan,np.nan]})

cigs = pd.DataFrame()
cigs['cigsum'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).sum())
cigs['cigcount'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).count())

df.fillna(value='-', inplace=True)
cigs['cigminus'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).sum())

# Append the results to the output file; the file is closed automatically
with open('test_COUNT_manual.txt', 'a') as tfile:
    tfile.write(cigs.to_string())

I get the following results:

DataFrame:

      CIGFT  HOUR  MONTH
0       NaN     0      1
1   12000.0     0      2
2    2500.0     0      3
3   73300.0     0      1
4   73300.0     0      1
5       NaN     0      1
6       NaN     0      1
7       NaN     0      1
8       NaN     0      1
9   12000.0     0      2
10    100.0     0      3
11    100.0     0      2
12  15000.0     0      2
13   2500.0     0      3
14      NaN     0      2
15  15000.0     0      1
16  11000.0     0      1
17      NaN     0      1
18      NaN     0      1
19      NaN     0      1
20      NaN     0      1
21      NaN     0      1

Results written to the file:

            cigsum  cigcount  cigminus
MONTH HOUR
1     0          4        14        14
2     0          4         5         5
3     0          3         3         3

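My problem is that .sum() is not summing the values. It is counting the non-null values instead. When I replace the nulls with a minus sign, .sum() produces the same result as count(). So, if .sum() doesn't do it, what should I use to get the sum of the values?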

1 answer:

Answer 0: (score: 0)

Series.sum() returns the sum of the Series' values, excluding NA/null values by default, as stated in the official documentation.
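To illustrate the default NA handling, here is a minimal sketch. The Series `s` and the variable names are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two real values and two missing ones
s = pd.Series([73300.0, np.nan, 15000.0, np.nan])

total_skipping_nan = s.sum()           # NaN is skipped by default (skipna=True)
total_with_nan = s.sum(skipna=False)   # NaN propagates into the result
non_null_count = s.count()             # number of non-null entries

print(total_skipping_nan)
print(total_with_nan)
print(non_null_count)
```

So there should be no need to fill or drop the NaN values before summing.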

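In your code, (c>=0.0) builds a boolean Series, so calling .sum() on it counts the True entries rather than adding up the values. The lambda receives each group's values as a Series, so simply applying sum to that Series inside the lambda gives the correct result.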
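The difference between summing the comparison result and summing the values can be seen in a small sketch. The Series `c` below is an assumed stand-in for one group (it reuses a few values from the MONTH 1 rows); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for the Series that the lambda receives for one group
c = pd.Series([np.nan, 73300.0, 73300.0, 15000.0, 11000.0, np.nan])

mask = c >= 0.0             # boolean Series; NaN >= 0.0 evaluates to False
mask_sum = mask.sum()       # counts the True entries, not the values
mask_count = mask.count()   # counts every row, since a boolean Series has no nulls
value_sum = c.sum()         # adds up the actual values, skipping NaN

print(mask_sum, mask_count, value_sum)
```

This is why cigsum looked like a count of non-null values, and why cigcount returned the full group size.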

Do this:

cigs['cigsum'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: c.sum())

The result of this code will be:

MONTH  HOUR
1      0       172600.0
2      0        39100.0
3      0         5100.0
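As an alternative sketch (not part of the original answer), the sum and the non-null count can be computed in one pass with the built-in groupby aggregations, without a lambda. The column names cigsum and cigcount follow the question; using agg with a list of function names is a choice made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'MONTH': [1, 2, 3, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 3, 2, 1, 1, 1, 1, 1, 1, 1],
    'HOUR': [0] * 22,
    'CIGFT': [np.nan, 12000, 2500, 73300, 73300, np.nan, np.nan, np.nan, np.nan,
              12000, 100, 100, 15000, 2500, np.nan, 15000, 11000,
              np.nan, np.nan, np.nan, np.nan, np.nan],
})

# 'sum' adds the values (NaN skipped); 'count' counts the non-null values
cigs = (
    df.groupby(['MONTH', 'HOUR'])['CIGFT']
      .agg(['sum', 'count'])
      .rename(columns={'sum': 'cigsum', 'count': 'cigcount'})
)

print(cigs.to_string())
```

Here cigcount is the number of non-null CIGFT values per group, which is what the original (c>=0.0).sum() expression was effectively computing.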