我有以下测试代码: 将熊猫作为pd导入 将numpy导入为np
df = pd.DataFrame({'MONTH': [1,2,3,1,1,1,1,1,1,2,3,2,2,3,2,1,1,1,1,1,1,1],
'HOUR': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'CIGFT': [np.NaN,12000,2500,73300,73300,np.NaN,np.NaN,np.NaN,np.NaN,12000,100,100,15000,2500,np.NaN,15000,11000,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]})
cigs = pd.DataFrame()
cigs['cigsum'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).sum())
cigs['cigcount'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).count())
df.fillna(value='-', inplace=True)
cigs['cigminus'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: (c>=0.0).sum())
tfile = open('test_COUNT_manual.txt', 'a')
tfile.write(cigs.to_string())
tfile.close()
我得到以下结果:
数据框:
CIGFT HOUR MONTH
0 NaN 0 1
1 12000.0 0 2
2 2500.0 0 3
3 73300.0 0 1
4 73300.0 0 1
5 NaN 0 1
6 NaN 0 1
7 NaN 0 1
8 NaN 0 1
9 12000.0 0 2
10 100.0 0 3
11 100.0 0 2
12 15000.0 0 2
13 2500.0 0 3
14 NaN 0 2
15 15000.0 0 1
16 11000.0 0 1
17 NaN 0 1
18 NaN 0 1
19 NaN 0 1
20 NaN 0 1
21 NaN 0 1
写入文件的结果:
cigsum cigcount cigminus
MONTH HOUR
1 0 4 14 14
2 0 4 5 5
3 0 3 3 3
我的问题是.sum()未对值求和。它正在对非null值进行计数。当我用减号替换空值时,.sum() 产生与count()相同的结果。 那么,如果.sum()不这样做,该怎么用来获取值的总和?
答案 0 :(得分:0)
Series.sum()->返回序列值的总和,默认情况下不包括 NA /空值,如官方文档中所述。
您每次都会在lambda函数中获得序列,只需将sum函数应用于lambda中的序列将为您提供正确的结果。
执行此操作
cigs['cigsum'] = df.groupby(['MONTH','HOUR'])['CIGFT'].apply(lambda c: c.sum())
此代码的结果将是
MONTH HOUR
1 0 172600.0
2 0 39100.0
3 0 5100.0