I have the following DataFrame:
import pandas as pd
import numpy as np
d = {
'ID':[1,2,3],
'W1':[5,6,7],
'W2':[9, np.nan,10],
'w3':[11,np.nan,np.nan]
}
df = pd.DataFrame(data = d)
df
   ID  W1    W2    w3
0   1   5   9.0  11.0
1   2   6   NaN   NaN
2   3   7  10.0   NaN
I am performing the following operations:
df['Sum1'] = (df[['W1','W2']]).sum(axis = 1)/2
df['Sum2'] = (df[['W2','w3']]).sum(axis = 1)/2
   ID  W1    W2    w3  Sum1  Sum2
0   1   5   9.0  11.0   7.0  10.0
1   2   6   NaN   NaN   3.0   0.0
2   3   7  10.0   NaN   8.5   5.0
After the operations above, how can I set Sum2 for the row with ID 2 to NaN instead of 0?
Answer 0 (score: 2)
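Add the parameter min_count=1 to DataFrame.sum:
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA. New in version 0.22.0: added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.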
df['Sum1'] = (df[['W1','W2']]).sum(axis = 1, min_count=1)/2
df['Sum2'] = (df[['W2','w3']]).sum(axis = 1, min_count=1)/2
print (df)
   ID  W1    W2    w3  Sum1  Sum2
0   1   5   9.0  11.0   7.0  10.0
1   2   6   NaN   NaN   3.0   NaN
2   3   7  10.0   NaN   8.5   5.0
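To make the effect of min_count easier to see, here is a minimal sketch that sums the same two columns with three different settings. The variable names (cols, sum_default, sum_min1, sum_min2) are only illustrative and are not part of the original answer; the DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'W1': [5, 6, 7],
    'W2': [9, np.nan, 10],
    'w3': [11, np.nan, np.nan],
})

cols = ['W2', 'w3']  # illustrative name, not from the original post

# Default (min_count=0): NaN is skipped, so an all-NaN row sums to 0.0.
sum_default = df[cols].sum(axis=1)

# min_count=1: at least one non-NaN value is required, else the result is NaN.
sum_min1 = df[cols].sum(axis=1, min_count=1)

# min_count=2: both values must be present, so any row with a NaN becomes NaN.
sum_min2 = df[cols].sum(axis=1, min_count=2)

print(pd.DataFrame({
    'default': sum_default,
    'min_count=1': sum_min1,
    'min_count=2': sum_min2,
}))
```

With min_count=1 only the row where every value is missing turns into NaN, which is what the question asks for. min_count=2 is stricter and would also blank out the row with ID 3, so it is probably not what is wanted here.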
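However, it looks like what you actually need is mean, which behaves as required here. It skips NaN values, so an all-NaN row gives NaN, and a row with a single valid value is divided by 1 rather than 2 (note that Sum1 for ID 2 and Sum2 for ID 3 differ from the sum-based result):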
df['Sum1'] = (df[['W1','W2']]).mean(axis = 1)
df['Sum2'] = (df[['W2','w3']]).mean(axis = 1)
print (df)
   ID  W1    W2    w3  Sum1  Sum2
0   1   5   9.0  11.0   7.0  10.0
1   2   6   NaN   NaN   6.0   NaN
2   3   7  10.0   NaN   8.5  10.0
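Since the two approaches do not give identical numbers, a side-by-side comparison may help when choosing between them. The sketch below uses the W1 and W2 columns from the question; the names half_sum, row_mean, non_na and comparison are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'W1': [5, 6, 7],
    'W2': [9, np.nan, 10],
    'w3': [11, np.nan, np.nan],
})

pair = df[['W1', 'W2']]

# Sum of the available values, always divided by 2.
half_sum = pair.sum(axis=1, min_count=1) / 2

# Mean of the available values: the divisor is the count of non-NaN entries.
row_mean = pair.mean(axis=1)

# Number of non-NaN values per row, i.e. the divisor that mean uses.
non_na = pair.count(axis=1)

comparison = pd.DataFrame({
    'half_sum': half_sum,
    'row_mean': row_mean,
    'non_na': non_na,
})
print(comparison)
```

The two columns agree whenever both values are present. They differ for ID 2, where only W1 is available: dividing by a fixed 2 gives 3.0, while mean gives 6.0. Which one is right depends on whether a missing value should count as 0 or be ignored.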