I have a pandas DataFrame, df =
index,result1,result2,result3
0 s u s
1 u s u
2 s
3 s s u
I want to add another column containing the number of times 's' occurs in each row, for example
index,result1,result2,result3,count
0 s u s 2
1 u s u 1
2 s 1
3 s s u 2
I tried the following code
cols=['result1','result2','result3']
df[cols].count(axis=1)
but this returns
0,3
1,3
2,1
3,3
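So this counts the number of non-missing elements in each row, not the number of 's' values. I then tried
df[df[cols]=='s'].count(axis=1)
but this raised the following error: "Could not compare ['s'] with block values"
Any help is much appreciated.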
Answer 0 (score: 2)
For me it works to cast the columns to string with astype first; the numeric and NaN columns are the ones that raise the error:
print (df)
index result1 result2 result3 result4
0 0 s u 7 NaN
1 1 u s 7 NaN
2 2 s NaN 8 NaN
3 3 s s 7 NaN
4 4 NaN NaN 2 NaN
print (df.dtypes)
index int64
result1 object
result2 object
result3 int64
result4 float64
dtype: object
cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
index result1 result2 result3 result4 count
0 0 s u 7 NaN 1
1 1 u s 7 NaN 1
2 2 s NaN 8 NaN 1
3 3 s s 7 NaN 2
4 4 NaN NaN 2 NaN 0
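For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed output (the `index` column is left out), and `mask` and `counts` are names chosen here for illustration, not taken from the answer.

```python
import numpy as np
import pandas as pd

# Rebuilt by hand from the printed frame above (assumption: blanks are NaN).
df = pd.DataFrame({
    "result1": ["s", "u", "s", "s", np.nan],
    "result2": ["u", "s", np.nan, "s", np.nan],
    "result3": [7, 7, 8, 7, 2],
    "result4": [np.nan] * 5,
})
cols = ["result1", "result2", "result3", "result4"]

# Cast every column to str so numeric and NaN columns can be compared to 's'.
mask = df[cols].astype(str) == "s"

# Keep only the matching cells (all others become NaN), then count per row.
counts = df[cols].where(mask).count(axis=1)
df["count"] = counts
print(df)
```

Indexing with a boolean frame, as in `df[mask]`, behaves like `where`: non-matching cells turn into NaN, and `count` then skips them.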
Or sum the True values of the boolean mask:
print (df[cols].astype(str) == 's')
result1 result2 result3 result4
0 True False False False
1 False True False False
2 True False False False
3 True True False False
4 False False False False
cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
index result1 result2 result3 result4 count
0 0 s u 7 NaN 1
1 1 u s 7 NaN 1
2 2 s NaN 8 NaN 1
3 3 s s 7 NaN 2
4 4 NaN NaN 2 NaN 0
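If the same count is needed for other values, the sum-of-mask idea could be wrapped in a small helper. This is only a sketch: `count_matches` is a hypothetical name, not a pandas function, and the frame is again rebuilt by hand from the printed output.

```python
import numpy as np
import pandas as pd

def count_matches(frame, columns, value):
    """Return, per row, how many of `columns` hold the string `value`."""
    # astype(str) makes numeric and NaN cells comparable to a string.
    mask = frame[columns].astype(str) == value
    # True counts as 1 and False as 0, so a row-wise sum is the match count.
    return mask.sum(axis=1)

# Rebuilt by hand from the printed frame above (assumption: blanks are NaN).
df = pd.DataFrame({
    "result1": ["s", "u", "s", "s", np.nan],
    "result2": ["u", "s", np.nan, "s", np.nan],
    "result3": [7, 7, 8, 7, 2],
    "result4": [np.nan] * 5,
})
cols = ["result1", "result2", "result3", "result4"]

df["count"] = count_matches(df, cols, "s")
print(df)
```

Note that after `astype(str)` a missing value may become the literal string 'nan', so this sketch should not be used to search for that text.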
Another nice solution is to compare the underlying numpy array and sum the result:
df['count'] = (df[cols].values=='s').sum(axis=1)
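As a quick check, the numpy variant can be run against the frame from the question. This is a sketch under two assumptions: the blank cells in row 2 are NaN, and the array is requested with `dtype=object` (not in the original line) so that the comparison is always done cell by cell.

```python
import numpy as np
import pandas as pd

# The question's frame (assumption: the blank cells in row 2 are NaN).
df = pd.DataFrame({
    "result1": ["s", "u", "s", "s"],
    "result2": ["u", "s", np.nan, "s"],
    "result3": ["s", "u", np.nan, "u"],
})
cols = ["result1", "result2", "result3"]

# Force an object array so each cell is compared to 's' individually.
values = df[cols].to_numpy(dtype=object)

# Row-wise sum of the boolean array gives the number of 's' cells per row.
df["count"] = (values == "s").sum(axis=1)
print(df)
```

The resulting `count` column is 2, 1, 1, 2, which matches the output the question asks for.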