Matching strings

Time: 2016-11-11 10:54:17

Tags: python pandas count conditional

I have a pandas DataFrame, df =

index,result1,result2,result3 
  0     s       u       s     
  1     u       s       u   
  2     s                     
  3     s       s       u 

I want to add another column containing the number of times 's' occurs in each row, e.g.

index,result1,result2,result3,count 
  0     s       u       s      2
  1     u       s       u      1
  2     s                      1
  3     s       s       u      2

I tried the following code

cols=['result1','result2','result3']
df[cols].count(axis=1)

but this returns

0,3
1,3
2,1
3,3

So this counts the number of non-null elements in each row, not the number of 's' values. I then tried

df[df[cols]=='s'].count(axis=1)

but this returned the following error: "Could not compare ['s'] with block values"

Any help is much appreciated.

1 answer:

Answer 0: (score: 2)

For me, casting to string with astype works; the numeric and NaN columns are the ones that raise the error:

print (df)
   index result1 result2  result3  result4
0      0       s       u        7      NaN
1      1       u       s        7      NaN
2      2       s     NaN        8      NaN
3      3       s       s        7      NaN
4      4     NaN     NaN        2      NaN

print (df.dtypes)
index        int64
result1     object
result2     object
result3      int64
result4    float64
dtype: object

cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0
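To reproduce this, here is a minimal sketch. The frame is rebuilt by hand from the printed output above, so the constructor call is an assumption, and it uses an explicit where call in place of the df[mask] indexing so the mask and the frame have the same columns:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output above (assumed construction).
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4],
    "result1": ["s", "u", "s", "s", np.nan],
    "result2": ["u", "s", np.nan, "s", np.nan],
    "result3": [7, 7, 8, 7, 2],
    "result4": [np.nan] * 5,
})

cols = ["result1", "result2", "result3", "result4"]

# Cast every column to string so the comparison with 's' is valid
# for the numeric and all-NaN columns too.
mask = df[cols].astype(str) == "s"

# Keep only the cells equal to 's' (everything else becomes NaN),
# then count the non-null cells per row.
counts = df[cols].where(mask).count(axis=1)
df["count"] = counts
print(df)
```

The count column should match the output printed above.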

Or sum the True values of the boolean mask:

print (df[cols].astype(str) == 's')

  result1 result2 result3 result4
0    True   False   False   False
1   False    True   False   False
2    True   False   False   False
3    True    True   False   False
4   False   False   False   False

cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0
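Applied to the frame from the question, the same idea could look like the sketch below. The blank cells are assumed to be NaN, and the names question_df and is_s are only for illustration:

```python
import numpy as np
import pandas as pd

# Frame from the question; blank cells are assumed to be NaN.
question_df = pd.DataFrame({
    "result1": ["s", "u", "s", "s"],
    "result2": ["u", "s", np.nan, "s"],
    "result3": ["s", "u", np.nan, "u"],
})

cols = ["result1", "result2", "result3"]

# Boolean mask: True where the cell is exactly 's'.
is_s = question_df[cols].astype(str) == "s"

# True counts as 1 when summed, so the row sum is the number of 's' cells.
question_df["count"] = is_s.sum(axis=1)
print(question_df)
```

This gives 2, 1, 1, 2 for the four rows, which is the count column asked for in the question.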

Another nice solution uses numpy with sum:

df['count'] = (df[cols].values=='s').sum(axis=1)
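A self-contained sketch of the numpy variant follows. The frame is again rebuilt by hand from the printed output (an assumption), and to_numpy(dtype=object) is used instead of .values so the array is guaranteed to be an object array and the comparison with 's' is done element by element:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output in the answer (assumed construction).
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4],
    "result1": ["s", "u", "s", "s", np.nan],
    "result2": ["u", "s", np.nan, "s", np.nan],
    "result3": [7, 7, 8, 7, 2],
    "result4": [np.nan] * 5,
})

cols = ["result1", "result2", "result3", "result4"]

# Object array of the selected columns; numbers and NaN stay as Python objects.
arr = df[cols].to_numpy(dtype=object)

# Element-wise comparison gives a boolean array; summing along axis 1
# counts the 's' cells in each row.
np_counts = (arr == "s").sum(axis=1)
df["count"] = np_counts
print(df)
```

If every selected column were numeric, comparing the array with a string may behave differently depending on the numpy version, which is why the object dtype is forced here.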