str.contains and str.find give different results

Asked: 2018-07-18 06:22:57

Tags: pandas

I expected both of these to give the same answer:

import pandas as pd

train = pd.read_csv('https://raw.github.com/mattdelhey/kaggle-titanic/master/Data/train.csv')
train.name.str.contains('Mr.').sum()
(train.name.str.find('Mr.')>0).sum()

But the output is:

647
517

Why are the results different?

1 answer:

Answer 0: (score: 1)

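The difference is that str.contains treats its pattern as a regular expression by default, and . is a special regex character that matches any character. So 'Mr.' also matches the "Mrs" in Mrs., whereas str.find searches for the literal substring.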
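The effect of the unescaped dot can be reproduced without pandas. The following is only a minimal sketch using the standard `re` module; the short `names` list and the `matches` helper are made up for illustration and are not part of the question's data set.

```python
import re

# Hypothetical sample in the same "Surname, Title. Given names" style.
names = [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
]

def matches(pattern, items):
    """Return the items in which the regex pattern is found anywhere."""
    return [s for s in items if re.search(pattern, s)]

# Unescaped: '.' matches any character, so "Mrs" satisfies 'Mr.' as well.
unescaped = matches("Mr.", names)

# Escaped by hand: the dot must now be a literal '.'.
escaped = matches(r"Mr\.", names)

# Escaped automatically with re.escape.
auto_escaped = matches(re.escape("Mr."), names)

# Plain substring search; str.find returns -1 when nothing is found.
literal = [s for s in names if s.find("Mr.") != -1]

print(unescaped)
print(escaped)
print(auto_escaped)
print(literal)
```

Only the unescaped pattern picks up the "Mrs." entry; the other three approaches agree with each other.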

I think you need to either escape the dot or pass the parameter regex=False:

print(train.name.str.contains(r'Mr\.').sum())
517
print(train.name.str.contains('Mr.', regex=False).sum())
517
print((train.name.str.find('Mr.')>0).sum())
517
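One more detail that may be worth keeping in mind: str.find returns -1 when the substring is absent and 0 when it sits at the very start of the string, so `> 0` would skip a match at position 0. That does not appear to matter for names written as "Surname, Title. ...", but `>= 0` is the safer test in general. A small sketch on a made-up Series (the third entry is hypothetical and exists only to put a match at position 0):

```python
import pandas as pd

# Hypothetical sample; not taken from the Titanic file.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Mr. Example Passenger",      # match at position 0
    "Heikkinen, Miss. Laina",
])

regex_hits = names.str.contains("Mr.")                   # regex: also hits "Mrs"
escaped_hits = names.str.contains(r"Mr\.")               # regex with a literal dot
literal_hits = names.str.contains("Mr.", regex=False)    # plain substring search

positions = names.str.find("Mr.")                        # -1 means "not found"
found_strict = positions > 0                             # misses position 0
found_any = positions >= 0                               # counts position 0 too

print(regex_hits.sum(), escaped_hits.sum(), literal_hits.sum())
print(positions.tolist())
print(found_strict.sum(), found_any.sum())
```

With `>= 0`, the str.find result lines up with `regex=False` and with the escaped pattern even when a string begins with the substring.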

Checking the difference:

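# Names selected by each method
a = train.loc[train.name.str.contains('Mr.'), 'name']
b = train.loc[(train.name.str.find('Mr.')>0), 'name']

# Align both selections on the index and keep rows found by only one method
c = pd.concat([a, b], axis=1, keys=('contains','find'))
c = c[c.isnull().any(axis=1)]
print(c)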
                                              contains find
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  NaN
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  NaN
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  NaN
9                  Nasser, Mrs. Nicholas (Adele Achem)  NaN
15                    Hewlett, Mrs. (Mary D Kingcome)   NaN
18   Vander Planke, Mrs. Julius (Emelia Maria Vande...  NaN
19                             Masselmani, Mrs. Fatima  NaN
25   Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...  NaN
31      Spencer, Mrs. William Augustus (Marie Eugenie)  NaN
40      Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  NaN
41   Turpin, Mrs. William John Robert (Dorothy Ann ...  NaN
49       Arnold-Franchi, Mrs. Josef (Josefine Franchi)  NaN
52            Harper, Mrs. Henry Sleeper (Myna Haxtun)  NaN
53   Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin...  NaN
66                        Nye, Mrs. (Elizabeth Ramell)  NaN
85   Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...  NaN
...
...
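As an alternative to the concat approach above, the two boolean masks can be compared directly. This is just a sketch of the same idea on a made-up Series (the names below are a hypothetical sample, not the real file), using XOR to keep rows selected by exactly one of the two methods:

```python
import pandas as pd

# Hypothetical sample in the same style as the name column.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath",
], name="name")

by_contains = names.str.contains("Mr.")    # regex match, '.' is a wildcard
by_find = names.str.find("Mr.") > 0        # literal substring, as in the question

# Rows where exactly one of the two masks is True.
only_one = names[by_contains ^ by_find]
print(only_one)
```

On this sample the disagreement consists only of the "Mrs." rows, which is the same pattern the table above shows for the real data.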