Question

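I have a dataframe containing two similar phrases, for example "Hello World" and "Hello World 1". I only want to match the string "Hello World".

I am currently using dataframe['Phrase'].str.match('Hello World'), but this obviously returns both "Hello World" and "Hello World 1". Is there a way to match only the exact phrase?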

Answer 1

You can use a regular expression to get this result:

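import re

phrase_to_find = 'Hello World'
phrases = ['Hello World', 'Hello World 1']

for phrase in phrases:
    # Look for each candidate, as whole words, inside phrase_to_find.
    # re.escape keeps any regex metacharacters in a candidate literal.
    if re.search(r'\b' + re.escape(phrase) + r'\b', phrase_to_find):
        print('Found the following match: {}'.format(phrase))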

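\b denotes a word boundary. Because each candidate is searched for inside phrase_to_find, the longer "Hello World 1" cannot match.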
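A word boundary on its own does not enforce a full-string match when the search runs in the other direction, that is, when the target phrase is looked for inside each candidate. The sketch below contrasts the two using re.fullmatch from the standard library. The helper name is_exact_match is hypothetical and not part of the original answer.

```python
import re

def is_exact_match(candidate: str, target: str) -> bool:
    """Return True only when candidate equals target in full."""
    # re.escape treats the target as literal text; re.fullmatch requires
    # the pattern to cover the entire candidate string.
    return re.fullmatch(re.escape(target), candidate) is not None

phrases = ['Hello World', 'Hello World 1']

# Full-string match: only the identical phrase is kept.
exact = [p for p in phrases if is_exact_match(p, 'Hello World')]

# Word-boundary search inside each candidate: both phrases are kept,
# because a boundary exists between "World" and the following space.
bounded = [p for p in phrases if re.search(r'\bHello World\b', p)]

print(exact)    # ['Hello World']
print(bounded)  # ['Hello World', 'Hello World 1']
```

For plain literal phrases, candidate == target gives the same answer as the helper without any regex.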

Answer 2

What you need is an equality test:

dataframe['Phrase'] == 'Hello World'

This returns a boolean Series, just as your substring match does, but it requires an exact match.

Example:

a.csv

Phrase,Other_field
Hello World,1
Hello World 1,2
Something else,3

The dataframe:

>>> import pandas as pd
>>> dataframe = pd.read_csv('a.csv')

>>> dataframe
           Phrase  Other_field
0     Hello World            1
1   Hello World 1            2
2  Something else            3

Your substring match (str.match only anchors at the start of each string):

>>> dataframe['Phrase'].str.match('Hello World')
0     True
1     True
2    False
Name: Phrase, dtype: bool

Exact match:

>>> dataframe['Phrase'] == 'Hello World'
0     True
1    False
2    False
Name: Phrase, dtype: bool
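To go from the boolean result to the matching rows, the mask can be used to index the dataframe. The following is a minimal sketch that builds the same data inline instead of reading a.csv. The names mask, exact_rows, and regex_mask are illustrative, and Series.str.fullmatch assumes pandas 1.1 or later.

```python
import pandas as pd

# Same data as a.csv, built inline so the example is self-contained.
dataframe = pd.DataFrame({
    'Phrase': ['Hello World', 'Hello World 1', 'Something else'],
    'Other_field': [1, 2, 3],
})

# Equality test: True only where the whole cell equals the phrase.
mask = dataframe['Phrase'] == 'Hello World'

# Boolean indexing keeps only the rows where the mask is True.
exact_rows = dataframe[mask]

# Regex-based alternative: the pattern must cover the entire string.
regex_mask = dataframe['Phrase'].str.fullmatch('Hello World')

print(exact_rows)
```

The equality test is usually the simpler choice for a literal phrase. str.fullmatch is mainly useful when the exact match has to be expressed as a pattern.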

Answer 3

Use a regular expression on the string.

import re

...
...

if re.search(r'^Hello World$', data_frame_string):
    # Then the string matches, do whatever with the string.
    ...
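One detail worth knowing about the pattern above: in Python, $ also matches just before a trailing newline, so a string read from a file with its line ending intact would still pass. A small sketch of the difference, using \Z and re.fullmatch as stricter alternatives (the variable names are illustrative):

```python
import re

text_with_newline = 'Hello World\n'

# "$" matches at the end of the string or just before a final newline.
dollar = re.search(r'^Hello World$', text_with_newline) is not None

# "\Z" matches only at the very end of the string.
strict = re.search(r'^Hello World\Z', text_with_newline) is not None

# re.fullmatch requires the pattern to span the whole string.
full = re.fullmatch(r'Hello World', text_with_newline) is not None

print(dollar, strict, full)  # True False False
```

If trailing whitespace should be ignored rather than rejected, stripping the string before matching is an option.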

How do I exactly match a string in Python?