How do I search for an exact word in Python?

Time: 2014-04-13 21:40:31

Tags: python search

I wrote this code to search for an exact word (%PDF-1.1) in a text:

import re
x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
s = re.compile(r"%PDF-\d\.\d[\b\s]")
match = re.search(r"%PDF-\d\.\d[\b\s]", x)
if match:
    print(match.group())
else:
    print("its not found")

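The problem is that when the text starts with "%PDF-1.1" it returns %PDF-1.1, but it goes wrong when x = "pdf file contains four parts one of them the header part which looks like %PDF-1.1": in that case it gives me nothing.

How can I search for the exact word?

1 answer:

Answer 0: (score: 1)

Right now you are searching for the word "%PDF-X.X" (where X is a digit) followed by one more character, without caring about what comes before it. If you want to find the word only at the start of the string, at the end of the string, or when it stands as a word of its own (I assume that means whitespace before and after it), you can try this: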

import re
x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
y = "pdf file contains four parts one of them the header part which looks like %PDF-1.1"
# The word must be preceded by start-of-string or whitespace,
# and followed by end-of-string or whitespace.
s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s)")
match = s.search(x)
if match:
    print(match.group("myword"))
else:
    print("its not found")

match = s.search(y)
if match:
    print(match.group("myword"))
else:
    print("its not found")

# %PDF-1.1
# %PDF-1.1
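
As for why the pattern in the question finds nothing at the end of the string: inside a character class, `\b` stands for the backspace character rather than a word boundary, so `[\b\s]` always requires one real character after the version number. A minimal sketch of that behaviour, written in Python 3 syntax, with variable names chosen only for illustration:

```python
import re

# Inside [...] the escape \b means backspace (\x08), not a word boundary,
# so this pattern needs one real character after the version number.
question_pattern = re.compile(r"%PDF-\d\.\d[\b\s]")

at_start = "%PDF-1.1 pdf file contains four parts"
at_end = "the header part which looks like %PDF-1.1"

print(question_pattern.search(at_start))  # a match that includes the trailing space
print(question_pattern.search(at_end))    # None: nothing follows the word

# The backspace character itself satisfies the character class.
print(question_pattern.search("%PDF-1.1\x08") is not None)
```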

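If you also want to find the word when it is followed by a symbol, you can do something like this, which accepts anything after it that is not a letter or a digit: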

s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s|[^a-zA-Z0-9])")
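
As an alternative sketch (not part of the answer above), lookarounds can express "no non-whitespace character directly before or after the word" without consuming the surrounding whitespace, so no named group is needed. The names `PDF_HEADER` and `find_pdf_header` are hypothetical and used only for this example:

```python
import re

# (?<!\S) : not preceded by a non-whitespace character (start of string is fine)
# (?!\S)  : not followed by a non-whitespace character (end of string is fine)
PDF_HEADER = re.compile(r"(?<!\S)%PDF-\d\.\d(?!\S)")

def find_pdf_header(text):
    """Return the matched header such as '%PDF-1.1', or None if it is absent."""
    match = PDF_HEADER.search(text)
    return match.group() if match else None

print(find_pdf_header("%PDF-1.1 pdf file contains four parts"))
print(find_pdf_header("which looks like %PDF-1.1"))
print(find_pdf_header("x%PDF-1.1"))   # glued to a preceding letter
print(find_pdf_header("%PDF-1.10"))   # followed by an extra digit
```

This is the strict, whitespace-delimited variant. To also accept a trailing symbol, the final lookahead could be changed to `(?![a-zA-Z0-9])`.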