I wrote this code to search for the exact word (%PDF-1.1) in a text:
import re
x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
s = re.compile(r"%PDF-\d\.\d[\b\s]")
match = s.search(x)
if match:
    print(match.group())
else:
    print("its not found")
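The problem is that when the string starts with "%PDF-1.1" followed by more text, it returns %PDF-1.1, but it goes wrong when x = "pdf file contains four parts one of them the header part which looks like %PDF-1.1": it gives me nothing.
How can I search for the exact word?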
Answer 0: (score: 1)
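Currently you are searching for the word "%PDF-X.X" (where X is a digit) followed by one more character, without caring what comes before it. If you only want to find the word at the start of the string, at the end of the string, or as a standalone word (I assume with whitespace before and after it), you can try this: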
import re
x = "%PDF-1.1 pdf file contains four parts one of them the header part which looks like "
y = "pdf file contains four parts one of them the header part which looks like %PDF-1.1"
# The word must be preceded by start-of-string or whitespace,
# and followed by end-of-string or whitespace.
s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s)")
match = s.search(x)
if match:
    print(match.group("myword"))
else:
    print("its not found")
match = s.search(y)
if match:
    print(match.group("myword"))
else:
    print("its not found")
# %PDF-1.1
# %PDF-1.1
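To see why the pattern from the question finds nothing when the word is at the end of the string, a small check may help. Inside a character class, `\b` means the backspace character rather than a word boundary, so `[\b\s]` always has to consume one real character after the version number, and at the very end of the string there is none. The variable names below are illustrative only.

```python
import re

# The pattern from the question: the trailing class needs one actual
# character (a backspace or a whitespace character) after the version.
original = re.compile(r"%PDF-\d\.\d[\b\s]")

at_start = "%PDF-1.1 pdf file"
at_end = "looks like %PDF-1.1"

m_start = original.search(at_start)   # matches, including the trailing space
m_end = original.search(at_end)       # no character left to consume

# \b inside [...] is a literal backspace (\x08), not a word boundary.
m_backspace = original.search("%PDF-1.1\x08")

print(repr(m_start.group()))
print(m_end)
print(m_backspace is not None)
```

Note that the successful match also includes the trailing space, which is one reason the answer uses a named group to extract only the word.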
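If you also want to find the word when it is followed by a symbol, you can do something like this, which accepts anything that is not a letter or a digit after the word: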
s = re.compile(r"(^|\s)(?P<myword>%PDF-\d\.\d)($|\s|[^a-zA-Z0-9])")
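As a possible alternative, the same conditions can be written with lookarounds, which check the neighbouring characters without consuming them, so the match is the word itself. This is only a sketch under the same assumptions as above (whitespace or start-of-string before the word, anything but a letter or digit after it). The names `PDF_HEADER` and `find_pdf_header` are made up for this example.

```python
import re

# (?<!\S)          : not preceded by a non-whitespace character,
#                    i.e. start of string or whitespace before the word
# (?![A-Za-z0-9])  : not followed by a letter or digit,
#                    i.e. end of string, whitespace, or a symbol after it
PDF_HEADER = re.compile(r"(?<!\S)(%PDF-\d\.\d)(?![A-Za-z0-9])")


def find_pdf_header(text):
    """Return the first standalone %PDF-X.X word in text, or None."""
    match = PDF_HEADER.search(text)
    return match.group(1) if match else None


samples = [
    "%PDF-1.1 pdf file contains four parts",
    "the header part which looks like %PDF-1.1",
    "the header is %PDF-1.4, then the body",
    "x%PDF-1.1",      # glued to a preceding character
    "%PDF-1.12",      # followed by another digit
]
for sample in samples:
    print(find_pdf_header(sample))
```

Because the lookarounds do not consume the surrounding whitespace, this form also works with `findall` or `finditer` when several occurrences are separated by a single space.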