Question

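I am trying to find the number of occurrences of a substring in a string in Python, but I need the search to be very specific. Before searching for the substring, I remove all punctuation: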

myString.translate(None, string.punctuation)

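Now I search for the substring. Suppose I am searching for the substring "hello bob", and the string I am searching in contains "hello bob-something else" or "hello bob'" along with some other text. When I strip the punctuation, the two characters ' and - are not removed because they are non-unicode characters, so the two strings above should not be counted as occurrences of the phrase "hello bob".

I use the regex code below to try to get the correct number of occurrences, but on large files (3000 lines or more) I start getting incorrect counts: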

counter = 0
searcher = re.compile("hello bob" + r'([^\w-]|$)').search
with open(myFile, 'r') as source:
    for line in source:
        if searcher(line):
            counter += 1

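Other things I've tried

I am trying to use the re.findall function, since so far it has given me the correct number of occurrences for the words I enter.

I found this on Stack Overflow: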

re.findall(r'\bword\b', read)

Is there any way I can use a variable instead of the literal word?

For example, I want to use:

myPhrase = "hello bob"
re.findall(r'\bmyPhrase\b', read)

Which should be the same as:

re.findall(r'\bhello bob\b', read)

Answer 1

You can solve this with string interpolation, using the following trick.

myphrase = "hello bob"
pattern = r'\b{var}\b'.format(var = myphrase)
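
To make this concrete, here is a minimal sketch that passes the formatted pattern to re.findall and counts the matches. It assumes the phrase contains no regex metacharacters (otherwise it would need escaping first), and the sample text in `read` is made up for illustration rather than taken from the question.

```python
import re

myphrase = "hello bob"
pattern = r'\b{var}\b'.format(var=myphrase)

# Hypothetical sample text, not from the question.
read = "hello bob, meet hello bobby. Then hello bob again."

# findall returns every non-overlapping match; "hello bobby" is skipped
# because there is no word boundary between "bob" and "by".
matches = re.findall(pattern, read)
print(len(matches))
```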

Answer 2

You can use re.escape(myPhrase) for the variable substitution.

import re

read = "hello bob ! how are you?"
myPhrase = "hello bob"
# Escape the phrase so any regex metacharacters in it are matched literally.
my_regex = r"\b" + re.escape(myPhrase) + r"\b"

counter = 0
if re.search(my_regex, read, re.IGNORECASE):
    counter += 1
else:
    print("not found")
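
Note that re.search only reports whether there is at least one match, and \b treats a hyphen or apostrophe as a word boundary, so "hello bob-something" would still match. A rough sketch of one way to count every occurrence, with an optional stricter mode, follows. The helper name `count_phrase`, its `strict` flag, and the sample text are hypothetical, and the lookaround approach is a substitute for \b rather than anything stated in the question.

```python
import re

def count_phrase(text, phrase, strict=False):
    """Count non-overlapping occurrences of phrase in text (hypothetical helper)."""
    escaped = re.escape(phrase)
    if strict:
        # Assumption: a word character, hyphen, or apostrophe directly
        # before or after the phrase disqualifies the match.
        pattern = r"(?<![\w'-])" + escaped + r"(?![\w'-])"
    else:
        pattern = r"\b" + escaped + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

# Made-up sample text covering the cases described in the question.
sample = "hello bob ! Hello Bob-something else, hello bob' and hello bob"
print(count_phrase(sample, "hello bob"))
print(count_phrase(sample, "hello bob", strict=True))
```

For a large file, the same function could be applied to the whole contents read at once, or summed line by line, provided the phrase never spans a line break.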
