Question

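I am trying to write a regular expression that matches every ABC that appears after XYZ, anywhere in the string.

Example text: "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

That is, the regex should match the three ABCs that come after XYZ.

Any clues?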

Answer 1

Just match the repeated literal ABC after XYZ and capture it in a group:

r'XYZ((?:ABC)+)'

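The (?:ABC)+ pattern matches one or more repetitions of the literal characters ABC, and the whole group must be preceded by a literal XYZ.

This is pretty basic regex 101; you should read a good tutorial on regular expression matching to get started.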
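To make the behaviour of this pattern concrete, here is a minimal sketch. The `adjacent` input string is a made-up example, not from the question. It suggests that the pattern only matches when the run of ABC sits directly after XYZ, so it finds nothing in the question's sample text, where other words separate them.

```python
import re

# Pattern from the answer: a literal XYZ followed directly by one or more ABC.
pattern = re.compile(r'XYZ((?:ABC)+)')

# Hypothetical input where the ABC run is adjacent to XYZ.
adjacent = "prefix XYZABCABCABC suffix"
match = pattern.search(adjacent)
adjacent_group = match.group(1) if match else None
print(adjacent_group)  # ABCABCABC

# The question's sample text has other words between XYZ and each ABC,
# so this pattern does not match there.
sample = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
sample_match = pattern.search(sample)
print(sample_match)  # None
```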

Answer 2

Something like this? r"(?<=XYZ)((?:ABC)+)". This will only match ABC when it immediately follows XYZ, and the match will not include XYZ itself.
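A short sketch of how the lookbehind version behaves, using a made-up input string (an assumption for illustration). The leading and trailing ABC are skipped because they are not directly preceded by XYZ, and the reported span starts after XYZ.

```python
import re

# Lookbehind pattern from the answer: ABC repeated, only if XYZ is right before it.
lookbehind = re.compile(r"(?<=XYZ)((?:ABC)+)")

# Hypothetical input: one ABC before XYZ, a run right after it, one detached ABC.
text = "ABC XYZABCABC ABC"
m = lookbehind.search(text)

lookbehind_text = m.group(0)
lookbehind_span = m.span()
print(lookbehind_text)  # ABCABC
print(lookbehind_span)  # (7, 13)
```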

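Edit

It looks like I misunderstood the OP's original question. The simplest approach is to first find the string XYZ and save its position. Then pass that position as the extra argument in p.finditer(string, startpos). Note that this only works with compiled regular expressions, so you need to compile the pattern first.

The pattern you need is just r"(ABC)".

Alternatively, you can use p.sub(), which will also perform the replacement, but to process only part of the string you need to create a substring first. p.sub() has no startpos parameter.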
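The steps in this edit can be sketched roughly as follows. The variable names (`marker`, `startpos`, `spans`, `replaced`) and the use of `str.find` to locate XYZ are assumptions made for illustration; the answer only names `p.finditer(string, startpos)`, `p.sub()` and the pattern `r"(ABC)"`.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"
marker = "XYZ"

# The pos argument is only available on compiled pattern objects.
p = re.compile(r"(ABC)")

start = s.find(marker)
if start == -1:
    # No XYZ at all: nothing qualifies, leave the string unchanged.
    spans = []
    replaced = s
else:
    startpos = start + len(marker)
    # Option 1: iterate over matches, starting the scan just after XYZ.
    spans = [m.span() for m in p.finditer(s, startpos)]
    # Option 2: sub() has no start position, so slice the string first.
    replaced = s[:startpos] + p.sub("REPLACED", s[startpos:])

print(len(spans))  # 3
print(replaced)
```

The ABC that appears before XYZ is left alone in both options, because the scan (or the substring) begins after the marker.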

Answer 3

You can take an iterative approach:

import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

# Each pass replaces the first remaining ABC after XYZ; loop until none are left.
pattern = re.compile(r'(?<=XYZ)(.*?)ABC')
while pattern.search(s):
    s = pattern.sub(r'\1REPLACED', s)

print(s)

Output:

Some ABC text followed by XYZ followed by multiple REPLACED, more REPLACED, more REPLACED
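As a possible alternative to looping, the same result can be obtained in a single pass by matching everything after the first XYZ and replacing ABC inside that tail with a callback. This is a different technique from the one in the answer, and the helper name `replace_in_tail` is hypothetical.

```python
import re

s = "Some ABC text followed by XYZ followed by multiple ABC, more ABC, more ABC"

def replace_in_tail(match):
    """Replace every ABC inside the matched tail (the text after XYZ)."""
    return match.group(0).replace("ABC", "REPLACED")

# (?<=XYZ).* matches from just after the first XYZ to the end of the string;
# DOTALL lets '.' cross newlines, and count=1 limits this to a single substitution.
result = re.sub(r"(?<=XYZ).*", replace_in_tail, s, count=1, flags=re.DOTALL)
print(result)
```

The ABC before XYZ is never part of the match, so it stays untouched.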

Answer 4

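The collections module has a nice Counter object that may help. A Counter is a dictionary in which the keys are the individual items and the values are their counts. For example:

Counter('hello there hello'.split())  # Counter({'hello': 2, 'there': 1})

Since we want to count words, we have to split the phrase wherever there is whitespace. This is the default behavior of the split method. Here is an example script that uses Counter. The bottom half could be adapted into a function if needed.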

from collections import Counter

def count_frequency(phrase):
    """ Return a dictionary with {word: num_of_occurrences} """
    counts = Counter(phrase.split())
    return counts

def replace_word(target_word, replacement, phrase):
    """ Replaces *target_word* with *replacement* in string *phrase* """
    words = phrase.split()

    for index, word in enumerate(words):
        if word == target_word:
            words[index] = replacement

    # Join with a space so the words are not run together.
    return ' '.join(words)

phrase = "hello there hello hello"
word_counts = count_frequency(phrase)
replacement = 'replaced'

for word in word_counts:
    if word_counts[word] > 2:
        phrase = phrase.replace(word, replacement)

print(phrase)
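One way the bottom half of the script might be folded into a function is sketched below. The function name `replace_frequent_words` and its `threshold` parameter are hypothetical, and the sketch replaces whole words only, whereas `str.replace` in the script above also replaces substrings inside longer words.

```python
from collections import Counter

def replace_frequent_words(phrase, replacement, threshold=2):
    """Replace every word that occurs more than *threshold* times."""
    words = phrase.split()
    counts = Counter(words)
    # Rebuild the phrase word by word; runs of whitespace collapse to one space.
    return " ".join(replacement if counts[w] > threshold else w for w in words)

new_phrase = replace_frequent_words("hello there hello hello", "replaced")
print(new_phrase)  # replaced there replaced replaced
```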

Regular expression to match any ABC after XYZ anywhere in a string
