Question

Using only Python regular expressions, how can I find and replace the nth occurrence of a word in a sentence? For example:

str = 'cat goose  mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', str)
new_str = re.sub(r'cat', r'Bull', str, 1)
new_str = re.sub(r'cat', r'Bull', str, 2)

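The sentence above contains the word 'cat' twice. I want the second occurrence of 'cat' changed to 'Bull', leaving the first 'cat' untouched. My final sentence should look like: "cat goose mouse horse pig Bull cow". I tried three variants in the code above and none of them gives me what I want.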
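
For reference, here is a small sketch of what the three attempts actually do. The variable names `replace_all`, `replace_first` and `replace_first_two` are illustrative only. The point is that the `count` argument of `re.sub` caps how many matches are replaced, counting from the left; it does not select which match to replace.

```python
import re

s = 'cat goose  mouse horse pig cat cow'

# No count: every occurrence is replaced.
replace_all = re.sub(r'cat', 'Bull', s)
# count=1: only the first occurrence is replaced.
replace_first = re.sub(r'cat', 'Bull', s, count=1)
# count=2: the first two occurrences are replaced, not just the second.
replace_first_two = re.sub(r'cat', 'Bull', s, count=2)

print(replace_all)        # Bull goose  mouse horse pig Bull cow
print(replace_first)      # Bull goose  mouse horse pig cat cow
print(replace_first_two)  # Bull goose  mouse horse pig Bull cow
```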

Answer 1

Use a negative lookahead, as shown below.

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'


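^ asserts that we are at the start of the string.
(?:(?!cat).)* matches any character, zero or more times, as long as that character does not begin the substring cat.
cat matches the first cat substring.
(?:(?!cat).)* again matches any character, zero or more times, without running into cat.
Now wrap all of these patterns in a capturing group, like ((?:(?!cat).)*cat(?:(?!cat).)*), so that the captured characters can be referenced later.
cat now matches the second cat substring.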
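
The steps above can be generalised to the nth occurrence by repeating the "cat plus gap" part n-1 times. The following is only a sketch under some assumptions: the helper name `replace_nth_lookahead` is hypothetical, the word is escaped with `re.escape` so it is treated literally, and `re.DOTALL` is added so the gaps may span newlines.

```python
import re

def replace_nth_lookahead(text, word, replacement, n):
    """Replace the nth (1-based) occurrence of a literal word."""
    w = re.escape(word)
    # A "gap": any run of characters that never starts an occurrence of word.
    gap = r'(?:(?!%s).)*' % w
    # Everything before the nth occurrence: gap, then (word + gap) n-1 times.
    prefix = gap + (w + gap) * (n - 1)
    pattern = '^(' + prefix + ')' + w
    # A function replacement avoids backslash processing of `replacement`.
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  text, count=1, flags=re.DOTALL)

s = 'cat goose  mouse horse pig cat cow'
print(replace_nth_lookahead(s, 'cat', 'Bull', 2))
# cat goose  mouse horse pig Bull cow
```

If the text has fewer than n occurrences, the pattern does not match and the text is returned unchanged.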

Or

>>> s = "cat goose mouse horse pig cat cow"
>>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
'cat goose mouse horse pig Bull cow'

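Change the number inside {} to replace the first, second, or nth occurrence of the string cat.

To replace the third occurrence of the string cat, put 2 inside the curly braces.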

>>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose mouse horse pig cat foo cat cow")
'cat goose mouse horse pig cat foo Bull cow'
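
Since the number in the braces is always n-1, the pattern can be built from n. This is a minimal sketch, not part of the original answer: `replace_nth_quantifier` is a hypothetical name, the word is escaped so it is matched literally, and the inner group is made non-capturing because only the outer group is used.

```python
import re

def replace_nth_quantifier(text, word, replacement, n):
    """Replace the nth (1-based) occurrence of a literal word."""
    w = re.escape(word)
    # Group 1 holds everything up to (but not including) the nth occurrence.
    pattern = r'^(.*?(?:%s.*?){%d})%s' % (w, n - 1, w)
    return re.sub(pattern, lambda m: m.group(1) + replacement, text, count=1)

text = 'cat goose mouse horse pig cat foo cat cow'
print(replace_nth_quantifier(text, 'cat', 'Bull', 3))
# cat goose mouse horse pig cat foo Bull cow
```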


Answer 2

Here is a way to do it without regular expressions:

def replaceNth(s, source, target, n):
    inds = [i for i in range(len(s) - len(source)+1) if s[i:i+len(source)]==source]
    if len(inds) < n:
        return  # or maybe raise an error
    s = list(s)  # can't assign to string slices. So, let's listify
    s[inds[n-1]:inds[n-1]+len(source)] = target  # do n-1 because we start from the first occurrence of the string, not the 0-th
    return ''.join(s)

Usage:

In [278]: s
Out[278]: 'cat goose  mouse horse pig cat cow'

In [279]: replaceNth(s, 'cat', 'Bull', 2)
Out[279]: 'cat goose  mouse horse pig Bull cow'

In [280]: print(replaceNth(s, 'cat', 'Bull', 3))
None
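
As a variation on the same non-regex idea, the nth start index can be found by calling `str.find` repeatedly instead of scanning every position. This is only a sketch: the name `replace_nth_find` is hypothetical, and raising `ValueError` instead of returning None is a design choice, not something the answer prescribes.

```python
def replace_nth_find(s, source, target, n):
    """Replace the nth (1-based) occurrence of source, raising if it is missing."""
    if n < 1:
        raise ValueError('n must be at least 1')
    index = -1
    for _ in range(n):
        # Resume one character after the previous hit, so overlapping
        # occurrences are counted just like in the list-based version.
        index = s.find(source, index + 1)
        if index == -1:
            raise ValueError('fewer than %d occurrences of %r' % (n, source))
    return s[:index] + target + s[index + len(source):]

sentence = 'cat goose  mouse horse pig cat cow'
print(replace_nth_find(sentence, 'cat', 'Bull', 2))
# cat goose  mouse horse pig Bull cow
```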

Answer 3

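I use a simple function that lists all occurrences, picks the position of the nth one, and uses it to split the original string into two substrings. It then replaces the first occurrence in the second substring and joins the substrings back into a new string: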

import re

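def replacenth(string, sub, wanted, n):
    # Start index of the nth match (n is 1-based, list indices are 0-based).
    where = [m.start() for m in re.finditer(sub, string)][n-1]
    before = string[:where]
    after = string[where:]
    # str.replace returns a new string, so the result must be reassigned.
    after = after.replace(sub, wanted, 1)
    newString = before + after
    print(newString)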

For these variables:

string = 'ababababababababab'
sub = 'ab'
wanted = 'CD'
n = 5

Output:

ababababCDabababab

Note:

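The where variable is actually taken from a list of match positions, from which you pick the nth one. But list indices usually start at 0, not 1. Hence the n-1 index, so that the n variable is the actual nth substring. My example finds the 5th occurrence. If you used n directly as the index and wanted the 5th position, you would need n to be 4. Which one you use usually depends on the function that generates n.

This should be the simplest way, but it is not purely regex, as you originally wanted.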
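
A sketch of the same split-and-replace idea that returns the result instead of printing it may make the 1-based versus 0-based point concrete. The name `replace_nth_split` is hypothetical, and two assumptions are added here: `sub` is treated as literal text (hence `re.escape`), and an out-of-range n returns the string unchanged.

```python
import re

def replace_nth_split(string, sub, wanted, n):
    """Replace the nth (1-based) occurrence of the literal text sub."""
    starts = [m.start() for m in re.finditer(re.escape(sub), string)]
    if not 1 <= n <= len(starts):
        return string  # nothing to replace
    where = starts[n - 1]  # nth occurrence lives at list index n-1
    # Split at the nth occurrence and replace only the first hit in the tail.
    return string[:where] + string[where:].replace(sub, wanted, 1)

print(replace_nth_split('ababababababababab', 'ab', 'CD', 5))
# ababababCDabababab
```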

Sources and some links:

Construction of where: Find all occurrences of a substring in Python

String splitting: https://www.daniweb.com/programming/software-development/threads/452362/replace-nth-occurrence-of-any-sub-string-in-a-string

Similar question: Find the nth occurrence of substring in a string

Answer 4

I would define a function that works for any regular expression:

import re

def replace_ith_instance(string, pattern, new_str, i = None, pattern_flags = 0):
    # If i is None - replacing last occurrence
    match_obj = re.finditer(r'{0}'.format(pattern), string, flags = pattern_flags)
    matches = [item for item in match_obj]
    if i is None:
        i = len(matches)
    if len(matches) == 0 or len(matches) < i:
        return string
    match = matches[i - 1]
    match_start_index = match.start()
    match_len = len(match.group())

    return '{0}{1}{2}'.format(string[0:match_start_index], new_str, string[match_start_index + match_len:])

A working example:

str = 'cat goose  mouse horse pig cat cow'
ns = replace_ith_instance(str, 'cat', 'Bull', 2)
print(ns)

Output:

cat goose  mouse horse pig Bull cow

Another example:

str2 = 'abc abc def abc abc'
ns = replace_ith_instance(str2, r'abc\s*abc', '666')
print(ns)

Output:

abc abc def 666
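
The same match-then-slice idea can also be sketched without building the full list of matches, by stepping the `re.finditer` iterator to the nth match with `itertools.islice`. The helper name `replace_nth_match` is hypothetical; unlike the function above it requires n >= 1 and has no "last occurrence" default.

```python
import re
from itertools import islice

def replace_nth_match(string, pattern, new_str, n, flags=0):
    """Replace the nth (1-based) regex match; return string unchanged if absent."""
    matches = re.finditer(pattern, string, flags)
    # Skip n-1 matches and take the next one, or None if there are too few.
    match = next(islice(matches, n - 1, None), None)
    if match is None:
        return string
    start, end = match.span()
    return string[:start] + new_str + string[end:]

print(replace_nth_match('Cat goose cat', 'cat', 'Bull', 2, re.IGNORECASE))
# Cat goose Bull
```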

Answer 5

You can match two occurrences of "cat", keep everything before the second occurrence (\1), and append "Bull":

new_str = re.sub(r'(cat.*?)cat', r'\1Bull', str, 1)

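We make only one substitution, to avoid also replacing the fourth, sixth, etc. occurrence of "cat" (when there are at least four occurrences), as pointed out in Avinash Raj's comment.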
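
To illustrate why the single substitution matters, here is a short sketch on a made-up string with four occurrences (the sample string and the variable names are illustrative only):

```python
import re

s = 'cat a cat b cat c cat d'
pattern = r'(cat.*?)cat'

# Without a count, each successive pair of cats is matched,
# so the 2nd and the 4th occurrence are both replaced.
every_pair = re.sub(pattern, r'\1Bull', s)
# With count=1 only the first pair is processed: just the 2nd occurrence.
second_only = re.sub(pattern, r'\1Bull', s, count=1)

print(every_pair)   # cat a Bull b cat c Bull d
print(second_only)  # cat a Bull b cat c cat d
```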

If you want to replace the nth occurrence rather than the second, use:

n = 2
# Capture the whole prefix: a repeated capturing group keeps only its last repetition.
new_str = re.sub('((?:cat.*?){%d})cat' % (n - 1), r'\1Bull', str, 1)

By the way, you should not use str as a variable name, because it shadows Python's built-in str type.

Answer 6

Create a repl function to pass to re.sub(). Except... the trick is to make it a class, so you can keep track of the number of calls.

class ReplWrapper(object):
    def __init__(self, replacement, occurrence):
        self.count = 0
        self.replacement = replacement
        self.occurrence = occurrence
    def repl(self, match):
        self.count += 1
        if self.occurrence == 0 or self.occurrence == self.count:
            return match.expand(self.replacement)
        else:
            # Leave every other occurrence unchanged.
            return match.group(0)

Then use it like this:

myrepl = ReplWrapper(r'Bull', 0) # replaces all instances in a string
new_str = re.sub(r'cat', myrepl.repl, str)

myrepl = ReplWrapper(r'Bull', 1) # replaces 1st instance in a string
new_str = re.sub(r'cat', myrepl.repl, str)

myrepl = ReplWrapper(r'Bull', 2) # replaces 2nd instance in a string
new_str = re.sub(r'cat', myrepl.repl, str)

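I'm sure there is a cleverer way that avoids using a class, but this seemed straightforward enough to explain. Also, be sure to return match.expand(), because simply returning the replacement value is technically incorrect if someone decides to use \1-style templates.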
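
One possible class-free variant, offered only as a sketch, keeps the call count in a closure with `itertools.count`. The function name `replace_nth_closure` is hypothetical, and it handles only n >= 1 (it has no "0 means replace all" mode).

```python
import re
from itertools import count

def replace_nth_closure(pattern, replacement, text, n):
    """Replace only the nth (1-based) match of pattern in text."""
    calls = count(1)  # yields 1, 2, 3, ... one number per match

    def repl(match):
        if next(calls) == n:
            # expand() honours templates such as \1 in the replacement.
            return match.expand(replacement)
        return match.group(0)  # leave other matches untouched

    return re.sub(pattern, repl, text)

s = 'cat goose  mouse horse pig cat cow'
print(replace_nth_closure(r'cat', r'Bull', s, 2))
# cat goose  mouse horse pig Bull cow
```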

Answer 7

How to replace the nth needle with word:

s.replace(needle,'$$$',n-1).replace(needle,word,1).replace('$$$',needle)
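
The one-liner relies on '$$$' never appearing in the text. A slightly more defensive sketch of the same masking trick follows; the function name `replace_nth_placeholder` and the choice of a NUL character as placeholder are assumptions made for illustration.

```python
def replace_nth_placeholder(s, needle, word, n, placeholder='\x00'):
    """Replace the nth (1-based) needle by temporarily masking the first n-1."""
    if placeholder in s or placeholder in word:
        raise ValueError('placeholder must not occur in the input')
    # Hide the first n-1 needles behind the placeholder.
    masked = s.replace(needle, placeholder, n - 1)
    # The first needle still visible is the nth one: replace it.
    swapped = masked.replace(needle, word, 1)
    # Restore the hidden needles.
    return swapped.replace(placeholder, needle)

s = 'cat goose  mouse horse pig cat cow'
print(replace_nth_placeholder(s, 'cat', 'Bull', 2))
# cat goose  mouse horse pig Bull cow
```

Like `str.replace` itself, this counts non-overlapping occurrences from left to right.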

Answer 8

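I solved this by generating a "grouped" version of the desired catch pattern relative to the whole string, and then applying the sub directly to that instance.

The parent function is regex_n_sub, which takes the same inputs as the re.sub() method.

The catch pattern is passed, together with the instance number, to get_nsubcatch_catch_pattern(). Internally, a list comprehension generates multiples of the pattern '.*?' (match any character, zero or more repetitions, non-greedy). This pattern is used to represent the space between the first n occurrences of catch_pattern.

Next, the input catch_pattern is placed between each of the n "space patterns", and the result is wrapped in parentheses to form the first group.

The second group is just catch_pattern wrapped in parentheses, so when the two groups are combined, a pattern for "all text up to the nth occurrence of the catch pattern" is created. This new_catch_pattern has two groups built in, so the second group, which contains the nth occurrence of catch_pattern, can be replaced.

The replace pattern is passed to get_nsubcatch_replace_pattern() and combined with the prefix r'\g<1>' to form the pattern \g<1> + replace_pattern. The \g<1> part of this pattern re-emits group 1 from the catch pattern, and the text of the replace pattern then takes the place of group 2.

The code below is verbose, just to make the flow easier to follow; it can be reduced as needed.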

--

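The example below should run standalone, and corrects the 4th instance of "I" to "me":

"When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."

to

"When I go to the park and I am alone I think the ducks laugh at me but I'm not sure."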

import regex as re

def regex_n_sub(catch_pattern, replace_pattern, input_string, n, flags=0):
    new_catch_pattern, new_replace_pattern = generate_n_sub_patterns(catch_pattern, replace_pattern, n)
    return_string = re.sub(new_catch_pattern, new_replace_pattern, input_string, 1, flags)
    return return_string

def generate_n_sub_patterns(catch_pattern, replace_pattern, n):
    new_catch_pattern = get_nsubcatch_catch_pattern(catch_pattern, n)
    new_replace_pattern = get_nsubcatch_replace_pattern(replace_pattern, n)
    return new_catch_pattern, new_replace_pattern

def get_nsubcatch_catch_pattern(catch_pattern, n):
    space_string = '.*?'
    space_list = [space_string for i in range(n)]
    first_group = catch_pattern.join(space_list)
    first_group = first_group.join('()')
    second_group = catch_pattern.join('()')
    new_catch_pattern = first_group + second_group
    return new_catch_pattern

def get_nsubcatch_replace_pattern(replace_pattern, n):
    new_replace_pattern = r'\g<1>' + replace_pattern
    return new_replace_pattern


### use test ###
catch_pattern = 'I'
replace_pattern = 'me'
test_string = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."

regex_n_sub(catch_pattern, replace_pattern, test_string, 4)

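This code can be copied directly into a workflow, and it returns the replaced object to the regex_n_sub() function call.

If the implementation fails, let me know!

Thanks!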
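
To see what the helper functions build, the intermediate patterns for n = 4 can be reproduced in a few lines. This sketch assumes the standard `re` module behaves the same as the third-party `regex` package for these patterns, and it uses plain string concatenation instead of the join('()') trick.

```python
import re

catch_pattern = 'I'
replace_pattern = 'me'
n = 4

# n lazy "space" patterns joined by the catch pattern give n-1 occurrences.
first_group = '(' + catch_pattern.join(['.*?'] * n) + ')'
second_group = '(' + catch_pattern + ')'  # the nth occurrence itself
new_catch_pattern = first_group + second_group
new_replace_pattern = r'\g<1>' + replace_pattern

print(new_catch_pattern)    # (.*?I.*?I.*?I.*?)(I)
print(new_replace_pattern)  # \g<1>me

test_string = ("When I go to the park and I am alone I think "
               "the ducks laugh at I but I'm not sure.")
result = re.sub(new_catch_pattern, new_replace_pattern, test_string, count=1)
print(result)
```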
