Question

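I have the following code. Basically, I am trying to replace a word if it matches one of these regex patterns. If the word matches once, it should disappear from the new list entirely. The code below works, but I would like to know whether there is a way to write it so that I can keep adding patterns to the "pat" list indefinitely without writing an extra if statement inside the for loop.

To clarify, my regex patterns use negative lookaheads and lookbehinds to make sure the match is a single word.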

import re

pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']

if isinstance(x, list):
    new = []
    for i in x:
        if re.search(pat[0], i):
            i = re.sub(pat[0], '', i)
        if re.search(pat[1], i):
            i = re.sub(pat[1], '', i)
        if len(i) > 0:
            new.append(i)
    x = new 
else:
    x = x.strip()

Answer 1

Just add another for loop:

# This replaces the two if statements inside "for i in x:".
for patn in pat:
    if re.search(patn, i):
        i = re.sub(patn, '', i)
if i:
    new.append(i)

Answer 2

pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']

if isinstance(x, list):
    new = []
    for i in x:
        for p in pat:
            i = re.sub(p, '', i)
        if len(i) > 0:
            new.append(i)
    x = new 
else:
    x = x.strip()

Answer 3

Add another loop:

pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']

if isinstance(x, list):
    new = []
    for i in x:
        # iterate through pat list
        for regx in pat:
            if re.search(regx, i):
                i = re.sub(regx, '', i)
    ...

Answer 4

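If the only thing that changes between your patterns is the word, you can join the words with | to build a single pattern. The two patterns in your example would then become something like the following.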

r'(?<![a-z][ ])(?:Pacific|Global)(?![ ])'

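If you need to add more words, just add another alternative separated by a pipe, for example (?:word1|word2|word3).

The ?: inside the parentheses means that the group is non-capturing.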
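As a minimal sketch of this idea, the alternation can be generated from a plain list of words, so that adding a word never means editing the regex by hand. The names words, combined and clean are hypothetical, and re.escape is an addition made here as a precaution in case a word contains regex metacharacters. The lookarounds are copied from the question unchanged.

```python
import re

# Hypothetical word list; extend it to cover more words.
words = ['Pacific', 'Global']

# Builds (?<![a-z][ ])(?:Pacific|Global)(?![ ]) from the list.
combined = re.compile(
    r'(?<![a-z][ ])(?:' + '|'.join(re.escape(w) for w in words) + r')(?![ ])'
)

def clean(items):
    """Remove every match from each string and drop strings left empty."""
    stripped = (combined.sub('', s) for s in items)
    return [s for s in stripped if s]
```

With this, clean(['Pacific', 'Global', 'Atlantic']) keeps only 'Atlantic', while 'the Pacific' is left alone because the negative lookbehind rejects a match preceded by a lowercase letter and a space.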

Answer 5

Something like this:

[word for word in l if not any(re.search(p, word) for p in pat)]
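Note that this comprehension filters rather than substitutes: an item is dropped as a whole as soon as any pattern matches anywhere in it, whereas the code in the question removes only the matched text. The sketch below contrasts the two on hypothetical sample data (the list l and the names filtered and substituted are made up for illustration).

```python
import re

pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
l = ['Pacific', 'Atlantic', 'Pacific-rim']  # hypothetical sample data

# Filtering: an item is dropped entirely as soon as any pattern matches.
filtered = [word for word in l if not any(re.search(p, word) for p in pat)]

# Substituting (the question's behavior): only the matched text is removed,
# and an item is dropped only if nothing is left of it.
substituted = []
for word in l:
    for p in pat:
        word = re.sub(p, '', word)
    if word:
        substituted.append(word)
```

Here filtered is ['Atlantic'], while substituted is ['Atlantic', '-rim']. The two agree only when every item is a single word.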

Answer 6

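I will try to guess here; if I am wrong, skip to "This is how I would write it" and adapt the code I provide to whatever you actually intend to do (which I may not have understood).

I assume you are trying to eliminate the words "Global" and "Pacific" from a list of phrases that may contain them. If that is the case, I do not think your regular expressions do what you specified. You probably intended to use something like the following (which does not work properly either!):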

pat = [r'(?<=[a-z][ ])Pacific(?=[ ])', r'(?<=[a-z][ ])Global(?=[ ])']

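The difference is in the lookaround assertions, which are positive ((?=...) and (?<=...)) instead of negative ((?!...) and (?<!...)).

Also, writing the regular expressions this way does not always correctly eliminate the whitespace between words.

This is how I would write it: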
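To make both points concrete, here is a small sketch comparing the two kinds of lookaround on a standalone word and on a word inside a phrase. The sample strings and variable names are hypothetical.

```python
import re

negative = r'(?<![a-z][ ])Pacific(?![ ])'  # pattern from the question
positive = r'(?<=[a-z][ ])Pacific(?=[ ])'  # positive-lookaround variant

alone = 'Pacific'              # hypothetical sample: the word on its own
inside = 'the Pacific ocean'   # hypothetical sample: the word in a phrase

# The negative version removes the standalone word but not the embedded one.
negative_alone = re.sub(negative, '', alone)
negative_inside = re.sub(negative, '', inside)

# The positive version does the opposite, and leaves both spaces behind.
positive_alone = re.sub(positive, '', alone)
positive_inside = re.sub(positive, '', inside)
```

The negative patterns turn 'Pacific' into an empty string and leave 'the Pacific ocean' untouched. The positive patterns leave 'Pacific' untouched and turn the phrase into 'the  ocean', with two spaces, because lookarounds do not consume the surrounding whitespace.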

words = ['Pacific', 'Global']
pat = "|".join(r'\b' + word + r'\b\s*' for word in words)
if isinstance(x, str):
    x = x.strip()        # I don't understand why you don't sub here, anyway!
else:
    x = [s for s in (re.sub(pat, '', s) for s in x) if s != '']

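In the regular expression for the pattern, note (a) \b, which stands for "the empty string, but only at the beginning or end of a word" (see the manual), (b) the use of | to separate alternative patterns, and (c) \s, which stands for "characters considered whitespace". The last of these is what correctly removes the unnecessary space after each removed word.

This works correctly in both Python 2 and Python 3. I think the code is clearer and, as far as efficiency goes, it is best to let re do the work rather than testing each pattern separately.

Given: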

x = ["from Global a to Pacific b",
     "Global Pacific",
     "Pacific Global",
     "none",
     "only Global and that's it"]

This produces:

x = ['from a to b', 'none', "only and that's it"]
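For reference, a self-contained sketch of the same approach that reproduces the result above. The function name remove_words is hypothetical, and re.escape and re.compile are additions not present in the answer: the former guards against words that contain regex metacharacters, the latter compiles the pattern once. Only the list branch is covered.

```python
import re

def remove_words(phrases, words):
    """Delete each whole word in `words`, plus any whitespace after it,
    from every phrase, and drop phrases that end up empty."""
    pattern = re.compile(
        "|".join(r'\b' + re.escape(w) + r'\b\s*' for w in words)
    )
    cleaned = (pattern.sub('', s) for s in phrases)
    return [s for s in cleaned if s != '']

x = ["from Global a to Pacific b",
     "Global Pacific",
     "Pacific Global",
     "none",
     "only Global and that's it"]

result = remove_words(x, ['Pacific', 'Global'])
```

Because of \b, a longer word such as 'Globalization' is not affected; only whole-word occurrences are removed.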

A more efficient way to replace items in a list based on a condition
