Question

See it live on `regex101`

My regular expression is:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)

I want it to stop capturing the word nissan inside URLs; see the screenshot below.

I use re.sub(pattern, new_word, paragraph, flags=re.U|re.M) to replace the word nissan with new_word.

[screenshot]

Answer 1

You can try this pattern:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)
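As a rough sketch of how this pattern could be plugged into the `re.sub` call from the question. The sample `paragraph` and the replacement word `Toyota` are made up for illustration and do not come from the original post.

```python
import re

PATTERN = r"[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)"

# Hypothetical input: one plain occurrence, one inside an href,
# one inside <a>...</a> and one inside <span>...</span>.
paragraph = (
    '<p>Nissan cars <a href="http://nissan.com/x">Nissan site</a> '
    'and <span>nissan</span> end</p>'
)
new_word = "Toyota"  # hypothetical replacement word

# Same call shape as in the question.
result = re.sub(PATTERN, new_word, paragraph, flags=re.U | re.M)
print(result)
# <p>Toyota cars <a href="http://nissan.com/x">Nissan site</a> and <span>nissan</span> end</p>
```

Only the first occurrence is replaced. Note that the first lookahead needs a `<` somewhere after the word, so an occurrence with no tag following it is left alone. In Python 3, `re.U` is already the default for `str` patterns, and `re.M` only changes how `^` and `$` behave, so neither flag affects this pattern.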

It has one flaw that I know of: nested <a> or <span> tags trip it up, causing it to match something like this:

<a>nissan<span></span></a>

See demo.
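The nested-tag flaw can be reproduced with a short check. This is only a sketch; the two test strings are illustrative inputs chosen to mirror the case described above.

```python
import re

PATTERN = re.compile(
    r"[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)"
)

simple = "<a>nissan</a>"               # the next a/span tag is the closing </a>
nested = "<a>nissan<span></span></a>"  # the next a/span tag is an opening <span>

simple_hits = PATTERN.findall(simple)
nested_hits = PATTERN.findall(nested)

print(simple_hits)  # []
print(nested_hits)  # ['nissan']
```

The negative lookahead only inspects the first `a` or `span` tag after the word. In the nested case that tag is an opening `<span>`, so the word is accepted even though it still sits inside `<a>`. If nesting matters, an HTML parser is likely a more robust choice than a single regex.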

Explanation:

[Nn]issan
(?= # make sure it's not inside a tag itself, like <a href="nissan">
    # to do that, we'll assert that the next angle bracket is "<" rather than ">".
    [^<>]*
    <
)
(?! # next, make sure it's not enclosed in an <a> or <span> tag like <a>nissan</a>
    # to do that, we'll match anything up to the next "a" or "span" tag, either opening or closing, and then assert the tag is opening.
    (?: # while...
        (?! #...there is no opening or closing "a" or "span" tag
            <
            /?
            (?:
                a|span
            )
            [ >/]
        )
        (?: # consume the next character.
            .|\n
        )
    )*
    # then assert the tag is not closing.
    </
    (?:
        a|span
    )
    >
)
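A commented layout like this can be compiled directly in Python with `re.VERBOSE`, which ignores whitespace and `#` comments outside character classes. Below is a condensed sketch that checks the verbose form against the one-line form on a few made-up sample strings.

```python
import re

COMPACT = re.compile(
    r"[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)"
)

VERBOSE = re.compile(
    r"""
    [Nn]issan
    (?= [^<>]* < )                       # next angle bracket is "<": not inside a tag
    (?!                                  # not followed by a closing a/span tag first
        (?:
            (?! < /? (?:a|span) [ >/] )  # stop at the next a/span tag, opening or closing
            (?: . | \n )                 # otherwise consume one character
        )*
        </ (?:a|span) >                  # reject the match if that tag is a closing one
    )
    """,
    re.VERBOSE,
)

# Hypothetical sample inputs.
samples = [
    "<p>Nissan</p>",
    '<a href="nissan">link</a>',
    "<a>nissan</a>",
    "<span class='x'>Nissan</span> nissan<br>",
]

verbose_results = [VERBOSE.findall(s) for s in samples]
compact_results = [COMPACT.findall(s) for s in samples]
print(verbose_results)  # [['Nissan'], [], [], ['nissan']]
```

The space inside `[ >/]` is kept because whitespace in a character class is still significant in verbose mode.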

Answer 2

Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)

Try this. See the demo.

http://regex101.com/r/dN8sA5/2
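For what it's worth, here is a small sketch of how this pattern behaves in Python. The sample strings are invented for illustration. Because `.` does not match a newline by default, the lookahead only scans to the end of the current line.

```python
import re

PATTERN = re.compile(r"Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)")

# Hypothetical inputs: the same content on one line and split over two lines.
single_line = "Nissan cars <a href='#'>Nissan</a>"
multi_line = "Nissan cars\n<a href='#'>Nissan</a>"

# The pattern has capturing groups, so collect match offsets via finditer
# rather than findall (which would return the group contents).
single_starts = [m.start() for m in PATTERN.finditer(single_line)]
multi_starts = [m.start() for m in PATTERN.finditer(multi_line)]

print(single_starts)  # []
print(multi_starts)   # [0]
```

On a single line, the first "Nissan" is rejected as well, because a `</a>` appears later on that line even though the word is outside the link. The pattern is also case-sensitive, so lowercase "nissan" is never matched.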

Python regex to select the word "Nissan" except inside <a>...</a> or <span>...</span> tags
