Python regex to select the word "Nissan" except inside <a>...</a> or <span>...</span> tags

Time: 2014-09-29 06:35:12

Tags: python html regex

See it live on regex101

My regular expression is:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)

I want to stop it from matching the word nissan inside URLs.

I use re.sub(pattern, new_word, paragraph, flags=re.U|re.M) to replace the word Nissan with new_word.
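A minimal sketch of that call, assuming a made-up `paragraph` and `new_word = 'Toyota'` (both hypothetical). In Python 3, `re.U` is already the default for `str` patterns and `re.M` only changes how `^` and `$` behave, so neither flag affects the result for this pattern.

```python
import re

# Pattern from the question (identical to the one in answer 0).
pattern = r'[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)'

# Hypothetical input: plain text, an href, <a> text, <span> text, plain text.
paragraph = (
    '<p>Nissan cars</p>'
    '<a href="http://example.com/nissan">nissan</a>'
    '<span>Nissan</span>'
    '<p>Buy a nissan today</p>'
)
new_word = 'Toyota'

# Only the two plain-text occurrences should be replaced.
result = re.sub(pattern, new_word, paragraph, flags=re.U | re.M)
print(result)
```

The href value and the text inside `<a>` and `<span>` are left untouched, while both plain-text mentions become `Toyota`.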


2 answers:

Answer 0 (score: 1)

You could try this pattern:

[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)

It has one flaw that I know of: nested <a>/<span> tags will trip it up, so it matches things like this:

<a>nissan<span></span></a>

See demo.
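The flaw can be reproduced with Python's `re` module. The sample strings below are made up for illustration. The second check shows a further consequence of the first lookahead `(?=[^<>]*<)`: an occurrence with no `<` anywhere after it, such as trailing plain text, is not matched either.

```python
import re

pattern = r'[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)'

# Flaw: the negative lookahead only inspects the NEXT <a>/<span> tag. Here that
# tag is the opening <span>, so the word matches although it sits in <a>...</a>.
nested = re.search(pattern, '<a>nissan<span></span></a>')
print(nested.group())  # nissan

# The first lookahead needs a "<" after the word, so trailing text is skipped.
trailing = re.search(pattern, '<p>intro</p> I drive a nissan')
print(trailing)  # None
```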

Explanation:

[Nn]issan
(?= # make sure it's not inside a tag's markup, e.g. an attribute like <a href="nissan">
    # to do that, we'll assert that the next "<" occurs before ">".
    [^<>]*
    <
)
(?! # next, make sure it's not enclosed in an <a> or <span> tag like <a>nissan</a>
    # to do that, we'll scan forward to the next "a" or "span" tag (opening or closing) and assert that it is not a closing tag.
    (?: # while...
        (?! #...there is no opening or closing "a" or "span" tag
            <
            /?
            (?:
                a|span
            )
            [ >/]
        )
        (?: # consume the next character.
            .|\n
        )
    )*
    # then assert the tag is not closing.
    </
    (?:
        a|span
    )
    >
)
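The commented layout above can be used directly in Python by compiling it with `re.VERBOSE`, which ignores whitespace and `#` comments except inside character classes, so `[ >/]` keeps its space. The sketch below uses `[ >/]`, matching the one-line pattern, and checks on a few made-up samples that both forms find the same words.

```python
import re

ONE_LINE = r'[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)'

# Same pattern, spread out with comments; re.VERBOSE strips them.
VERBOSE = re.compile(r'''
    [Nn]issan
    (?=                              # not inside a tag's markup:
        [^<>]*                       #   the next "<" must come
        <                            #   before any ">"
    )
    (?!                              # not directly enclosed in <a>/<span>:
        (?:
            (?!</?(?:a|span)[ >/])   # stop at the next <a>/<span> tag
            (?:.|\n)                 # otherwise consume one character
        )*
        </(?:a|span)>                # fail if that tag is a closing one
    )
''', re.VERBOSE)

samples = [
    '<p>Nissan cars</p>',
    '<a href="http://example.com/nissan">nissan</a>',
    '<span>Nissan</span>',
    'Buy a nissan today<br>',
]
one_line_hits = [re.findall(ONE_LINE, s) for s in samples]
verbose_hits = [VERBOSE.findall(s) for s in samples]
print(verbose_hits)                   # [['Nissan'], [], [], ['nissan']]
print(one_line_hits == verbose_hits)  # True
```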

Answer 1 (score: -1)

Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)

Try this. See the demo:

http://regex101.com/r/dN8sA5/2
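A quick check of this pattern with `re.sub`, as in the question, on made-up inputs. The pattern is case-sensitive, so a lowercase `nissan` is never matched. Its lookahead rejects a `Nissan` whenever any `</a>` or `</span>` follows later on the same line, even in an unrelated element. Because `.` does not match a newline without `re.DOTALL`, the result also depends on where the line breaks fall.

```python
import re

# Pattern from answer 1 ("\/" is just an escaped "/").
ANSWER_1 = r'Nissan(?!((?!<\/a>).)*<\/a>|((?!<\/span>).)*<\/span>)'

cases = [
    '<p>Nissan</p>',                          # replaced
    '<a href="/Nissan">Nissan</a>',           # both occurrences left alone
    '<p>Nissan</p><a href="#">Toyota</a>',    # left alone: a later </a> on the same line
    '<p>Nissan</p>\n<a href="#">Toyota</a>',  # replaced: "." stops at the newline
]
results = [re.sub(ANSWER_1, 'X', s) for s in cases]
for r in results:
    print(repr(r))
```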