My regular expression is:
[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)
I want it to stop capturing the word nissan when it appears inside a URL, as shown in the screenshot.
I use re.sub(pattern, new_word, paragraph, flags=re.U|re.M)
to replace the word nissan with new_word.
Answer 0: (score: 1)
You can try this pattern:
[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)
It has one flaw that I know of: nested <a> or <span> tags will trip it up, causing it to match something like this:
<a>nissan<span></span></a>
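To make the behaviour and the flaw concrete, here is a minimal sketch that runs the pattern through re.sub the same way the question does. The helper name replace_word, the replacement word "Datsun", and the sample strings are illustrative assumptions, not part of the original question.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = r'[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)'

def replace_word(paragraph, new_word):
    # Hypothetical helper wrapping the re.sub call from the question.
    return re.sub(PATTERN, new_word, paragraph, flags=re.U | re.M)

# Plain text inside a <p>: replaced.
plain = replace_word('<p>I drive a nissan.</p>', 'Datsun')

# Inside an attribute value (a URL): left alone.
in_attr = replace_word('<a href="nissan.html">cars</a>', 'Datsun')

# Directly enclosed by <a>...</a>: left alone.
in_link = replace_word('<a href="/cars">nissan</a>', 'Datsun')

# Known flaw: a nested tag right after the word hides the closing </a>,
# so the word is replaced even though it sits inside a link.
nested = replace_word('<a>nissan<span></span></a>', 'Datsun')

for result in (plain, in_attr, in_link, nested):
    print(result)
```

Note that the first lookahead requires a "<" somewhere after the word, so text with no following tag at all is never matched.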
Explanation:
[Nn]issan
(?=  # make sure it's not inside a tag's markup, like <a href="nissan">
     # to do that, we'll assert that the next angle bracket is "<" rather than ">".
    [^<>]*
    <
)
(?!  # next, make sure it's not enclosed in an <a> or <span> tag, like <a>nissan</a>
     # to do that, we'll skip ahead to the next "a" or "span" tag, either opening or closing;
     # the negative lookahead fails (rejecting the match) if that tag is a closing one.
    (?:  # while...
        (?!  # ...there is no opening or closing "a" or "span" tag
            <
            /?
            (?:
                a|span
            )
            [ >/]
        )
        (?:  # consume the next character.
            .|\n
        )
    )*
    # then try to match a closing tag; if this succeeds, the word is enclosed and is rejected.
    </
    (?:
        a|span
    )
    >
)
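The annotated breakdown above can also be used directly as a pattern: Python's re.VERBOSE flag ignores whitespace and "#" comments outside character classes. The sketch below is a condensed version of that layout and checks that it agrees with the one-line pattern on a few sample strings. The names COMPACT, VERBOSE, and SAMPLES are illustrative assumptions.

```python
import re

# One-line pattern from the answer.
COMPACT = re.compile(
    r'[Nn]issan(?=[^<>]*<)(?!(?:(?!</?(?:a|span)[ >/])(?:.|\n))*</(?:a|span)>)'
)

# Same pattern in verbose form. The space inside [ >/] is kept because
# re.VERBOSE does not strip whitespace inside a character class.
VERBOSE = re.compile(r"""
    [Nn]issan
    (?= [^<>]* < )                      # next angle bracket is "<": not inside a tag
    (?!                                 # reject if directly enclosed by a/span
        (?:
            (?! </? (?:a|span) [ >/] )  # stop at any opening or closing a/span tag
            (?: . | \n )                # otherwise consume one character
        )*
        </ (?:a|span) >                 # the tag we stopped at is a closing one
    )
""", re.VERBOSE)

# Hypothetical sample inputs covering the cases discussed in the answer.
SAMPLES = [
    '<p>nissan</p>',
    '<a href="nissan">x</a>',
    '<a>nissan</a>',
    "<span class='x'>Nissan</span>",
    '<a>nissan<span></span></a>',
]

# True where the pattern finds a match, in the same order as SAMPLES.
compact_hits = [bool(COMPACT.search(s)) for s in SAMPLES]
verbose_hits = [bool(VERBOSE.search(s)) for s in SAMPLES]
print(compact_hits == verbose_hits)
```

Keeping the verbose form in source code makes it easier to adjust the tag list (a, span) later without re-deriving the whole expression.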
Answer 1: (score: -1)