A string contains HTML tags wrapping a word plus a suffix (in this case ...rem).
Example:
<b>SomeText...rem</b>
<u>SomeText...rem</u>
<strong>SomeText...rem</strong>
<a href="/">SomeText...rem</a>
<div>SomeText...rem</div>
When the word inside an HTML tag contains ...rem, the complete HTML tag plus the word should be removed.
I can rename "...rem". It is just a marker.
Is this possible?
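Answer 0: (score: 1)
I strongly recommend using an HTML parser. However, since your question asks for a regular expression, you can use the following pattern and replace the matches in a callback.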
/(?s)<(\w+)[^>]*>(.*?)<\/\1>/
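Explanation
(?s) - the s flag, so that the . character also matches newlines.
<(\w+)[^>]*> - matches an opening HTML tag and captures the element name.
(.*?) - the second capture group, which matches the content of the HTML tag.
<\/\1> - uses a backreference to the first capture group (the tag name) to match the closing HTML tag.
Use the function preg_replace_callback: if the second capture group contains the substring ...rem, replace the match with an empty string. Otherwise, leave it unchanged by replacing the match with itself.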
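// preg_replace_callback returns the new string; it does not modify $string in place.
$string = preg_replace_callback('/(?s)<(\w+)[^>]*>(.*?)<\/\1>/', function ($m) {
    // $m[2] is the element content; drop the whole element if it holds the marker.
    return strpos($m[2], '...rem') !== false ? '' : $m[0];
}, $string);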
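To try the callback approach end to end, here is a minimal, self-contained sketch. The helper name removeMarkedElements, its $marker parameter, and the sample input are illustrative assumptions and not part of the original answer.

```php
<?php
// Hypothetical helper: wraps the regex and callback from the answer.
function removeMarkedElements(string $html, string $marker = '...rem'): string
{
    // Same pattern as in the answer: tag name in group 1, content in group 2.
    $pattern = '/(?s)<(\w+)[^>]*>(.*?)<\/\1>/';

    return preg_replace_callback($pattern, function (array $m) use ($marker) {
        // $m[0] is the whole element; remove it only when the content has the marker.
        return strpos($m[2], $marker) !== false ? '' : $m[0];
    }, $html);
}

$input = '<b>SomeText...rem</b><u>Keep me</u><a href="/">SomeText...rem</a>';
echo removeMarkedElements($input), "\n"; // prints: <u>Keep me</u>
```

One limitation worth knowing: the pattern pairs an opening tag with the nearest closing tag of the same name and does not look inside a match again. With input such as <div><b>x...rem</b></div>, the outer <div> is matched first, its content contains the marker, and the whole <div> is removed rather than only the inner <b>.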
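Answer 1: (score: 0)
Thought I'd take a shot at this.
Using PHP, here is an exact way to do it.
Updated version
This uses the \K construct, so there is no need to write the tracker data back to the string. Just replace with nothing.
It may also give a speed boost this way.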
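As a small illustration of what \K changes, the sketch below compares a capture-and-write-back replacement with a \K replacement on a toy string. The strings keep: and drop are made up for the example and have nothing to do with the real pattern.

```php
<?php
// Without \K: the prefix "keep:" is part of the match, so it has to be
// captured and written back through the replacement string.
$withCapture = preg_replace('~(keep:)drop~', '$1', 'keep:drop');

// With \K: everything matched before \K is excluded from the reported match,
// so replacing with an empty string removes only "drop".
$withK = preg_replace('~keep:\Kdrop~', '', 'keep:drop');

var_dump($withCapture === $withK); // bool(true); both are "keep:"
```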
# ** Usage **
# -----------------
# Find: '~(?s)(?:(?:(?&Comment)?(?!(?&RawContent)|(?&Comment)).)*\K(?(?=\z)|(?<OpenTag>(?><(?:(?<TagName>[\w:]+)(?:".*?"|\'.*?\'|[^>]*?)+)>)(?<!/>))(?<Body>(?&Char_Not_Tag)*?(?:(?&Tag_Not_TargetOpen)(?&Char_Not_Tag)*?)*?(?=.)(?&RawContent)(?&Char_Not_Tag)*?(?:(?&Tag_Not_TargetOpen)(?&Char_Not_Tag)*?)*?)(?<CloseTag>(?><(?:/\2\s*)>)))|.*?(?:(?&RawContent)|(?&Comment))\K)(?(DEFINE)(?<RawContent>\.\.\.rem)(?<Tag_Not_TargetOpen>(?><(?:(?!\2)[\w:]+(?:".*?"|\'.*?\'|[^>]*?)+)>|(?&Comment)))(?<Char_Not_Tag>(?!(?><(?:[\w:]+(?:".*?"|\'.*?\'|[^>]*?)+)>)|(?&Comment)).)(?<Comment>(?><(?:!(?:(?:DOCTYPE.*?)|(?:\[CDATA\[.*?\]\])|(?:--.*?--)|(?:ATTLIST.*?)|(?:ENTITY.*?)|(?:ELEMENT.*?)))>)))~'
# Replace: nothing
# Dot-all modifier
(?s)
# Single group, two alternatives.
(?:
# Alternative 1 (highest priority)
# =================================
# This is the backtracker. This is crucial!
# We go all the way up until we find
# the raw content we are looking for,
# or comments (because they could hide tags).
# Then we backtrack from there to
# find the closest inner open/close tags
# that contain our content.
# Tracker1 - formerly a capture group that was written back by the replacement
(?:
(?&Comment)?
(?!
(?&RawContent)
| (?&Comment)
)
.
)*
# Removes the need to write Tracker1 back
\K
# Conditional Assertion -
# Have we reached the end of string without
# finding the tagged Content ?
(?(?= \z )
# ---------------------------------------------
# Yes - Don't do anything, the remainder is in
# Tracker1 and is thrown away.
# ---------------------------------------------
|
# ---------------------------------------------
# No - Find the tagged Content.
# If no match, Tracker1 will backtrack 1 char and retry.
# Here, Tracker1 will find up to the point
# of the tagged Content and be consumed, but thrown away.
# ---------------------------------------------
# Get Target Open tag
(?<OpenTag> # (1)
(?>
<
(?:
(?<TagName> [\w:]+ ) # (2), tag name
(?: " .*? " | ' .*? ' | [^>]*? )+
)
>
)
(?<! /> )
)
# Get Body containing the raw content
(?<Body> # (3)
# Stuff before raw content
(?&Char_Not_Tag)*?
(?:
(?&Tag_Not_TargetOpen)
(?&Char_Not_Tag)*?
)*?
# The raw content we need
(?= . )
(?&RawContent)
# Stuff after raw content
(?&Char_Not_Tag)*?
(?:
(?&Tag_Not_TargetOpen)
(?&Char_Not_Tag)*?
)*?
)
# Get Target Close tag
(?<CloseTag> # (4)
(?>
<
(?: / \2 \s* )
>
)
)
)
|
# Alternative 2 (lowest priority)
# =================================
# Here, we've already backtracked all
# possibilities from Tracker1.
# At this point, we have raw content,
# or comments that we must get past.
# Comments because they could hide tags.
# Just take it off, it will be thrown away.
# Tracker2 - formerly a capture group that was written back by the replacement
.*?
(?:
(?&RawContent)
| (?&Comment)
)
# Removes the need to write Tracker2 back
\K
)
# Functions
# -----------------------
(?(DEFINE)
(?<RawContent> # (5)
# Raw content we are looking for.
# Note - this is content and is not contained
# in tags nor comments.
\.\.\.rem # '...rem' or whatever
)
(?<Tag_Not_TargetOpen> # (6)
# Consume any tag that
# is not the target Open tag.
# Consume comments as well.
(?>
<
(?:
(?! \2 )
[\w:]+
(?: " .*? " | ' .*? ' | [^>]*? )+
)
>
|
(?&Comment)
)
)
(?<Char_Not_Tag> # (7)
# Consume any character
# that does not begin a tag or comment
(?!
(?>
<
(?:
[\w:]+
(?: " .*? " | ' .*? ' | [^>]*? )+
)
>
)
|
(?&Comment)
)
.
)
(?<Comment> # (8)
# Comment
(?>
<
(?:
!
(?:
(?: DOCTYPE .*? )
| (?: \[CDATA\[ .*? \]\] )
| (?: -- .*? -- )
| (?: ATTLIST .*? )
| (?: ENTITY .*? )
| (?: ELEMENT .*? )
)
)
>
)
)
)
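The "Functions" part of the pattern relies on PCRE's (?(DEFINE)...) group together with (?&Name) subroutine calls. The toy pattern below is only a sketch of that mechanism: the sub-pattern Name and the sample text are made up, and the (?x) free-spacing flag is added here so the pattern can be written across several lines with comments.

```php
<?php
// (?x) turns on free-spacing mode: whitespace and "#" comments inside the
// pattern are ignored. The (?(DEFINE)...) group never matches anything by
// itself; it only declares named sub-patterns that (?&Name) calls by name.
$pattern = '~(?x)
    < (?&Name) >            # a tag whose name is matched by the subroutine
    (?(DEFINE)
        (?<Name> [a-z]+ )   # hypothetical sub-pattern, for illustration only
    )
~';

preg_match($pattern, 'text <em> more', $m);
echo $m[0], "\n"; // prints: <em>
```

Keeping the building blocks in a DEFINE group lets the main alternatives read as a sequence of named steps, which is how the large pattern reuses Comment, RawContent, Char_Not_Tag and Tag_Not_TargetOpen in several places.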
Test case
Input:
<div>blah blah <i>some text</i> ...rem</div>
<b>SomeText...rem</b>
<u>SomeText...rem</b>
<strong>SomeText...rem</b>
<a href="/">SomeText...rem</a>
<div>SomeText...rem</div>
Output:
** Grp 0 - ( pos 0 , len 44 )
<div>blah blah <i>some text</i> ...rem</div>
** Grp 1 [OpenTag] - ( pos 0 , len 5 )
<div>
** Grp 2 [TagName] - ( pos 1 , len 3 )
div
** Grp 3 [Body] - ( pos 5 , len 33 )
blah blah <i>some text</i> ...rem
** Grp 4 [CloseTag] - ( pos 38 , len 6 )
</div>
---------------------
** Grp 0 - ( pos 46 , len 21 )
<b>SomeText...rem</b>
** Grp 1 [OpenTag] - ( pos 46 , len 3 )
<b>
** Grp 2 [TagName] - ( pos 47 , len 1 )
b
** Grp 3 [Body] - ( pos 49 , len 14 )
SomeText...rem
** Grp 4 [CloseTag] - ( pos 63 , len 4 )
</b>
---------------------
** Grp 0 - ( pos 86 , len 0 ) EMPTY
** Grp 1 [OpenTag] - NULL
** Grp 2 [TagName] - ( pos 70 , len 1 )
u
** Grp 3 [Body] - NULL
** Grp 4 [CloseTag] - NULL
---------------------
** Grp 0 - ( pos 114 , len 0 ) EMPTY
** Grp 1 [OpenTag] - NULL
** Grp 2 [TagName] - ( pos 93 , len 6 )
strong
** Grp 3 [Body] - NULL
** Grp 4 [CloseTag] - NULL
---------------------
** Grp 0 - ( pos 120 , len 30 )
<a href="/">SomeText...rem</a>
** Grp 1 [OpenTag] - ( pos 120 , len 12 )
<a href="/">
** Grp 2 [TagName] - ( pos 121 , len 1 )
a
** Grp 3 [Body] - ( pos 132 , len 14 )
SomeText...rem
** Grp 4 [CloseTag] - ( pos 146 , len 4 )
</a>
---------------------
** Grp 0 - ( pos 152 , len 25 )
<div>SomeText...rem</div>
** Grp 1 [OpenTag] - ( pos 152 , len 5 )
<div>
** Grp 2 [TagName] - ( pos 153 , len 3 )
div
** Grp 3 [Body] - ( pos 157 , len 14 )
SomeText...rem
** Grp 4 [CloseTag] - ( pos 171 , len 6 )
</div>
Previous version, with Tracker write-back.
# ** Usage **
# -----------------
# Find: '~(?s)(?:(?<Tracker1>(?:(?&Comment)?(?!(?&RawContent)|(?&Comment)).)*)(?(?=\z)|(?<OpenTag>(?><(?:(?<TagName>[\w:]+)(?:".*?"|\'.*?\'|[^>]*?)+)>)(?<!/>))(?<Body>(?&Char_Not_Tag)*?(?:(?&Tag_Not_TargetOpen)(?&Char_Not_Tag)*?)*?(?=.)(?&RawContent)(?&Char_Not_Tag)*?(?:(?&Tag_Not_TargetOpen)(?&Char_Not_Tag)*?)*?)(?<CloseTag>(?><(?:/\3\s*)>)))|(?<Tracker2>.*?(?:(?&RawContent)|(?&Comment))))(?(DEFINE)(?<RawContent>\.\.\.rem)(?<Tag_Not_TargetOpen>(?><(?:(?!\3)[\w:]+(?:".*?"|\'.*?\'|[^>]*?)+)>|(?&Comment)))(?<Char_Not_Tag>(?!(?><(?:[\w:]+(?:".*?"|\'.*?\'|[^>]*?)+)>)|(?&Comment)).)(?<Comment>(?><(?:!(?:(?:DOCTYPE.*?)|(?:\[CDATA\[.*?\]\])|(?:--.*?--)|(?:ATTLIST.*?)|(?:ENTITY.*?)|(?:ELEMENT.*?)))>)))~'
# Replace: '$1$6'