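I have an HtmlString in which some tags have multiple "href" attributes, and I have to remove one of them. If a tag has more than one href attribute, the blank href attribute must be removed with a regex.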
<p>
Contrary to popular belief, Lorem Ipsum is not simply random text.It has
<a title="Test PDF for RTF" href="" title="Test PDF for RTF" href="Test%20PDF%20for%20rtf.pdf">
Test PDF
</a>
roots in a piece of classical Latin literature from 45 BC, making
<a title="Learn More" href="test.html" title="Learn More" >
Learn More
</a>
it over 2000 years old. Richard McClintock,
<a title="Test Page" href="" >
Test Page
</a>
Latin professor at Hampden-Sydney College in Virginia,
<a title="Test PDF for RTF" href="" title="Test PDF for RTF" href="Test%20PDF%20for%20rtf.pdf">
Test PDF
</a>
looked up one of the more obscure Latin words, consectetur
</p>
Answer 0 (score: 1)
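I think you want this: when a tag has two hrefs, match the first href on that line or in that text. As your comment says, "I have to keep one href, and not remove it even if it is empty." So you want to remove the duplicate href. If so, you can apply: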
(?=href.+?href)[^"]+""
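This part, (?=href.+?href), is a lookahead assertion: if it finds href twice, it matches a zero-length position right before the first href. This part, [^"]+"", then matches that empty href="".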
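To see the lookahead at work outside Perl, here is a minimal Java sketch that collects every match of the same pattern with java.util.regex.Matcher.find, much like a print-every-match loop. The class and method names (LookaheadDemo, findMatches) are hypothetical, and it assumes each tag sits on a single line, as in the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: lists what the answer's pattern matches.
class LookaheadDemo {
    // (?=href.+?href) succeeds only where "href" is followed, later on the same
    // line, by another "href"; [^"]+"" then consumes that attribute up to its
    // empty value, i.e. href="".
    static final Pattern DUPLICATE_BLANK_HREF =
            Pattern.compile("(?=href.+?href)[^\"]+\"\"");

    // Returns every match, in order of appearance.
    static List<String> findMatches(String html) {
        List<String> matches = new ArrayList<>();
        Matcher m = DUPLICATE_BLANK_HREF.matcher(html);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String twoHrefs = "<a title=\"Test PDF for RTF\" href=\"\" "
                + "title=\"Test PDF for RTF\" href=\"Test%20PDF%20for%20rtf.pdf\">";
        String oneHref = "<a title=\"Test Page\" href=\"\" >";
        System.out.println(findMatches(twoHrefs)); // prints [href=""]
        System.out.println(findMatches(oneHref));  // prints []
    }
}
```

The second call finds nothing because the Test Page link has only one href, so the lookahead never succeeds there.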
The easiest way to test it is to put your input in a file and run:
perl -lne 'print $& while/(?=href.+?href)[^"]+""/g' file
Output:
href=""
href=""
And to remove them:
perl -lpe 's/(?=href.+?href)[^"]+""/==>Removed<==/g' file
It outputs:
<p>
Contrary to popular belief, Lorem Ipsum is not simply random text.It has
<a title="Test PDF for RTF" ==>Removed<== title="Test PDF for RTF" href="Test%20PDF%20for%20rtf.pdf">
Test PDF
</a>
roots in a piece of classical Latin literature from 45 BC, making
<a title="Learn More" href="test.html" title="Learn More" >
Learn More
</a>
it over 2000 years old. Richard McClintock,
<a title="Test Page" href="" >
Test Page
</a>
Latin professor at Hampden-Sydney College in Virginia,
<a title="Test PDF for RTF" ==>Removed<== title="Test PDF for RTF" href="Test%20PDF%20for%20rtf.pdf">
Test PDF
</a>
looked up one of the more obscure Latin words, consectetur
</p>
You can also apply this pattern in Java, with the replacement set to "" (empty).
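A minimal Java sketch of that, assuming the HTML is held in a single String (the names DuplicateHrefRemover and removeDuplicateBlankHref are hypothetical). String.replaceAll with an empty replacement deletes each empty href that has another href after it on the same line; the space in front of the deleted attribute stays behind, so the tag ends up with two spaces in a row.

```java
// Hypothetical helper class: applies the answer's pattern with an empty replacement.
class DuplicateHrefRemover {
    // The lookahead requires a second "href" later on the same line, so only an
    // empty href="" that is followed by another href is removed.
    private static final String DUPLICATE_BLANK_HREF = "(?=href.+?href)[^\"]+\"\"";

    static String removeDuplicateBlankHref(String html) {
        // Without Pattern.DOTALL, '.' does not match line terminators, so each
        // lookahead only scans the rest of its own line (like perl -p).
        return html.replaceAll(DUPLICATE_BLANK_HREF, "");
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<a title=\"Test PDF for RTF\" href=\"\" title=\"Test PDF for RTF\" "
                        + "href=\"Test%20PDF%20for%20rtf.pdf\">",
                "Test PDF",
                "</a>",
                "<a title=\"Test Page\" href=\"\" >",
                "Test Page",
                "</a>");
        System.out.println(removeDuplicateBlankHref(html));
    }
}
```

Because DOTALL is off, the lone href="" on the Test Page link is left in place, matching the requirement to keep one href per tag.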
Answer 1 (score: 0)
A possible solution (removes duplicate attributes and blank hrefs):
(\w+=".*?")(?=[^>]+\1)|href="" //replace with nothing
This assumes that as long as no > has occurred yet, we are still inside the same tag. That may be naive, but it is probably safe enough.
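The same pattern in Java, as a hedged sketch (DuplicateAttributeRemover and removeDuplicates are hypothetical names): the first alternative deletes any attribute that appears again before the next >, and the second deletes every href="". Note that the second alternative also removes a lone blank href, such as the one on the Test Page link.

```java
// Hypothetical helper class for the duplicate-attribute pattern.
class DuplicateAttributeRemover {
    // Alternative 1: an attribute name="value" that appears again before the
    //   next '>' (\1 is a backreference to the captured attribute text).
    // Alternative 2: any empty href="".
    private static final String PATTERN = "(\\w+=\".*?\")(?=[^>]+\\1)|href=\"\"";

    static String removeDuplicates(String html) {
        // Replace with nothing, as the answer suggests; surrounding spaces remain.
        return html.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        String tag = "<a title=\"Learn More\" href=\"test.html\" title=\"Learn More\" >";
        System.out.println(removeDuplicates(tag));
        // prints <a  href="test.html" title="Learn More" >
    }
}
```

The first of two identical attributes is the one removed, because the lookahead can only see text to the right of the match.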
Answer 2 (score: 0)
If you only need to get rid of empty hrefs, even if that leaves a tag with no href at all:
/\s?href=\"\/?\"/
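will match all blank hrefs (href="" as well as href="/"), together with one optional whitespace character in front of them.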
You didn't specify which language you are using the regex in, so it may need a little adjusting.
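As one example of such an adjustment, a Java version might look like the sketch below (BlankHrefRemover and removeBlankHrefs are hypothetical names). Java regexes take no /.../ delimiters and / needs no escaping, so the pattern becomes \s?href="/?" written as a Java string literal.

```java
// Hypothetical helper class: a Java adaptation of /\s?href=\"\/?\"/
class BlankHrefRemover {
    // \s? : one optional whitespace character before the attribute
    // /?  : an optional slash, so href="/" counts as blank too
    private static final String BLANK_HREF = "\\s?href=\"/?\"";

    static String removeBlankHrefs(String html) {
        return html.replaceAll(BLANK_HREF, "");
    }

    public static void main(String[] args) {
        System.out.println(removeBlankHrefs("<a title=\"Test Page\" href=\"\" >"));
        // prints <a title="Test Page" >
    }
}
```

Consuming the leading whitespace avoids the doubled spaces that the other two patterns leave behind.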