I have hundreds of files that contain a block like the one below, and I want to get rid of the whole block.
<texttool id="468" rect="55,306,319,23">
<toolstroke />
<toolcolor />
<font style="2" size="18" />
<Text>(Be sure to state the page/problem number.)</Text>
</texttool>
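The problem is that each block has a different id=XXX value; everything else is identical.
Is there a way to do a bulk find and replace that handles this?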
Answer 0 (score: 3)
Find what: <texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>
Replace with: LEAVE EMPTY
Check the ". matches newline" option
Explanation:
<texttool\b # opening tag
[^>]*? # 0 or more characters other than >, non-greedy
\brect="55,306,319,23" # matched literally
# Tempered greedy token
(?: # start of non-capturing group
(?!<texttool\b) # negative lookahead: no new opening <texttool tag starts here
. # any character
)* # end of group, repeated 0 or more times
</texttool> # closing tag
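For anyone who would rather check the pattern outside the editor, here is a minimal Python sketch of the same expression. It assumes Python's `re` module, with `re.DOTALL` standing in for the ". matches newline" option. The `strip_blocks` helper and the shortened sample are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above; re.DOTALL lets "." cross line breaks,
# which is what the ". matches newline" option does in the editor.
PATTERN = re.compile(
    r'<texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>',
    re.DOTALL,
)


def strip_blocks(text: str) -> str:
    """Remove every <texttool> block whose rect is 55,306,319,23."""
    return PATTERN.sub("", text)


# Shortened, hypothetical sample: one block to keep, one to remove.
sample = (
    '<texttool id="884" rect="30,584,214,31">\n'
    '<Text>Got Audio Problems?</Text>\n'
    '</texttool>\n'
    '<texttool id="885" rect="55,306,319,23">\n'
    '<Text>Note: Audio problems can be caused</Text>\n'
    '</texttool>\n'
)

cleaned = strip_blocks(sample)
print(cleaned)
```

The block with id="884" survives because its rect differs. The negative lookahead keeps a match from running past the start of another `<texttool` tag, so a single match cannot span two blocks.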
Given:
<grouptool id="881" rect="20,576,456,141">
<imagetool id="882" rect="349.15240478515625,581.5066528320312,111.22747039794922,132.8365936279297">
<toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
<bordercolor />
<image name="head-set-md.png" type="CLIPART" size="20419" w="252" h="300" CRC="3224584205" />
</imagetool>
<texttool id="884" rect="30,584,214,31">
<toolstroke />
<toolcolor />
<font style="3" size="24" />
<Text>Got Audio Problems?</Text>
</texttool>
<texttool id="885" rect="55,306,319,23">
<toolstroke />
<toolcolor />
<font style="2" size="18" />
<Text>Note: Audio problems can be caused</Text>
</texttool>
<imagetool id="886" rect="36.17853927612305,631.7913818359375,262.9012756347656,24.34532356262207">
<toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
<bordercolor />
<image name="unknown.png" type="CLIPART" size="1777" w="260" h="24" CRC="2321804736" />
</imagetool>
<texttool id="887" rect="55,306,319,23">
<toolstroke />
<toolcolor />
<font style="2" size="18" />
<Text>by a weak/spotty internet connection.</Text>
</texttool>
<rectangletool id="888" rect="249.5330505371093,627.7338256835938,30.33476448059082,31.446043014526367">
<toolstroke WIDTH="4.0" />
<toolcolor RGB="52224" />
<fillcolor RGB="16777215" ALPHA="0" />
</rectangletool>
</grouptool>
Result for the given example:
<grouptool id="881" rect="20,576,456,141">
<imagetool id="882" rect="349.15240478515625,581.5066528320312,111.22747039794922,132.8365936279297">
<toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
<bordercolor />
<image name="head-set-md.png" type="CLIPART" size="20419" w="252" h="300" CRC="3224584205" />
</imagetool>
<texttool id="884" rect="30,584,214,31">
<toolstroke />
<toolcolor />
<font style="3" size="24" />
<Text>Got Audio Problems?</Text>
</texttool>
<imagetool id="886" rect="36.17853927612305,631.7913818359375,262.9012756347656,24.34532356262207">
<toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
<bordercolor />
<image name="unknown.png" type="CLIPART" size="1777" w="260" h="24" CRC="2321804736" />
</imagetool>
<rectangletool id="888" rect="249.5330505371093,627.7338256835938,30.33476448059082,31.446043014526367">
<toolstroke WIDTH="4.0" />
<toolcolor RGB="52224" />
<fillcolor RGB="16777215" ALPHA="0" />
</rectangletool>
</grouptool>
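Since the question mentions hundreds of files, the same replacement could also be scripted instead of run from the editor. The sketch below is only one possible approach and rests on assumptions the question does not state: the files end in `.xml`, are UTF-8 encoded, and sit under a single directory. `clean_tree` is a hypothetical helper name. Back up the files before running anything like this.

```python
import re
from pathlib import Path

# Pattern from the answer above; re.DOTALL mirrors ". matches newline".
PATTERN = re.compile(
    r'<texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>',
    re.DOTALL,
)


def clean_tree(root: Path, glob: str = "*.xml", encoding: str = "utf-8") -> int:
    """Rewrite matching files under root in place; return how many changed.

    The "*.xml" glob and the UTF-8 encoding are assumptions, adjust as needed.
    """
    changed = 0
    for path in sorted(root.rglob(glob)):
        # newline="" keeps the original line endings untouched.
        with path.open(encoding=encoding, newline="") as handle:
            original = handle.read()
        cleaned = PATTERN.sub("", original)
        if cleaned != original:
            with path.open("w", encoding=encoding, newline="") as handle:
                handle.write(cleaned)
            changed += 1
    return changed
```

Calling `clean_tree(Path("some/folder"))` would rewrite the affected files and return the number of files that were modified; files without a matching block are left alone.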
Answer 1 (score: 2)
Use the following regex to search the whole file and remove every <texttool> block together with its contents:
(<texttool(?:.|\n)*?<\/texttool>)
Before
text before<texttool id="468" rect="55,306,319,23">
<toolstroke />
<toolcolor />
<font style="2" size="18" />
<Text>(Be sure to state the page/problem number.)</Text>
</texttool> text after
<texttool id="468" rect="55,306,319,23">
<toolstroke />
<toolcolor />
<font style="2" size="18" />
<Text>(Be sure to state the page/problem number.)</Text>
</texttool>
After
text before text after
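As a quick check of what the pattern above does, here is a small Python sketch. It assumes Python's `re` module behaves like the engine used in the demo for this expression. The second block in the sample (id="469", rect="1,2,3,4") is hypothetical and was added only to show that this pattern removes every `<texttool>` block, whatever its attributes are.

```python
import re

# (?:.|\n)*? lazily matches any character, line breaks included,
# up to the first closing </texttool>.
REMOVE_ALL = re.compile(r'(<texttool(?:.|\n)*?<\/texttool>)')

before = (
    'text before<texttool id="468" rect="55,306,319,23">\n'
    '<toolstroke />\n'
    '<toolcolor />\n'
    '<font style="2" size="18" />\n'
    '<Text>(Be sure to state the page/problem number.)</Text>\n'
    '</texttool> text after\n'
    '<texttool id="469" rect="1,2,3,4">\n'  # hypothetical second block
    '<Text>Another block</Text>\n'
    '</texttool>\n'
)

after = REMOVE_ALL.sub("", before)
print(repr(after))
```

Both blocks disappear, including the one with a different rect, so this version is only suitable when every `<texttool>` block in the file should go.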
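You can try it yourself in this DEMO.
As requested, the following regex removes only the <texttool> blocks that contain the attribute rect="55,306,319,23":
(<texttool.*rect=\"55\,306\,319\,23\"(?:.|\n)*?<\/texttool>)
Here is the updated regex DEMO.
Note that it matches only blocks containing that exact string, character for character.
The regex above does not work correctly in Notepad++, because Notepad++ uses its own customized PCRE-based regex engine. Here is a tested and verified pattern that works for me:
<\btexttool.*\brect\=\"55\,306\,319\,23\"([\s\S]*?)<\/\btexttool>
It is essential to disable the ". matches newline" option in the Notepad++ search window; otherwise the pattern will not work, because it is not compatible with that option.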
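To illustrate why that option matters for a pattern containing a greedy `.*`, here is a rough Python sketch. It uses Python's `re` module as a stand-in for the Notepad++ engine, which is an assumption: `re.DOTALL` plays the role of ". matches newline" being enabled. The two-block sample is made up for the demonstration.

```python
import re

# Pattern quoted from the answer (the Notepad++ variant).
PATTERN = r'<\btexttool.*\brect\=\"55\,306\,319\,23\"([\s\S]*?)<\/\btexttool>'

# Hypothetical sample: the first block has a different rect and should be kept.
sample = (
    '<texttool id="884" rect="30,584,214,31">\n'
    '<Text>Got Audio Problems?</Text>\n'
    '</texttool>\n'
    '<texttool id="885" rect="55,306,319,23">\n'
    '<Text>Note: Audio problems can be caused</Text>\n'
    '</texttool>\n'
)

# "." stops at line breaks: .* stays on the line of the opening tag.
dot_stops_at_newline = re.sub(PATTERN, "", sample)

# "." crosses line breaks: the greedy .* runs from the first <texttool
# to the last matching rect attribute and swallows the block in between.
dot_matches_newline = re.sub(PATTERN, "", sample, flags=re.DOTALL)

print('id="884"' in dot_stops_at_newline)  # True: unrelated block is kept
print('id="884"' in dot_matches_newline)   # False: unrelated block was removed too
```

With the option enabled, the greedy `.*` can start at an unrelated `<texttool` tag earlier in the file and extend to the target rect, so more than the intended block is deleted.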