Question

I have hundreds of files that contain the following block, and I want to get rid of the whole block.

   <texttool id="468" rect="55,306,319,23">
      <toolstroke />
      <toolcolor />
      <font style="2" size="18" />
      <Text>(Be sure to state the page/problem number.)</Text>
    </texttool>

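The problem is that each block has a different id=XXX part; everything else is the same.

Is there a way to do a mass find and replace that handles this?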

Answer 1

Ctrl + H
Find what: <texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>
Replace with: LEAVE EMPTY
Check Match case
Check Wrap around
Check Regular expression
Check . matches newline
Replace all

Explanation:

<texttool\b             # open tag
[^>]*?                  # 0 or more any character that is not >, not greedy
\brect="55,306,319,23"  # literally
                        # Tempered greedy token
(?:                     # start non capture group
 (?!<texttool\b)        # negative lookahead, make sure we haven't same tag
 .                      # any character
)*                      # end group, may appear 0 or more times
</texttool>             # end tag
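
For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same expression. It assumes Python's `re` module, where `re.DOTALL` plays the role of the ". matches newline" checkbox and matching is case-sensitive by default. The `sample` string is a shortened, made-up variant of the data in this thread, not the asker's real file.

```python
import re

# Same expression as in "Find what"; re.DOTALL lets "." cross line breaks,
# which is what the ". matches newline" checkbox does in Notepad++.
PATTERN = re.compile(
    r'<texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>',
    re.DOTALL,
)

# Hypothetical sample: one block with a different rect, one with the target rect.
sample = '''<grouptool id="881" rect="20,576,456,141">
    <texttool id="884" rect="30,584,214,31">
      <Text>Got Audio Problems?</Text>
    </texttool>
    <texttool id="885" rect="55,306,319,23">
      <Text>Note: Audio problems can be caused</Text>
    </texttool>
</grouptool>'''

# Replace every match with the empty string, like "Replace with: LEAVE EMPTY".
cleaned = PATTERN.sub("", sample)
print(cleaned)
```

Because `[^>]*?` cannot cross the `>` that closes the opening tag, the block with `rect="30,584,214,31"` is left alone and only the block carrying the target `rect` is removed.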

Given:

  <grouptool id="881" rect="20,576,456,141">
    <imagetool id="882" rect="349.15240478515625,581.5066528320312,111.22747039794922,132.8365936279297">
      <toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
      <bordercolor />
      <image name="head-set-md.png" type="CLIPART" size="20419" w="252" h="300" CRC="3224584205" />
    </imagetool>
    <texttool id="884" rect="30,584,214,31">
      <toolstroke />
      <toolcolor />
      <font style="3" size="24" />
      <Text>Got Audio Problems?</Text>
    </texttool>
    <texttool id="885" rect="55,306,319,23">
      <toolstroke />
      <toolcolor />
      <font style="2" size="18" />
      <Text>Note: Audio problems can be caused</Text>
    </texttool>
    <imagetool id="886" rect="36.17853927612305,631.7913818359375,262.9012756347656,24.34532356262207">
      <toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
      <bordercolor />
      <image name="unknown.png" type="CLIPART" size="1777" w="260" h="24" CRC="2321804736" />
    </imagetool>
    <texttool id="887" rect="55,306,319,23">
      <toolstroke />
      <toolcolor />
      <font style="2" size="18" />
      <Text>by a weak/spotty internet connection.</Text>
    </texttool>
    <rectangletool id="888" rect="249.5330505371093,627.7338256835938,30.33476448059082,31.446043014526367">
      <toolstroke WIDTH="4.0" />
      <toolcolor RGB="52224" />
      <fillcolor RGB="16777215" ALPHA="0" />
    </rectangletool>
  </grouptool>

Result for the given example:

  <grouptool id="881" rect="20,576,456,141">
    <imagetool id="882" rect="349.15240478515625,581.5066528320312,111.22747039794922,132.8365936279297">
      <toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
      <bordercolor />
      <image name="head-set-md.png" type="CLIPART" size="20419" w="252" h="300" CRC="3224584205" />
    </imagetool>
    <rectangletool id="883" rect="20,576,455.0214538574219,141">
      <toolstroke />
      <toolcolor />
      <fillcolor RGB="16777215" ALPHA="0" />
    </rectangletool>


    <imagetool id="886" rect="36.17853927612305,631.7913818359375,262.9012756347656,24.34532356262207">
      <toolstroke WIDTH="1.0" CAP="2" JOIN="2" MITER="0.0" />
      <bordercolor />
      <image name="unknown.png" type="CLIPART" size="1777" w="260" h="24" CRC="2321804736" />
    </imagetool>

    <rectangletool id="888" rect="249.5330505371093,627.7338256835938,30.33476448059082,31.446043014526367">
      <toolstroke WIDTH="4.0" />
      <toolcolor RGB="52224" />
      <fillcolor RGB="16777215" ALPHA="0" />
    </rectangletool>
  </grouptool>
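
Since the question is about hundreds of files, the same replacement could also be scripted. The following is only a sketch under assumptions: the function name `strip_blocks`, the `*.xml` glob, and UTF-8 encoding are all hypothetical choices, so adjust them to the real files and work on a backup copy first.

```python
import re
from pathlib import Path

# Same expression as in Answer 1, with re.DOTALL standing in for ". matches newline".
PATTERN = re.compile(
    r'<texttool\b[^>]*?\brect="55,306,319,23"(?:(?!<texttool\b).)*</texttool>',
    re.DOTALL,
)


def strip_blocks(root, glob="*.xml", encoding="utf-8"):
    """Remove matching <texttool> blocks from every file under root.

    Returns the number of files that were actually rewritten.
    The glob and encoding defaults are assumptions, not facts about the files.
    """
    changed = 0
    for path in Path(root).rglob(glob):
        if not path.is_file():
            continue
        original = path.read_text(encoding=encoding)
        cleaned, hits = PATTERN.subn("", original)
        if hits:  # only touch files that contained at least one match
            path.write_text(cleaned, encoding=encoding)
            changed += 1
    return changed
```

Calling `strip_blocks("path/to/folder")` walks the folder recursively and rewrites only the files that contained a matching block.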

Answer 2

Use the following regex to search the whole file and remove every <texttool> block together with its contents:

(<texttool(?:.|\n)*?<\/texttool>)

Before

 text before<texttool id="468" rect="55,306,319,23">
      <toolstroke />
      <toolcolor />
      <font style="2" size="18" />
      <Text>(Be sure to state the page/problem number.)</Text>
    </texttool> text after

<texttool id="468" rect="55,306,319,23">
      <toolstroke />
      <toolcolor />
      <font style="2" size="18" />
      <Text>(Be sure to state the page/problem number.)</Text>
    </texttool>

After

 text before text after

You can try it yourself in this DEMO.
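
The before/after behaviour can also be reproduced offline. This is a minimal Python sketch, assuming Python's `re` module rather than the engine behind the demo; the `before` string is a shortened copy of the example.

```python
import re

# Answer 2's pattern: match lazily from "<texttool" to the first "</texttool>".
# "(?:.|\n)" matches any character including a line break, so no flag is needed.
ALL_TEXTTOOLS = re.compile(r'(<texttool(?:.|\n)*?<\/texttool>)')

before = ''' text before<texttool id="468" rect="55,306,319,23">
      <toolstroke />
      <Text>(Be sure to state the page/problem number.)</Text>
    </texttool> text after'''

after = ALL_TEXTTOOLS.sub("", before)
print(repr(after))  # ' text before text after'
```

The lazy quantifier `*?` is what stops each match at the nearest closing tag, so separate blocks are removed one at a time instead of being swallowed as one large match.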

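Update 1

As requested, the following regex removes only the <texttool> blocks that contain the attribute rect="55,306,319,23":

(<texttool.*rect=\"55\,306\,319\,23\"(?:.|\n)*?<\/texttool>)

Here is the updated regex DEMO.

Note that it matches only blocks containing that specific string, and it matches the string's characters literally.

Update 2

The regex I provided does not work properly in Notepad++, because Notepad++ uses a customized PCRE-based regex engine. Here is a pattern that I have tested and verified to work:

<\btexttool.*\brect\=\"55\,306\,319\,23\"([\s\S]*?)<\/\btexttool>

It is very important to disable the ". matches newline" option in the Notepad++ search window; otherwise the pattern will not work, because it is not compatible with that option.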
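
A small Python sketch of why the ". matches newline" setting matters for this pattern. It assumes Python's `re` module, with `re.DOTALL` standing in for that checkbox; the `doc` string is made-up sample data, and Notepad++'s own engine may differ in other details.

```python
import re

# The Update 2 pattern: ".*" is meant to stay on the opening-tag line,
# while "[\s\S]*?" is what crosses line breaks up to the closing tag.
PATTERN = r'<\btexttool.*\brect\=\"55\,306\,319\,23\"([\s\S]*?)<\/\btexttool>'

doc = '''<texttool id="884" rect="30,584,214,31">
  <Text>Got Audio Problems?</Text>
</texttool>
<texttool id="885" rect="55,306,319,23">
  <Text>Note: Audio problems can be caused</Text>
</texttool>'''

# "." does not match newline: only the block with the target rect is removed.
line_mode = re.sub(PATTERN, "", doc)

# "." matches newline: the greedy ".*" runs from the FIRST <texttool across
# lines to the target rect, so the unrelated block 884 is deleted as well.
dotall_mode = re.sub(PATTERN, "", doc, flags=re.DOTALL)

print(repr(line_mode))
print(repr(dotall_mode))
```

With the option off, block 884 survives; with it on, the greedy `.*` swallows both blocks, which is the over-matching the warning is about.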

Complex find and replace in Notepad++
