Question

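I'm sure an example of this already exists, but I can't find it.

Across thousands of static HTML files I have a section of code like the one below, and I need to swap out the different AdSense code blocks, each of which has unique content: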

<div id="left">

    <div style="margin-top:1px;">
    <script type="text/javascript"><!--
    google_ad_client = "pub-123456132654";
    google_ad_slot = "9844984";
    google_ad_width = 468;
    google_ad_height = 15;
    //-->
    </script>
    <script type="text/javascript"
    src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
    </script>
    </div>
</div>

<div id="content">
<div id="googlesquare">
<script type="text/javascript"><!--
google_ad_client = "pub-123456132654";
google_ad_slot = "68468464";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</div>

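I have found a few sed patterns that match on a start and an end, but none that also match on the inner content.

I'm not tied to sed if this is better done with another CLI tool, but common Unix tools are preferred.

Update

Here is what I would like to be able to capture with one pattern, without catching the other blocks: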

(screenshot from the original post, not available)

Answer 1

Regular expression

<script[^>]*>(.*?)</script>

Notes

Anything between the tags is stored in the first capture group.
It does not correctly handle tags nested inside the block.

Answer 2

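sed -n '
\|<script type="text/javascript">|,\|</script>| {
    # collect every line of the block in the hold space
    H
    \|</script>| {
       # last line of the block: move the collected block into the pattern
       # space and leave an empty hold space for the next block
       s/.*//
       x
       # s/.*// always succeeds, so clear the t flag before the real test
       t reset
: reset
       s/google_ad_client = "pub-123456132654";/&/
       t catch
       b nocatch

: catch
# catch code here
    s/pub-123456132654/nopub-9876543210/
    p
# end of catch block
       b
       }
    # block not finished yet: print nothing
    b
  }

: nocatch
# no catch code here
   p
# end of no catch block
' YourFile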

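This catches the section and lets you work on it (at that point the whole section is in the working buffer, so its lines are separated by \n). For the purposes of the example I simply change pub-123456132654 to nopub-9876543210 and do nothing else to the file.

An empty line is added in front of each section that is found, because H puts a newline before the first collected line. If that matters, it can be removed, for example with s/^\n// right after the x.

Some explanations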

\| is used to change the default delimiter (/) to another one (|), which is more convenient here because the patterns contain a / (in </script>).
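
As a small illustration of the delimiter change, the two sed calls below select the same lines; the sample text in the variable html is made up for the example.

```shell
# Two equivalent ways of addressing lines that contain </script>.
html='<p>text</p>
</script>'

# default delimiter: the / inside the pattern has to be escaped
escaped=$(printf '%s\n' "$html" | sed -n '/<\/script>/p')

# \| makes | the delimiter, so the / needs no escaping
custom=$(printf '%s\n' "$html" | sed -n '\|</script>|p')

printf '%s\n%s\n' "$escaped" "$custom"
```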

The \|</script>| { block nested inside the \|<script type="text/javascript">|,\|</script>| { block is used to act on the last line of the section, much as $ acts on the last line of the file.

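In this sub-block the working and hold buffers are swapped (the goal is to bring the hold buffer into the working buffer and to leave an empty hold buffer for the next iteration).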
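
The collect-then-swap idiom can be tried in isolation. This is a minimal sketch on made-up input (the lines a, b, c), using $ in place of the closing-tag address; the newlines are turned into commas only to make the collected text visible on one line.

```shell
# Collect all lines with H, then on the last line swap them into the pattern space.
collected=$(printf '%s\n' a b c | sed -n '
H
$ {
    s/.*//
    x
    s/\n/,/g
    p
}
')
printf '%s\n' "$collected"
```

The result starts with a comma because H always puts a newline in front of the line it appends, which is where the extra leading line in the real script comes from.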

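The control flow with b and t in sed is a little odd, because t (in effect an if ... goto) only branches when an s/// has made a substitution since the current line was read or since the last t was taken, and there is no else; the unconditional b that follows the t plays that role.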
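
A minimal sketch of that if/else pattern on made-up input; the labels catch and other and the CATCH:/OTHER: prefixes are illustrative only.

```shell
# s/x/&/ changes nothing but sets the flag that t tests; b acts as the else.
labelled=$(printf '%s\n' 'slot 9844984' 'slot 68468464' | sed -n '
s/9844984/&/
t catch
b other
: catch
s/^/CATCH: /
p
b
: other
s/^/OTHER: /
p
')
printf '%s\n' "$labelled"
```

The bare b at the end of the catch part jumps to the end of the script, so a line that was caught never falls through into the other part.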

Sed regex multi-line match based on start, end, and content
