awk: show the lines between two matches

Time: 2015-01-21 05:26:28

Tags: regex xml awk sed grep

I want to extract the lines of a file that lie between <div class="AA"> and <div class="clear"></div>.

Regular-expression solutions using sed or grep are welcome.

Update

Here is part of my huge XML file:

RUBBISH
RUBBISH
.
.
.
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>
    .
    .
    .
    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>


RUBBISH
RUBBISH


    <div class="span9">
          <div class="results-count">AAA</div>
    <div class="AA">
      <div class="A"><a href="/TEST">BBB</a>
      </div>
      <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
        <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
    <a href="/TEST4" class="E">GGG</a>
        <div class="clear"></div><a href="/TEST5" class="details">Details</a>
      </div>
      <pre>HHH</pre>
      <div class="clear"></div>
    .
    .
    .

2 Answers:

Answer 0 (score: 2)

awk '/<div class="clear"><\/div>/{p=0} p{print} /<div class="results-count">/{p=1}'
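This command switches printing off at the first line that contains <div class="clear"></div>. In the sample that is the nested line ending in the Details link, so that line, the following </div> and <pre>HHH</pre> are not printed. If each block should instead run up to the line that holds only the closing marker, a minimal awk sketch could anchor the end pattern. It assumes the marker always stands alone on its line (optionally surrounded by spaces or tabs), and it starts at the <div class="AA"> line itself rather than after the results-count line; extract_aa_blocks is a hypothetical wrapper name.

```shell
# Hypothetical wrapper; pass the input file as $1.
# Assumption: the closing <div class="clear"></div> that ends a block
# sits alone on its own line (leading/trailing spaces or tabs allowed).
extract_aa_blocks() {
  awk '
    # Stop at a line that holds nothing but the closing marker.
    # This rule runs before the print rule, so that line is skipped.
    /^[ \t]*<div class="clear"><\/div>[ \t]*$/ { p = 0 }
    # Start at the opening div. This rule also runs before the print
    # rule, so the opening line itself is printed.
    /<div class="AA">/ { p = 1 }
    # A bare pattern with no action prints the line while p is set.
    p
  ' "$1"
}
```

Usage: extract_aa_blocks file. Rule order carries the logic: the stop rule comes first so the closing line is dropped, and the start rule comes before the print rule so the opening line is kept. The nested <div class="clear"></div><a ...>Details</a> line no longer ends the block because the end pattern requires the marker to fill the whole line.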

Answer 1 (score: 1)

With grep:

$ grep -ozP '(?s)(?:\n|^)\s*<div class="results-count">[^\n]*\n\K.*?(?=\n\s*<div class="clear"></div>)' file
<div class="AA">
  <div class="A"><a href="/TEST">BBB</a>
  </div>
  <div class="BB"><span>CCC</span><br/><a href="/TEST1" class="B">DDD</a>
    <div></div><span>EEE</span><br/><img src="TEST2" title="C"/><a href="/TEST3" class="D">FFF</a>,
<a href="/TEST4" class="E">GGG</a>

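The question also invites sed. A minimal sketch uses a standard sed range address of the form /start/,/end/, which selects every line from a <div class="AA"> line through the next line that contains <div class="clear"></div>, both included. With the sample file that end line is the nested one ending in the Details link, and the range reopens at every later <div class="AA">, so each block in the file is printed. extract_with_sed is a hypothetical wrapper name.

```shell
# Hypothetical wrapper; pass the input file as $1.
extract_with_sed() {
  # -n turns off automatic printing. The range address selects each
  # <div class="AA"> line through the next line containing
  # <div class="clear"></div> (both ends included), and p prints them.
  sed -n '/<div class="AA">/,/<div class="clear"><\/div>/p' "$1"
}
```

Usage: extract_with_sed file. Unlike the grep answer, this keeps the closing line in the output; the end of the range is checked only from the line after the start, so a block always contains at least its opening line.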