
Date: 2013-07-27 03:57:04

Tags: html linux find command

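I have more than 50k .html files on my server, all copied from another website. I now want to remove part of the text from every one of these .html files using the Linux command line.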


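The part I want to remove is not 100% identical from file to file, but it is similar, as the code below shows. I want to keep the text between the @@ markers. (The @ symbol does not exist in the original files; I added it to highlight the part that should be kept.)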

Some Part of HTML Codes here

<br /></div>
<h1> A Memorable Night </h1>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
Some Part of HTML Codes here


<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN""">
<html xmlns="">
<title> A Memorable Night  free download :: LipWap.Com </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="LipWap.Com  &gt; Stories &gt; Grate Male &gt; _A_Memorable_Night.txt"/>
<meta name="keywords" content=",Stories,Grate Male,_A_Memorable_Night.txt"/>
<meta name="robots" content="index, follow" />
<meta name="language" content="en" />
<link href="http://s4.LipWap.Com/style.css" type="text/css" rel="stylesheet"/>
<div class="logo">
<a href="http://LipWap.Com"><ge alt="LipWap.Com" src="/logo.gif" width="220" hight="42"/></a></div>      </div>

<div id="mainDiv">
<div class="ad1 tCenter p5">
<a href="">
<ige sra="" alt="" />
<br /><br />
<a href="">
<ige sra="" alt="" />          </a>
<br /></div>

<h1> A Memorable Night </h1>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</div><div class="randomFile">
<h3>Related Files</h3>

<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/file//Stories/Grate Male/_5-Star_Hotel.txt.html"><div><div><ige sra="/prv//Stories/Grate Male/_5-Star_Hotel.txt.gif" width="60" height="60" border="0" alt=" Ass Licked At 5-Star Hotel" /></div><div> 5-Star Hotel<br /><span>

[2326&nbsp;Words]<br />76 hits</span></div></div></a>  </div>
<!-- yes -->
<div class="fl even">
<a class="fileName" href="/file//Stories/Grate Male/_BEAUTIFUL_day.txt.html"><div><div><ige sra="/prv//Stories/Grate Male/_BEAUTIFUL_day.txt.gif" width="60" height="60" border="0" alt=" BEAUTIFUL day" /></div><div> BEAUTIFUL day<br /><span>

[4279&nbsp;Words]<br />114 hits</span></div></div></a>  </div>
<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/file//Stories/Grate Male/_hello bro.txt.html"><div><div><ige sra="/prv//Stories/Grate Male/_hello bro.txt.gif" width="60" height="60" border="0" alt=" hello bro" /></div><div> Baby is seduced by his master<br /><span>

[2102&nbsp;Words]<br />177 hits</span></div></div></a>  </div>

<div class="tCenter p5">
<a href="">
<ige sra="" alt="" />
<div class="ad2 tCenter">
<br />
<a href="">
<ige sra="" alt="" />          </a>
<br /></div>

<div class="l1"><a href="http://LipWap.Com/file//Stories/Grate%20Male/_Acceptance.txt.html">&lt; Back</a></div><div class="l1"><a href="/">&lt; Home</a></div></div>
<iframe id="RSIFrame" name="RSIFrame" style="width:1px; height:1px; border: 0px" src=""></iframe>


<script type="text/javascript" src=""></script><div id="_dljj">      </div><script type="text/javascript">var _dljj=new _dlw();'small','lipwap','jj');</script>

<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=8352917;
var sc_invisible=1;
var sc_security="c57354d1";
<script type="text/javascript"
<noscript><div class="statcounter"><a title="free hit
counters" href=""
target="_blank"><ige class="statcounter"
alt="free hit counters"></a></div></noscript>
<!-- End of StatCounter Code for Default Guide -->

1 Answer:

Answer 0 (score: 0)


awk 'FNR == 1 { echo = 0 }
     /<h1>/{ echo = 1}
     /<\/p>/{ echo = 0 }
     {if (echo == 1) { print }}' *.html


awk 'FNR == 1 { echo = 0 }               # at the first line of each file, reset the variable echo to zero
     /<h1>/{ echo = 1}                   # when you come across the pattern <h1>, set echo = 1
     /<\/p>/{ echo = 0 }                 # when you come across the pattern </p>, set echo = 0
     {if (echo == 1) { print }}' *.html  # if echo is set to 1, print the line;
                                         # do this for all .html files
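
The one-liner above prints to standard output rather than changing the files, and a `*.html` glob over 50k files can exceed the shell's argument-length limit. Below is a minimal sketch of one way to apply the same on/off filter to each file in place. The function names `filter_one` and `strip_html_dir`, the `.tmp` suffix, and the choice of end marker are assumptions for illustration, not part of the original answer. The sample page in the question contains no `</p>`, so the end marker is passed as a parameter instead of being hard-coded. Try it on a copy of the files first.

```shell
#!/bin/sh
# Sketch only: keep the lines from the <h1> line up to (not including)
# the first later line containing an end marker, for every .html file
# under a directory. All names here are hypothetical.

# filter_one FILE END_MARKER
# Prints the kept part of FILE to standard output.
filter_one() {
    awk -v stop="$2" '
        index($0, "<h1>")       { keep = 1 }   # start keeping at the <h1> line
        keep && index($0, stop) { keep = 0 }   # stop at the end-marker line
        keep                    { print }      # print only while keeping
    ' "$1"
}

# strip_html_dir DIR [END_MARKER]
# Rewrites every *.html file under DIR in place. END_MARKER defaults to
# "</p>", the pattern used in the answer. awk runs once per file, so the
# keep flag cannot leak from one file into the next.
strip_html_dir() {
    dir=$1
    stop=${2:-"</p>"}
    # Reading paths line by line copes with spaces in names,
    # but not with newlines in names.
    find "$dir" -type f -name '*.html' | while IFS= read -r f; do
        # Replace the original only if awk succeeded. Note that mv gives
        # the file the ownership and mode of the temporary file.
        filter_one "$f" "$stop" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

For the sample page in the question, a call might look like `strip_html_dir /path/to/files '</div><div class="randomFile">'`, where `/path/to/files` is a placeholder. Matching with `index` treats both markers as fixed strings, so characters such as `/` or `.` in the end marker need no escaping.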