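I have more than 50k .html files on my server that were copied from other websites.
Now I want to use the Linux command line to remove part of the text from all of these .html files.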
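Note:
The part of the text I want to remove is not 100% identical across files, but it is similar, as shown in the code below. I want to keep the text between the @@ markers. (The @ symbol does not exist in the original files; I added it to highlight the part that should be kept.)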
Some part of the HTML code here
<br /></div>
@@
<h1> A Memorable Night </h1>
<p>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</p>
@@
Some part of the HTML code here
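The part marked with @@ in the sample above is a line range: it starts at the <h1> line and ends at the first </p> line after it. As a minimal sketch, assuming (as in this sample) that <h1> and </p> each sit on their own line and that the block appears once per file, sed's range address selects exactly that span. The function name extract_kept is hypothetical and used only for illustration:

```shell
#!/bin/sh
# extract_kept (hypothetical helper name): print the lines from the first line
# matching <h1> through the next line matching </p>, reading the named files
# or, if none are given, standard input.
# Assumption: <h1> and </p> are on their own lines, as in the sample.
extract_kept() {
    sed -n '/<h1>/,/<\/p>/p' "$@"
}

# Demo on the sample (quoted heredoc, so nothing inside is expanded).
extract_kept <<'EOF'
Some part of the HTML code here
<br /></div>
<h1> A Memorable Night </h1>
<p>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</p>
Some part of the HTML code here
EOF
```

Run on the heredoc, this prints the six lines from "<h1> A Memorable Night </h1>" through "</p>"; passing file names instead (extract_kept file.html) reads those files.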
Here is the full code:
```html
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN""http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title> A Memorable Night free download :: LipWap.Com </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="LipWap.Com > Stories > Grate Male > _A_Memorable_Night.txt"/>
<meta name="keywords" content=",Stories,Grate Male,_A_Memorable_Night.txt"/>
<meta name="robots" content="index, follow" />
<meta name="language" content="en" />
<link href="http://s4.LipWap.Com/style.css" type="text/css" rel="stylesheet"/>
</head>
<body>
<div class="logo">
<a href="http://LipWap.Com"><img alt="LipWap.Com" src="/logo.gif" width="220" hight="42"/></a></div> </div>
</div>
<div id="mainDiv">
<div class="ad1 tCenter p5">
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />
</a>
<br /><br />
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" /> </a>
<br /></div>
@@
<h1> A Memorable Night </h1>
<p>
.......the text START here which I don't want to remove
.some text......
.......the text END here which I don't want to remove.
</p>
@@
</div><div class="randomFile">
<h3>Related Files</h3>
<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/file//Stories/Grate Male/_5-Star_Hotel.txt.html"><div><div><img src="/prv//Stories/Grate Male/_5-Star_Hotel.txt.gif" width="60" height="60" border="0" alt=" Ass Licked At 5-Star Hotel" /></div><div> 5-Star Hotel<br /><span>
[2326 Words]<br />76 hits</span></div></div></a> </div>
<!-- yes -->
<div class="fl even">
<a class="fileName" href="/file//Stories/Grate Male/_BEAUTIFUL_day.txt.html"><div><div><img src="/prv//Stories/Grate Male/_BEAUTIFUL_day.txt.gif" width="60" height="60" border="0" alt=" BEAUTIFUL day" /></div><div> BEAUTIFUL day<br /><span>
[4279 Words]<br />114 hits</span></div></div></a> </div>
<!-- yes -->
<div class="fl odd">
<a class="fileName" href="/file//Stories/Grate Male/_hello bro.txt.html"><div><div><img src="/prv//Stories/Grate Male/_hello bro.txt.gif" width="60" height="60" border="0" alt=" hello bro" /></div><div> Baby is seduced by his master<br /><span>
[2102 Words]<br />177 hits</span></div></div></a> </div>
<div class="tCenter p5">
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" />
</a>
</div>
<div class="ad2 tCenter">
<br />
<a href="http://click.buzzcity.net/click.php?partnerid=88888">
<img src="http://ads.buzzcity.net/show.php?partnerid=88888&get=mweb" alt="" /> </a>
<br /></div>
<div class="l1"><a href="http://LipWap.Com/file//Stories/Grate%20Male/_Acceptance.txt.html">< Back</a></div><div class="l1"><a href="/">< Home</a></div></div>
<iframe id="RSIFrame" name="RSIFrame" style="width:1px; height:1px; border: 0px" src="http://gkmasti.com/newdata/cat//us/sort/time/page/0.html"></iframe>
</body>
</html>
<script type="text/javascript" src="http://daylogs.com/dw.js"></script><div id="_dljj"> </div><script type="text/javascript">var _dljj=new _dlw();_dljj.show('small','lipwap','jj');</script>
<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=8352917;
var sc_invisible=1;
var sc_security="c57354d1";
</script>
<script type="text/javascript"
src="http://www.statcounter.com/counter/counter.js"></script>
<noscript><div class="statcounter"><a title="free hit
counters" href="http://statcounter.com/"
target="_blank"><img class="statcounter"
src="http://c.statcounter.com/8352917/0/c57354d1/1/"
alt="free hit counters"></a></div></noscript>
<!-- End of StatCounter Code for Default Guide -->
<!----end--->
```
Answer 0 (score: 0)
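The following command prints only the part you want to keep, from the <h1> line through the closing </p> line, for every .html file in the current directory:
awk 'FNR == 1 { echo = 0 }
/<h1>/ { echo = 1 }
{ if (echo == 1) { print } }
/<\/p>/ { echo = 0 }' *.html
Explanation:
awk 'FNR == 1 { echo = 0 }     # at the first line of each file, set the variable echo to zero
/<h1>/ { echo = 1 }            # when a line matches <h1>, set echo = 1
{ if (echo == 1) { print } }   # if echo is 1, print the current line
/<\/p>/ { echo = 0 }' *.html   # after printing a line that matches </p>, set echo = 0
                               # do this for all .html files
The print rule comes before the </p> rule so that the closing </p> line is printed too. Note that awk writes the kept lines of all files, one after another, to standard output; it does not modify the files themselves.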
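The awk command prints the kept lines but does not rewrite the 50k files. One way to apply the same rule to each file in place is sketched below. It is an illustration under stated assumptions, not a production script: keep_section_in_place is a hypothetical name, file names are assumed not to contain newlines, and files in which no <h1> line is found are left unchanged rather than emptied. Using find also avoids passing 50k names on a single command line.

```shell
#!/bin/sh
# keep_section_in_place DIR (hypothetical helper name)
# For every .html file under DIR, overwrite the file with only the lines from
# the <h1> line through the next </p> line, using the same awk rule as above.
# Assumptions: file names contain no newlines; the <h1>/</p> layout matches
# the sample. Files where awk finds nothing to keep are left unchanged.
keep_section_in_place() {
    find "${1:-.}" -type f -name '*.html' | while IFS= read -r f; do
        tmp=$(mktemp) || { echo "mktemp failed" >&2; break; }
        # Print the current line before the </p> rule resets keep,
        # so the closing </p> line is kept as well.
        awk '/<h1>/  { keep = 1 }
             keep    { print }
             /<\/p>/ { keep = 0 }' "$f" > "$tmp"
        if [ -s "$tmp" ]; then
            # Writing into the existing file keeps its permissions and owner.
            cat "$tmp" > "$f"
        else
            printf 'skipped (no <h1> section): %s\n' "$f" >&2
        fi
        rm -f "$tmp"
    done
}
```

For example, keep_section_in_place /path/to/site would process every .html file below that directory (the path is only a placeholder). Because the files are overwritten, it is safer to try it on a copy of the directory first.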