Search for and print certain words from a line of code in a text file

Time: 2012-10-05 22:40:21

Tags: linux search printing grep command-line-interface

On the Linux command line, I'm trying to search some HTML code and print only the dynamic part of it. For example, given this code:

<p><span class="RightSideLinks">Tel: 090 97543</span></p>

I only want to print 97543, not 090. The next time I search the file, the code may have changed to:

<p><span class="RightSideLinks">Tel: 081 82827</span></p>

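and then I only want 82827. The rest of the code stays the same; only the phone number changes.

Can I do this with grep? Thanks.

Edit:

Is it also possible to use this on the following code?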

<tr class="patFuncEntry"><td align="left" class="patFuncMark"><input type="checkbox" name="renew0" id="renew0" value="i1061700" /></td><td align="left" class="patFuncTitle"><label for="renew0"><a href="/record=p1234567~S0"> I just want to print this part. </a></label>

What changes here is the record number (p1234567~S0) and the text I want to print.

1 Answer:

Answer 0: (score: 1)

One way, using GNU grep:

grep -oP '(?<=Tel: .{3} )[^<]+' file.txt

Example contents of file.txt:

<p><span class="RightSideLinks">Tel: 090 97543</span></p>
<p><span class="RightSideLinks">Tel: 081 82827</span></p>

Result:

97543
82827
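
To try the command without touching a real file, the sample can be recreated in a throwaway location. This is only a sketch: the temporary file and the variable names `sample` and `numbers` are made up for illustration, and it assumes GNU grep built with `-P` support.

```shell
# Recreate the two sample lines in a temporary file (illustrative path, not from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p><span class="RightSideLinks">Tel: 090 97543</span></p>
<p><span class="RightSideLinks">Tel: 081 82827</span></p>
EOF

# -o prints only the matched text; -P enables Perl-compatible regular expressions.
numbers=$(grep -oP '(?<=Tel: .{3} )[^<]+' "$sample")
printf '%s\n' "$numbers"

# Clean up the temporary file.
rm -f "$sample"
```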

Edit:

(?<=Tel: .{3} ) ## This is a positive lookbehind assertion. Lookbehinds are only
                ## understood when grep's Perl regexp flag, '-P', is given.

Tel: .{3}       ## So this is what we're actually checking for; the phrase 'Tel: '
                ## followed by any character exactly three times followed by a 
                ## space. Since we're searching only for numbers you could write
                ## 'Tel: [0-9]{3} ' instead.

[^<]+           ## Grep's '-o' flag enables us to return exactly what we want, 
                ## rather than the whole line. Therefore this expression will
                ## return one or more characters other than '<'.

Putting it all together, we're asking grep to return one or more characters
other than '<', provided that 'Tel: .{3} ' appears immediately before them. HTH.
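
For the second snippet in the question (the link text after the changing record number), a lookbehind is awkward because PCRE lookbehinds must have a fixed length. One possible approach, sketched here under the assumption that the link always starts with `<a href="/record=` and that your GNU grep supports PCRE's `\K` (which discards everything matched so far from the reported match), is shown below. The temporary file, the variable names, and the `sed` trimming step are illustrative additions, not part of the original answer.

```shell
# Recreate the table row from the question in a temporary file (illustrative path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tr class="patFuncEntry"><td align="left" class="patFuncMark"><input type="checkbox" name="renew0" id="renew0" value="i1061700" /></td><td align="left" class="patFuncTitle"><label for="renew0"><a href="/record=p1234567~S0"> I just want to print this part. </a></label>
EOF

# Match the opening <a ...> tag with any record number, drop it from the
# result with \K, then capture everything up to the next '<'.
# sed then trims the spaces that surround the link text.
title=$(grep -oP '<a href="/record=[^"]*">\K[^<]+' "$sample" | sed 's/^ *//; s/ *$//')
printf '%s\n' "$title"

# Clean up the temporary file.
rm -f "$sample"
```

Because `[^"]*` accepts any record number, the same command should keep working when p1234567~S0 changes, as long as the surrounding markup stays the same.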