I have a file containing many lines like the following:
<li><img src="img/tt_potato-30x30.png" alt="ew_inactive"> <img src="img/in-event-40x40.png" alt="event"> - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html">XX:The quick brown fox jumped over the lazy </a> -<img src= "img/config-40x40.png" alt="config"><img src="img/validate-40x50.png" alt="validate"> - user
<li><img src="img/tt_potato-30x30.png" alt="ew_inactive"> <img src="img/in-event-40x40.png" alt="event"> - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html">YY:Jack and Jill went up the hill </a> -<img src= "img/config-40x40.png" alt="config"><img src="img/validate-40x50.png" alt="validate"> - user
<li><img src="img/tt_potato-30x30.png" alt="ew_inactive"> <img src="img/in-event-40x40.png" alt="event"> - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html">ZZ: Mary had a little lamb </a> -<img src= "img/config-40x40.png" alt="config"><img src="img/validate-40x50.png" alt="validate"> - user
I want to extract the following strings and discard everything else:
XX: The quick brown fox jumped over the lazy
YY: Jack and Jill went up the hill
ZZ: Mary had a little lamb
So far I have tried the awk command below, but it seems limited because XX would have to be swapped for YY and ZZ.
awk '{gsub(/^.*XX:/,"XX:"); gsub(/[<\a>].*$/,"[</a>].");print}'
Can anyone suggest another approach using standard Linux tools? Thanks.
Answer 0 (score: 1)
If your Input_file is the same as the sample shown, then the following may help you.
awk -F"\">|</a>" 'NF{print $4}' Input_file
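Explanation: the two strings "> and </a> are used as field separators (to get exactly what the OP needs :)). NF makes sure that empty lines are skipped. With those two separators set, the 4th field is the one the OP wants. The following command prints every field together with its index, which shows that the 4th column is the one to pick.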
awk -F"\">|</a>" '{for(i=1;i<=NF;i++){print i,$i}}' Input_file
1 <li><img src="img/tt_potato-30x30.png" alt="ew_inactive
2 <img src="img/in-event-40x40.png" alt="event
3 - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html
4 XX:The quick brown fox jumped over the lazy
5 -<img src= "img/config-40x40.png" alt="config
6 <img src="img/validate-40x50.png" alt="validate
7 - user
1 <li><img src="img/tt_potato-30x30.png" alt="ew_inactive
2 <img src="img/in-event-40x40.png" alt="event
3 - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html
4 YY:Jack and Jill went up the hill
5 -<img src= "img/config-40x40.png" alt="config
6 <img src="img/validate-40x50.png" alt="validate
7 - user
1 <li><img src="img/tt_potato-30x30.png" alt="ew_inactive
2 <img src="img/in-event-40x40.png" alt="event
3 - dep[(0:0)(0:0)]ref[(3:0)(0:0)]srch[?] - <a href "tcc_1111.html
4 ZZ: Mary had a little lamb
5 -<img src= "img/config-40x40.png" alt="config
6 <img src="img/validate-40x50.png" alt="validate
7 - user
I hope this helps.
Answer 1 (score: 0)
I think this perl one-liner will do (it looks like you are on Linux):
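One caveat about the field-separator approach above: the index 4 holds only while every line has exactly three "> sequences before the link text. A minimal sketch of a variant that does not depend on the field count follows. The helper name link_text and the sample line are made up for illustration and are not part of the answer.

```shell
# Hypothetical helper (not from the answer): print the text that sits between
# the last "> before the first </a> and that </a>, whatever the field count.
link_text() {
  awk -F'</a>' 'NF > 1 {
    n = split($1, parts, /">/)   # split the text before </a> on ">
    print parts[n]               # the last piece is the link text
  }'
}

# Made-up line with a single leading <img>, so the title is no longer in $4.
printf '%s\n' '<li><img src="a.png" alt="x"> - <a href "t.html">YY:Jack and Jill</a> - user' | link_text
# prints: YY:Jack and Jill
```

Lines without a closing </a> (including empty lines) produce no output, because NF > 1 is only true when the separator occurs at least once.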
perl -lne 'print $1 if m{>((XX|YY|ZZ):[^<]*)}'
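The one-liner above lists the three prefixes explicitly. If the real file has other prefixes, the same idea can be sketched with a generic pattern. The version below is written in awk rather than perl, and it assumes that every prefix is exactly two uppercase letters followed by a colon, which is a guess based on the samples. The helper name prefixed_text is hypothetical.

```shell
# Hypothetical helper: print the first ">PREFIX:text" run on each line, where
# PREFIX is assumed to be exactly two uppercase letters (as in the samples).
prefixed_text() {
  awk 'match($0, />[A-Z][A-Z]:[^<]*/) {
    print substr($0, RSTART + 1, RLENGTH - 1)   # drop the leading >
  }'
}

# Made-up line with a prefix that is not XX, YY, or ZZ.
printf '%s\n' '<li><img src="a.png" alt="x"> - <a href "t.html">QQ:Some other title</a> - user' | prefixed_text
# prints: QQ:Some other title
```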
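Answer 2 (score: 0)
^.XX means "any character followed by XX at the start of a line", so it does not match an XX in the middle of a line. [<\a>] means "any one of the characters <, \, a, or >", so it does not match the string <\a>. Find a regexp tutorial...
Your question is unclear, but maybe this is what you are trying to do?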
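The difference between a bracket expression and a literal string can be checked directly. The sketch below uses the simpler bracket expression [<a>] (without the backslash) and a made-up input string. The two helper names are hypothetical.

```shell
# [<a>] is a bracket expression: it matches ONE character that is <, a, or >,
# so the cut starts at the first such character on the line.
cut_at_bracket() {
  awk '{ sub(/[<a>].*$/, ""); print }'
}

# <\/a> is the literal four-character string </a>.
cut_at_literal() {
  awk '{ sub(/<\/a>.*$/, ""); print }'
}

printf 'banana</a>tail\n' | cut_at_bracket
# prints: b
printf 'banana</a>tail\n' | cut_at_literal
# prints: banana
```

The first call cuts at the first "a" of "banana", which is why a bracket expression cannot stand in for the closing tag.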
$ awk '{sub(/<\/a>.*/,""); sub(/.*>/,"")} NF' file
XX:The quick brown fox jumped over the lazy
YY:Jack and Jill went up the hill
ZZ: Mary had a little lamb
Or, using GNU awk for the 3rd argument to match(), to print whatever is between the opening and closing a tags (assuming one pair per line):
$ awk 'match($0,/.*<a[^>]*>(.*)<\/a>.*/,a){print a[1]}' file
XX:The quick brown fox jumped over the lazy
YY:Jack and Jill went up the hill
ZZ: Mary had a little lamb
With any sed it would be:
$ sed -n 's/.*<a[^>]*>\(.*\)<\/a>.*/\1/p' file
XX:The quick brown fox jumped over the lazy
YY:Jack and Jill went up the hill
ZZ: Mary had a little lamb
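The outputs above keep the text exactly as it appears in the file, so XX:The has no space after the colon, while the output requested in the question does (XX: The). A minimal sketch of a second sed pass that normalizes this and trims trailing spaces follows. The helper name tidy_titles is hypothetical, and the sketch assumes the first colon on each extracted line is the one after the prefix.

```shell
# Hypothetical helper: extract the link text, then force exactly one space
# after the first colon and strip trailing spaces.
tidy_titles() {
  sed -n 's/.*<a[^>]*>\(.*\)<\/a>.*/\1/p' |
    sed -e 's/^\([^:]*\): */\1: /' -e 's/ *$//'
}

# Shortened sample line in the same shape as the question's input.
printf '%s\n' '<li><img src="a.png" alt="x"> - <a href "tcc_1111.html">XX:The quick brown fox </a> - user' | tidy_titles
# prints: XX: The quick brown fox
```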