I have some text:
=== Blah 1 ===
::Junk I wish: 2 Ignore <br/>
::More Junk: 1.2-2.7 <br/>
::ABC: [http://www.google.com (STUFF/I/Want)]<br/>
::More2: Ignore<br/>
::More Stuf 2 Ignore: N/A<br/>
=== Blah 2 ===
::Junk I wish: More 2 Ignore <br/>
::More Junk: 1.2-2.7 <br/>
::ABC: [http://www.google.com (Other/STUFF/I/Want)]<br/>
::More2: More Ignore<br/>
::More Stuf 2 Ignore: More N/A<br/>
I want to output:
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)
I have figured out how to grab the lines I want:
gawk '/===/ {print } /ABC/ {print $3}' file_name
This outputs the following:
=== Blah 1 ===
(STUFF/I/Want)]<br/>
=== Blah 2 ===
(Other/STUFF/I/Want)]<br/>
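What I can't figure out is how to strip the other characters I don't want and put each pair on a single line.
Answer 0 (score: 4)
Use printf instead of print to omit the newline, print only the second and third fields in the first block, and use sub to discard what you don't want in the second block: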
awk '/===/{printf "%s %s, ",$2,$3}/ABC/{sub(/].*/,"");print $3}' file
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)
If the title has a variable number of words:
awk '/===/{gsub(/ ?=+ ?/,"");printf "%s, ",$0}/ABC/{sub(/].*/,"");print $3}' file
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)
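To try the variable-length version without a separate input file, it can be wrapped in a shell function and fed a sample on standard input. This is only a sketch: the function name extract_pairs is made up for illustration, and the bracket is escaped as \] purely for readability.

```shell
# Hypothetical helper: reads the wiki-style text on stdin and
# prints one "Title, (value)" line per section.
extract_pairs() {
    awk '
        # Heading line: strip the runs of "=" and an adjacent space,
        # then print the title without a trailing newline.
        /===/ { gsub(/ ?=+ ?/, ""); printf "%s, ", $0 }
        # ABC line: drop everything from "]" onward, then print field 3.
        /ABC/ { sub(/\].*/, ""); print $3 }
    '
}

printf '%s\n' \
    '=== Blah 1 ===' \
    '::Junk I wish: 2 Ignore <br/>' \
    '::ABC: [http://www.google.com (STUFF/I/Want)]<br/>' \
    '::More2: Ignore<br/>' | extract_pairs
```

Note that a section without an ABC line would leave its title dangling on the output line, because printf has already written it.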
Answer 1 (score: 3)
One way. Contents of script.awk:
BEGIN {
    ## Characters to separate output fields
    OFS = ", "
}

## When line begins with several equal signs, remove them, both leading
## and trailing, and save the title.
$1 ~ /^=+$/ {
    ## [ \t] is used instead of \s, which is a GNU Awk extension.
    gsub( /[ \t]*=+[ \t]*/, "", $0 )
    title = $0
    next
}

## For the ABC line, split the line on both parentheses and
## print the second piece.
$1 ~ /ABC/ {
    ## For GNU-Awk
    #split( $0, abc_line, /(\()|(\))/, seps )
    #printf "%s%s%s%s%s\n", title, OFS, seps[1], abc_line[2], seps[2]

    ## For Awk
    split( $0, abc_line, /(\()|(\))/ )
    printf "%s%s(%s)\n", title, OFS, abc_line[2]
}
Run it like this:
awk -f script.awk infile
It yields:
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)
Answer 2 (score: 1)
gawk '/===/{header=gensub(" *=== *","","g",$0)} /ABC/{abc=gensub("]<br/>","","g",$3); print header", "abc}' file_name
This might work for you. It saves the stripped information into variables and then prints them.
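Note that gensub is a GNU awk extension. For an awk without it, the same save-then-print idea can probably be sketched with gsub and sub acting on copies of the text. The variable names header and abc follow the command above; the function name extract_with_vars is hypothetical.

```shell
# Hypothetical helper: same approach as the gawk one-liner, but using
# only gsub/sub so it does not depend on gensub.
extract_with_vars() {
    awk '
        # Copy the heading, then strip the "===" markers and nearby spaces.
        /===/ { header = $0; gsub(/ *=== */, "", header) }
        # Copy field 3, strip the trailing "]<br/>", and print the pair.
        /ABC/ { abc = $3; sub(/\]<br\/>/, "", abc); print header ", " abc }
    '
}

printf '%s\n' \
    '=== Blah 2 ===' \
    '::ABC: [http://www.google.com (Other/STUFF/I/Want)]<br/>' | extract_with_vars
```

Because the output is printed only when the ABC line is seen, a section without an ABC line produces no output at all.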
Answer 3 (score: 0)
Sometimes in awk, the solution becomes very simple if you look for an unorthodox record separator:
awk -v RS=' *=== *|[()]' '
    NR%4==2 {printf "%s, ", $0}
    NR%4==0 {print "(" $0 ")"}
' file
Here, the record separator is either === optionally surrounded by spaces, or an opening or closing parenthesis.
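To see which pieces of the input end up in which record, it may help to dump every record together with its number. This sketch assumes an awk that accepts a regular expression as RS (gawk and mawk do; POSIX only specifies a single-character RS), and the function name show_records is made up for illustration.

```shell
# Hypothetical helper: prints each record as "NR: [text]" using the
# same record separator as the command above.
show_records() {
    awk -v RS=' *=== *|[()]' '{ printf "%d: [%s]\n", NR, $0 }'
}

printf '%s\n' \
    '=== Blah 1 ===' \
    '::ABC: [http://www.google.com (STUFF/I/Want)]<br/>' | show_records
```

With gawk, the empty text before the first === counts as record 1, so each section contributes four records: the title lands on record 2 and the parenthesised text on record 4, which is what the NR%4 tests rely on. The approach assumes exactly one pair of parentheses per section.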