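awk command not working correctly, wrong output; is there a sed alternative?

Time: 2017-06-05 06:32:01

Tags: linux bash awk sed

I tried this small script on my data set, but for some reason I don't get the desired output. Could someone take a look and perhaps figure out what is wrong? A sed solution would also be welcome.

Script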

awk -v RS= -F '<connection name="|<hostPort>' '
{
sub(/".*/, "", $2)
split($3, tokens, /[:<]/)
printf "%-6s %s %s\n", $2, tokens[1], tokens[2]
}
'

Input

<hostPort>srv1:33333</hostPort>
<hostPort>srv2:33333</hostPort>
<connection name="boing_ny__Primary__" transport="tcp">
<hostPort>srv1:33333</hostPort>
<connection name="boing_ny__Backup__" transport="tcp">
<hostPort>srv2:33333</hostPort>
<connection name="boy_ny__Primary__" transport="tcp">
<hostPort>srv1:6666</hostPort>
<connection name="boy_ny__Backup__" transport="tcp">
<hostPort>srv2:6666</hostPort>
<connection name="song_ny__Primary__" transport="tcp">
<hostPort>srv1:55555</hostPort>
<connection name="song_ny__Backup__" transport="tcp">
<hostPort>srv2:55555</hostPort>
<connection name="bob_ny__Primary__" transport="tcp">
<hostPort>srv3:33333</hostPort>
<connection name="bob_ny__Backup__" transport="tcp">
<hostPort>srv4:33333</hostPort>
<hostPort>srv1:4444</hostPort>
<hostPort>srv2:4444</hostPort>
<hostPort>srv1:4444</hostPort>

Current output

srv1:33333</hostPort>
srv2 33333

Desired output

boing_ny__Primary__  srv1 33333
boing_ny__Backup__   srv2 33333
boy_ny__Primary__     srv1 6666
boy_ny__Backup__   srv2 6666
song_ny__Primary__ srv1 55555
song_ny__Backup__ srv2 55555
bob_ny__Primary__ srv3 33333
bob_ny__Backup__ srv4 33333

4 answers:

Answer 0 (score: 2)

Try:

awk '/connection/{match($0,/"[^"]*/);VAL=substr($0,RSTART+1,RLENGTH-1);next} /hostPort/ && VAL{match($0,/>.*</);print VAL FS substr($0,RSTART+1,RLENGTH-2)}'   Input_file

Will add an explanation shortly.

EDIT2: Here is the explanation of the same command.

awk '/connection/{                                                    #### Match a line that contains the string "connection".
                        match($0,/"[^"]*/);                           #### match() finds the first double quote and every following character up to, but not including, the next double quote.
                        VAL=substr($0,RSTART+1,RLENGTH-1);            #### Store the quoted name in VAL. RSTART and RLENGTH are built-in awk variables set by match(): RSTART is the index where the match starts and RLENGTH is its length, so RSTART+1 skips the opening quote.
                        next                                          #### Skip the remaining rules for this line.
                 }
    /hostPort/ && VAL{                                                #### Two conditions: the line contains "hostPort" and VAL is not empty. If both hold, run the following actions.
                        match($0,/>.*</);                             #### match() again, this time to find everything from the first > to the last <.
                        print VAL FS substr($0,RSTART+1,RLENGTH-2)    #### Print VAL, the field separator, and the matched text without the surrounding > and <.
                     }
    '  Input_file                                                     #### The input file.
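
Two details of the command above are worth noting: VAL is never cleared, so the trailing hostPort lines in the sample input are printed again with the last connection name, and host and port stay joined by a colon. The following is a minimal sketch of a variant that addresses both, not the answerer's own code; the function name list_connections and the variable names are hypothetical, and it assumes each hostPort line holds a single host:port pair.

```shell
# Hypothetical helper: print "name host port" for each connection/hostPort pair.
list_connections() {
  awk '
    /<connection / {
      match($0, /"[^"]*/)                         # RSTART points at the opening quote
      name = substr($0, RSTART + 1, RLENGTH - 1)  # text after that quote
      next
    }
    /<hostPort>/ && name != "" {
      match($0, />[^<]*</)                        # matches ">host:port<"
      split(substr($0, RSTART + 1, RLENGTH - 2), hp, ":")
      print name, hp[1], hp[2]
      name = ""                                   # forget the name once it is used
    }
  ' "$@"
}

# Usage with a few lines taken from the sample input:
printf '%s\n' \
  '<hostPort>srv1:33333</hostPort>' \
  '<connection name="boing_ny__Primary__" transport="tcp">' \
  '<hostPort>srv1:33333</hostPort>' \
  '<hostPort>srv1:4444</hostPort>' | list_connections
```

Clearing name after each print is what keeps the unpaired hostPort lines at the start and end of the file out of the output.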

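Answer 1 (score: 1)

As the comments say, the right approach is to use a proper parser.

As an experiment, this GNU awk command seems to handle the input provided, but it is not guaranteed to be a robust solution, since the XML data may vary from file to file.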

awk '/connection name=/{a=$0;getline; \
print gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a), \
gensub(/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)}' file1

#Output:
boing_ny__Primary__ srv1 33333
boing_ny__Backup__ srv2 33333
boy_ny__Primary__ srv1 6666
boy_ny__Backup__ srv2 6666
song_ny__Primary__ srv1 55555
song_ny__Backup__ srv2 55555
bob_ny__Primary__ srv3 33333
bob_ny__Backup__ srv4 33333 

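How this works:
When we find a record that matches /connection name=/, we store that record ($0) in variable a, read the next line with getline, and then print the results of two sed-like substitutions made with gensub.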

gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a)
#all chars up to first " --|       |      |       |    |  |
#after " and up to the next "------|      |       |    |  |
#after last " up to the end of $0 --------|       |    |  |
#replace with group 2 ----------------------------|    |  |
#global replacement------------------------------------|  |
#target = a = previous record-----------------------------|

#With a = <connection name="boing_ny__Primary__" transport="tcp">
#Above gensub will return group2 = boing_ny__Primary__


gensub              (/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)
#all chars between < >--|       |     |     |        |          |      |  |
#all chars up to : -------------|     |     |        |          |      |  |
#literal : ---------------------------|     |        |          |      |  |
#the part after : and before < -------------|        |          |      |  |
#the last < > part ----------------------------------|          |      |  |
#use group 2 and 4 ---------------------------------------------|      |  |
#global replacement ---------------------------------------------------|  |
#target = $0 current record ----------------------------------------------|

#With $0 = <hostPort>srv2:33333</hostPort>
#Above gensub will return group 2 = srv2 and group 4 = 33333 --> srv2 33333

The general gensub syntax in GNU awk is gensub(regexp, replacement, how [, target]); the substituted string is returned by the function and the target itself is left unmodified - see the man page of gensub.
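
gensub is a GNU awk extension, so the command above needs gawk. As a hedged sketch of the same idea (remember the connection line, then read the next line with getline), the following uses only POSIX sub() and split() instead of gensub. The function name extract_pairs and the variable names are hypothetical, and it assumes, like the command above, that each connection line is immediately followed by its hostPort line.

```shell
# Hypothetical helper: same two-line approach, without gensub.
extract_pairs() {
  awk '
    /connection name=/ {
      name = $0
      sub(/^[^"]*"/, "", name)     # drop everything up to and including the first quote
      sub(/".*$/, "", name)        # drop the closing quote and the rest of the line
      if ((getline line) > 0) {    # read the following line into "line"
        sub(/^[^>]*>/, "", line)   # drop the opening <hostPort> tag
        sub(/<.*$/, "", line)      # drop the closing tag
        split(line, hp, ":")       # hp[1] = host, hp[2] = port
        print name, hp[1], hp[2]
      }
    }
  ' "$@"
}

# Usage with two lines taken from the sample input:
printf '%s\n' \
  '<connection name="song_ny__Primary__" transport="tcp">' \
  '<hostPort>srv1:55555</hostPort>' | extract_pairs
```

Working on copies (name and line) mirrors what gensub does: the original record is never modified.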

Answer 2 (score: 0)

Robust: cat input | awk -F'\"|>' '{print $2}' | awk -F'<' '{print $1}' | sed -z 's/_\n/_ /g' | grep -v ^srv | tr ":" " "
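
The pipeline above relies on sed -z, which is a GNU sed option. Since the question also asks for a sed solution, here is a minimal sed-only sketch; it is an illustration, not part of this answer. The function name conn_pairs is hypothetical, and the sketch assumes every connection line is immediately followed by its hostPort line, as in the sample input.

```shell
# Hypothetical helper: join each connection line with the next line (N),
# then keep only the name, host and port captured by the three groups.
conn_pairs() {
  sed -n '/<connection name="/{
N
s/.*<connection name="\([^"]*\)".*\n[^<]*<hostPort>\([^:<]*\):\([^<]*\)<.*/\1 \2 \3/p
}' "$@"
}

# Usage with a few lines taken from the sample input:
printf '%s\n' \
  '<hostPort>srv1:33333</hostPort>' \
  '<connection name="boy_ny__Primary__" transport="tcp">' \
  '<hostPort>srv1:6666</hostPort>' | conn_pairs
```

Because of -n and the p flag, hostPort lines that do not follow a connection line produce no output.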

Answer 3 (score: 0)

$ awk -F'[":<>]' '/hostPort/{if (n!="") print n, $3, $4; n=""; next} {n=$3}' file
boing_ny__Primary__ srv1 33333
boing_ny__Backup__ srv2 33333
boy_ny__Primary__ srv1 6666
boy_ny__Backup__ srv2 6666
song_ny__Primary__ srv1 55555
song_ny__Backup__ srv2 55555
bob_ny__Primary__ srv3 33333
bob_ny__Backup__ srv4 33333
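
To see why $3 and $4 are the interesting fields in the command above, the following small sketch prints them for one connection line and one hostPort line, using the same field separator. The function name show_fields is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: show what $3 and $4 hold when FS is any of " : < >
show_fields() {
  awk -F'[":<>]' '{ printf "line %d: $3=[%s] $4=[%s]\n", NR, $3, $4 }' "$@"
}

printf '%s\n' \
  '<connection name="boy_ny__Primary__" transport="tcp">' \
  '<hostPort>srv1:6666</hostPort>' | show_fields
```

Every line starts with <, so $1 is empty and $2 is the tag text; on a connection line $3 is the quoted name, while on a hostPort line $3 and $4 are the host and the port.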