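I need to extract all FCP names from an XML file, grouped by collector name, on a CentOS server. The number of lines inside each CPM tag is unknown. Bash is preferred, but any solution will do.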
Example input file:
<CPM display_name="XYZ" collector="202a" >
<FCP name="a1" second_name="b2"/>
<FCP name="a3" second_name="b232323"/>
<FCP name="a2" second_name="b445"/>
</CPM>
<CPM display_name="XYZ" collector="204a" >
<FCP name="z1" second_name="b232323232"/>
<FCP name="s3" second_name="b23232323"/>
<FCP name="t2" second_name="b4453223"/>
</CPM>
<CPM display_name="XYZ" collector="202a" >
<FCP name="a11" second_name="basdasdasdasd2"/>
</CPM>
... (the real file is much longer, over 500 lines)
Expected output:
collector="202a"
name="a1"
name="a2"
name="a3"
name="a11"
collector="204a"
name="z1"
name="s3"
name="t2"
Any help is appreciated.
Answer (score: 2)
A gawk solution. match(), substr(), RSTART and RLENGTH together emulate grep -o, and A[length(A)+1]=N emulates pushing an element onto array A:
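awk '
match($0, /collector="[^"]*"/){
collector=substr($0, RSTART, RLENGTH)
}
match($0,/[ ]name="[^"]*"/) {
# skip the leading space: start one char later and take one char fewer
d[collector][length(d[collector])+1]=substr($0, RSTART+1, RLENGTH-1)
}
END{
# note: for (k in d) visits collectors in an unspecified order
for(k in d){
print(k)
# walk the indices numerically so names keep their input order
for (i=1; i<=length(d[k]); i++) print d[k][i]
print ""
}
}' file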
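To see the grep -o emulation on its own, here is a small sketch run on one line from the sample input (the variable names are just for illustration). grep -o and match()/substr() return the same text; because the regex includes the leading space, dropping that space means starting at RSTART+1 and taking RLENGTH-1 characters.

```shell
#!/bin/sh
# One FCP line from the question's sample input.
line='<FCP name="a1" second_name="b2"/>'

# grep -o prints only the matched text (here including the leading space).
via_grep=$(printf '%s\n' "$line" | grep -o ' name="[^"]*"')

# awk: match() stores the match position in RSTART and its length in RLENGTH,
# and substr() cuts exactly that text out of the line.
via_awk=$(printf '%s\n' "$line" |
  awk 'match($0, / name="[^"]*"/) { print substr($0, RSTART, RLENGTH) }')

# Skipping the leading space: start one character later AND take one fewer.
trimmed=$(printf '%s\n' "$line" |
  awk 'match($0, / name="[^"]*"/) { print substr($0, RSTART+1, RLENGTH-1) }')

printf '[%s]\n[%s]\n[%s]\n' "$via_grep" "$via_awk" "$trimmed"
```

The brackets make the leading space visible: the first two lines are identical, and only the third one starts directly with name=.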
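Edit: a shorter version, thanks to Ed Morton, using gawk's three-argument match() and the \< word-boundary operator: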
awk '
match($0, /\<collector="[^"]*"/, a){ collector=a[0] }
match($0, /\<name="[^"]*"/, a){ d[collector][length(d[collector])+1]=a[0] }
END{
for(k in d){
print(k)
for (i=1; i<=length(d[k]); i++) print d[k][i]
print ""
}
}' file
Output:
collector="202a"
name="a1"
name="a3"
name="a2"
name="a11"
collector="204a"
name="z1"
name="s3"
name="t2"
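Both gawk versions depend on gawk extensions: arrays of arrays (d[k][i], gawk 4.0 or later) and, in the edited version, three-argument match() and \<. In addition, for (k in d) visits collectors in an unspecified order. If only a plain POSIX awk is available, the following is a minimal sketch using flat arrays and an explicit order list. It assumes, as in the sample, that each CPM opening tag and each FCP element sit on their own line; it is a line-oriented text approach, not an XML parser, and group_by_collector is a hypothetical helper name.

```shell
#!/bin/sh
# Sample input from the question (closing tags written as </CPM>).
sample='<CPM display_name="XYZ" collector="202a" >
<FCP name="a1" second_name="b2"/>
<FCP name="a3" second_name="b232323"/>
<FCP name="a2" second_name="b445"/>
</CPM>
<CPM display_name="XYZ" collector="204a" >
<FCP name="z1" second_name="b232323232"/>
<FCP name="s3" second_name="b23232323"/>
<FCP name="t2" second_name="b4453223"/>
</CPM>
<CPM display_name="XYZ" collector="202a" >
<FCP name="a11" second_name="basdasdasdasd2"/>
</CPM>'

# Group FCP names by collector using only POSIX awk features:
# flat arrays, two-argument match(), and explicit order bookkeeping.
group_by_collector() {
  awk '
    match($0, /collector="[^"]*"/) {
      collector = substr($0, RSTART, RLENGTH)
      # Record each collector the first time it appears.
      if (!(collector in names)) {
        order[++ncol] = collector
        names[collector] = ""
      }
      next
    }
    match($0, / name="[^"]*"/) {
      # Drop the leading space, then append to the current collector list.
      n = substr($0, RSTART + 1, RLENGTH - 1)
      names[collector] = names[collector] n "\n"
    }
    END {
      for (i = 1; i <= ncol; i++) {
        print order[i]
        printf "%s", names[order[i]]
      }
    }' "$@"
}

printf '%s\n' "$sample" | group_by_collector
```

On the real file this would be called as group_by_collector file. Keeping an explicit order[] list, rather than relying on for (k in d), is what makes the collectors come out in first-seen order, and appending to a string keeps each collector's names in input order.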
Bonus: a non-gawk solution using grep, sed, sort and tr (it still relies on GNU grep and GNU sed extensions):
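grep -oE '\b(collector|name)="[^"]*"' file |        # each collector=/name= match on its own line
sed ':a;N;$!ba;s/\nname/ name/g' |                  # slurp all input, join each name onto the line before it
sort -k1 |                                          # make repeated collectors adjacent
sed ':a;$!N;/^\([^ ]*[ ]\).*\n\1/s/\n/ /;ta;P;D' |  # merge adjacent lines starting with the same collector
sed 's/[ ]collector="[^"]*"//g' |                   # drop the duplicate collector labels left after merging
tr ' ' '\n'                                         # split back to one item per line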
Output:
collector="202a"
name="a11"
name="a1"
name="a3"
name="a2"
collector="204a"
name="z1"
name="s3"
name="t2"
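To see how the non-gawk pipeline builds its groups, this sketch runs only its first two stages (grep -oE and the slurping sed) on the sample input. Like the pipeline itself, it needs GNU grep and GNU sed. After these two stages, every CPM block has become a single line that starts with its collector, which is what lets sort and the second sed merge repeated collectors.

```shell
#!/bin/sh
# Sample input from the question (closing tags written as </CPM>).
sample='<CPM display_name="XYZ" collector="202a" >
<FCP name="a1" second_name="b2"/>
<FCP name="a3" second_name="b232323"/>
<FCP name="a2" second_name="b445"/>
</CPM>
<CPM display_name="XYZ" collector="204a" >
<FCP name="z1" second_name="b232323232"/>
<FCP name="s3" second_name="b23232323"/>
<FCP name="t2" second_name="b4453223"/>
</CPM>
<CPM display_name="XYZ" collector="202a" >
<FCP name="a11" second_name="basdasdasdasd2"/>
</CPM>'

# Stage 1: print each collector="..." and name="..." match on its own line.
# Stage 2: read the whole stream, then glue every name line onto the line before it.
stage2=$(printf '%s\n' "$sample" |
  grep -oE '\b(collector|name)="[^"]*"' |
  sed ':a;N;$!ba;s/\nname/ name/g')

printf '%s\n' "$stage2"
```

Stage 3 then sorts these lines so the two collector="202a" lines become adjacent. Which of them lands first depends on sort's locale, which is likely why name="a11" comes before name="a1" in the output above; running that stage as LC_ALL=C sort would give plain byte order instead.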