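Find and print specific strings from XML based on a string searched for in the parent element

Time: 2016-12-19 16:04:14

Tags: xml linux bash awk sed

I need to extract all FCP names from an XML file based on the collector name, on a CentOS server. The number of lines inside each CPM tag is not known in advance. A bash solution is preferred, but any solution is welcome.

Example input file: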

  <CPM display_name="XYZ"  collector="202a" >
    <FCP name="a1" second_name="b2"/>
    <FCP name="a3" second_name="b232323"/>
    <FCP name="a2" second_name="b445"/>
  </CPM>
  <CPM display_name="XYZ"  collector="204a" >
    <FCP name="z1" second_name="b232323232"/>
    <FCP name="s3" second_name="b23232323"/>
    <FCP name="t2" second_name="b4453223"/>
  </CPM>
  <CPM display_name="XYZ"  collector="202a" >
    <FCP name="a11" second_name="basdasdasdasd2"/>
  </CPM>

... (the full file is long, more than 500 lines)

Expected output:

collector="202a"
name="a1"
name="a2"
name="a3"
name="a11"

collector="204a"
name="z1"
name="s3"
name="t2"

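Any help is appreciated.

1 answer:

Answer 0: (score: 2)

A gawk solution. match(), substr(), RSTART and RLENGTH together emulate the behavior of grep -o, and A[length(A)+1]=N emulates an array push. The d[collector][...] arrays of arrays need gawk 4.0 or later: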

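awk '
    # Remember the collector attribute of the most recent CPM tag.
    match($0, /collector="[^"]*"/){
        collector=substr($0, RSTART, RLENGTH)
    }
    # The leading blank keeps display_name= and second_name= from matching.
    match($0,/[ ]name="[^"]*"/) {
        # Skip that blank: start one character later and take one character fewer.
        d[collector][length(d[collector])+1]=substr($0, RSTART+1, RLENGTH-1)
    }
    END{
        # The order in which for-in visits the collectors is unspecified.
        for(k in d){
            print(k)
            # A numeric loop keeps the names in input order.
            for (i=1; i<=length(d[k]); i++) print d[k][i]
            print ""
        }
    }' file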

Edit (thanks, Ed Morton):

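awk '
    # gawk only: the third argument of match() stores the matched text in a[0],
    # and \< anchors at the start of a word, so display_name= and second_name= do not match.
    match($0, /\<collector="[^"]*"/, a){ collector=a[0] }
    match($0, /\<name="[^"]*"/, a){ d[collector][length(d[collector])+1]=a[0] }
    END{
        # The order in which for-in visits the collectors is unspecified.
        for(k in d){
            print(k)
            # A numeric loop keeps the names in input order.
            for (i=1; i<=length(d[k]); i++) print d[k][i]
            print ""
        }
    }' file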

This prints:

collector="202a"
name="a1"
name="a3"
name="a2"
name="a11"

collector="204a"
name="z1"
name="s3"
name="t2"
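
As an aside on the match()/substr() idiom from the first version, the following minimal sketch runs it on a single line next to grep -o for comparison. The function names with_grep and with_awk and the sample line are made up for illustration; only POSIX awk features are used, so it should not need gawk.

```shell
# Hypothetical sample line, shaped like the CPM lines in the question.
line='  <CPM display_name="XYZ"  collector="202a" >'

# grep -o prints only the part of the line that matched.
with_grep() {
  printf '%s\n' "$1" | grep -o 'collector="[^"]*"'
}

# match() sets RSTART (1-based start of the match) and RLENGTH (its length);
# substr() then cuts exactly that span out of the line.
with_awk() {
  printf '%s\n' "$1" |
    awk 'match($0, /collector="[^"]*"/) { print substr($0, RSTART, RLENGTH) }'
}

with_grep "$line"
with_awk "$line"
```

Both calls should print collector="202a".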

Bonus: a non-gawk solution using grep, sed, sort and tr:

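grep -oE '\b(collector|name)="[^"]*"' file |          # one attribute per line
sed ':a;N;$!ba;s/\nname/ name/g' |                    # join each collector with its names
sort -k1 |                                            # bring equal collectors together
sed ':a;$!N;/^\([^ ]*[ ]\).*\n\1/s/\n/ /;ta;P;D' |    # merge lines that share a collector
sed 's/[ ]collector="[^"]*"//g' |                     # drop the repeated collector fields
tr ' ' '\n'                                           # back to one item per line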

This prints:

collector="202a"
name="a11"
name="a1"
name="a3"
name="a2"
collector="204a"
name="z1"
name="s3"
name="t2"
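
If gawk is not available and the blank line between groups from the expected output matters, a plain awk sketch along these lines may be an alternative. It is not part of the original answer: the file name sample.xml, the function name group_fcp and the arrays order and names are made up here. It assumes that attribute values contain no double quotes and that each CPM tag and its collector attribute sit on one line.

```shell
# Hypothetical sample file, a shortened copy of the input from the question.
cat > sample.xml <<'EOF'
  <CPM display_name="XYZ"  collector="202a" >
    <FCP name="a1" second_name="b2"/>
    <FCP name="a3" second_name="b232323"/>
  </CPM>
  <CPM display_name="XYZ"  collector="204a" >
    <FCP name="z1" second_name="b232323232"/>
  </CPM>
  <CPM display_name="XYZ"  collector="202a" >
    <FCP name="a11" second_name="basdasdasdasd2"/>
  </CPM>
EOF

group_fcp() {
  awk '
    # Remember the collector of the most recent CPM opening tag.
    match($0, /collector="[^"]*"/) {
        collector = substr($0, RSTART, RLENGTH)
        # Record each collector once, in first-seen order.
        if (!(collector in names)) { order[++n] = collector; names[collector] = "" }
    }
    # The leading blank keeps display_name= and second_name= from matching.
    match($0, /[ \t]name="[^"]*"/) {
        # Append the attribute without the leading blank, one name per line.
        names[collector] = names[collector] substr($0, RSTART + 1, RLENGTH - 1) "\n"
    }
    END {
        for (i = 1; i <= n; i++) {
            print order[i]
            # names[...] already ends in a newline, so this adds the blank separator.
            printf "%s\n", names[order[i]]
        }
    }' "$1"
}

group_fcp sample.xml
```

Collecting the names in a newline-separated string per collector avoids arrays of arrays, and the separate order array keeps the collectors in the order they first appear in the file instead of relying on for-in traversal.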