I need to grep values out of an XML file from the shell. Sample file below: test.xml
<wtc-import>
<name>WTCImportedService-288-rap04</name>
<resource-name>CAC040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAC040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-289-rap04</name>
<resource-name>CAD040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAD040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-290-rap04</name>
<resource-name>CAE040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAE040F</remote-name>
</wtc-import>
<wtc-import>
<name>WTCImportedService-289-rap04</name>
<resource-name>CAD040F</resource-name>
<local-access-point>lap01</local-access-point>
<remote-access-point-list>rap04</remote-access-point-list>
<remote-name>CAD040F</remote-name>
</wtc-import>
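I need to grep every resource-name value in the file, and if any resource name occurs more than once, drop the duplicates from the output.
Expected output: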
CAC040F
CAD040F
CAE040F
The resource CAD040F is duplicated, so it appears only once in the expected output.
What I tried:
grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}'
This works fine, but how do I filter out the duplicates afterwards?
Answer 0: (score: 1)
You can do it with a single awk command:
awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
Run against the sample XML file:
$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F
$
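To spell out why the one-liner works: -F"[<>]" splits each line on both angle brackets, so for a line like <resource-name>CAC040F</resource-name> the tag name lands in $2 and the value in $3. seen[$3]++ evaluates to 0 the first time a value shows up and to a non-zero count afterwards, so !seen[$3]++ is true only once per value. A minimal sketch on a cut-down sample follows; the temp file and the variable names tmp and result are my own, for illustration only, and it matches the tag name exactly via $2 instead of the /resource-name/ regex:

```shell
# Illustrative only: a cut-down sample written to a temp file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<wtc-import>
<resource-name>CAC040F</resource-name>
<remote-name>CAC040F</remote-name>
</wtc-import>
<wtc-import>
<resource-name>CAD040F</resource-name>
</wtc-import>
<wtc-import>
<resource-name>CAD040F</resource-name>
</wtc-import>
EOF

# $2 is the tag name, $3 the text between the tags.
# seen[$3]++ is 0 on first sight, so !seen[$3]++ is true only once per value.
result=$(awk -F'[<>]' '$2 == "resource-name" && !seen[$3]++ { print $3 }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This should print CAC040F and CAD040F once each, in the order they first appear in the file.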
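Answer 1: (score: 1)
Just a speed optimization over the answer from @stack0114106, which already does the job: it compares the tag name in $2 exactly instead of regex-matching the whole line, and tests array membership with "in" instead of incrementing a counter.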
awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml
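As a small illustration of the membership idiom used here: (key in array) checks whether a key exists without creating it, while merely referencing List[key] as a statement creates the key with an empty value. The sketch below feeds a few made-up values (a, b, c are placeholders, not from the XML file) through the same pattern:

```shell
# Made-up input values; only the dedup idiom itself is the point.
out=$(printf '%s\n' a b a c b | awk '
    # Print a value only if it is not yet a key of List,
    # then reference List[$1] so the key exists from now on.
    !($1 in List) { print $1; List[$1] }
')
printf '%s\n' "$out"
```

Each value should be printed once, in first-seen order.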
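Answer 2: (score: 0)
If you already have the output and just want to remove the duplicates, the simplest approach is to pipe it through sort and then uniq, so your command becomes: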
grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq
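Two things worth knowing about this approach: uniq only collapses adjacent duplicate lines, which is why sort has to come first, and the result is therefore in sorted order rather than file order. sort -u does both steps in one command. A minimal sketch with inline values standing in for the grep/awk output (the variable names values and deduped are mine):

```shell
# Stand-in for the extracted resource names, deliberately unsorted.
values=$(printf '%s\n' CAD040F CAC040F CAD040F CAE040F)

# sort -u sorts and drops duplicate lines in a single step.
deduped=$(printf '%s\n' "$values" | sort -u)
printf '%s\n' "$deduped"
```

If the original file order matters, the awk-based answers are the better fit, since they keep the first occurrence in place.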
Answer 3: (score: 0)
If you prefer a bash regex, try the following:
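# Associative arrays need bash 4 or later.
declare -A name
# Capture the text between the resource-name tags.
regex="<resource-name>([^<]+)</resource-name>"
while read -r line; do
    if [[ $line =~ $regex ]]; then
        # Using the value as a key makes duplicates collapse automatically.
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"
# Note: the iteration order of associative-array keys is not guaranteed.
for i in "${!name[@]}"; do
    echo "$i"
done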