How do we extract the value inside a tag in an XML file?

Asked: 2014-08-05 08:23:22

Tags: xml unix

I want to read weblogic.xml and extract the context-root value. Here is an example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE weblogic-web-app PUBLIC "-//BEA Systems, Inc.//DTD Web Application 8.1//EN" "http://www.bea.com/servers/wls810/dtd/weblogic810-web-jar.dtd">
 <weblogic-web-app>
   <context-root>
    /XYZ
   </context-root>
 </weblogic-web-app>

I tried the following commands:

sed -n '/context-root/{s/.*<context-root>//;s/<\/context-root.*//;p;}' weblogic.xml

awk -F "[><]" '/context-root/{print $3}' weblogic.xml

perl -ne 'if (/context-root/){ s/.*?>//; s/<.*//;print;}' weblogic.xml

These work fine when the opening tag, value, and closing tag are all on one line, like this:

<context-root>/XYZ</context-root>

How can I extract the tag's value from the XML above, where the value sits on its own line?

1 answer:

Answer 0 (score: 0)

awk '{ gsub(/^[ \t]+|[ \t\r]+$/, ""); } /<\/context-root>/ { p = 0 }; p; /<context-root>/ { p = 1 }' file

Output:

/XYZ
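To make the logic of that one-liner easier to follow, here is a minimal sketch that wraps it in a shell function and feeds it the sample file through a here-document. The function name `extract_context_root` is made up for illustration, and a POSIX `sh` and `awk` are assumed. The flag `p` is switched on after a line containing the opening tag and switched off on a line containing the closing tag, so only the lines in between are printed.

```shell
#!/bin/sh
# Sketch: print the lines that sit between <context-root> and </context-root>.
# extract_context_root is a hypothetical helper name, not part of any tool.
extract_context_root() {
    awk '
        { gsub(/^[ \t]+|[ \t\r]+$/, "") }   # trim leading/trailing blanks and CR
        /<\/context-root>/ { p = 0 }        # closing tag: stop printing
        p { print }                         # inside the element: print the line
        /<context-root>/ { p = 1 }          # opening tag: start with the next line
    ' "$@"
}

# Demo on the sample from the question.
extract_context_root <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE weblogic-web-app PUBLIC "-//BEA Systems, Inc.//DTD Web Application 8.1//EN" "http://www.bea.com/servers/wls810/dtd/weblogic810-web-jar.dtd">
 <weblogic-web-app>
   <context-root>
    /XYZ
   </context-root>
 </weblogic-web-app>
EOF
```

Because the rules are evaluated in order on each line, this form prints nothing for the single-line variant `<context-root>/XYZ</context-root>`; it only covers the layout where the tags and the value are on separate lines.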

Update (also handles tags that open and close on the same line):

#!/usr/bin/awk -f
{
    gsub(/^[ \t]+|[ \t\r]+$/, "")
}
match($0, /^[^<]*<\/context-root>/) {
    if (p) {
        t = substr($0, 1, index($0, "</context-root>") - 1)
        if (length(t)) print t
    }
    $0 = substr($0, RSTART + RLENGTH)
    p = 0
}
{
    while (match($0, /<context-root>[^<]*<\/context-root>/)) {
        t = substr($0, RSTART, RLENGTH)
        gsub(/<\/?context-root>/, "", t)
        print t
        $0 = substr($0, RSTART + RLENGTH)
    } 
}
p
match($0, /<context-root>/) {
    t = substr($0, RSTART + RLENGTH)
    if (length(t)) print t
    p = 1
}

Another version, which trims each extracted value instead of the whole line:

#!/usr/bin/awk -f
function strip(t) {
    gsub(/^[ \t]+|[ \t\r]+$/, "", t)
    return t
}
match($0, /^[^<]*<\/context-root>/) {
    if (p) {
        t = strip(substr($0, 1, index($0, "</context-root>") - 1))
        if (length(t)) print t
    }
    $0 = substr($0, RSTART + RLENGTH)
    p = 0
}
{
    while (match($0, /<context-root>[^<]*<\/context-root>/)) {
        t = substr($0, RSTART, RLENGTH)
        gsub(/<\/?context-root>/, "", t)
        t = strip(t)    # trim blanks inside the tags, as the other rules do
        if (length(t)) print t
        $0 = substr($0, RSTART + RLENGTH)
    } 
}
p {
    # inside a multi-line element: print the line unless it is blank
    t = strip($0)
    if (length(t)) print t
}
match($0, /<context-root>/) {
    t = strip(substr($0, RSTART + RLENGTH))
    if (length(t)) print t
    p = 1
}

Input:

    <context-root>
        A B
    </context-root>
    <context-root>C D</context-root><context-root>E F</context-root><context-root>G H
    I J</context-root>

Output:

A B
C D
E F
G H
I J
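The scripts above work line by line and keep state in `p`. A different approach, shown here only as a sketch and not as part of the original answer, is to read the whole input into one string and let `match()` find each complete element. The function name `context_roots` is hypothetical, and a POSIX `awk` is assumed. Note the behavioral difference: a value that spans several lines is joined into one output line, so the last element of the sample input comes out as `G H I J` rather than two lines.

```shell
#!/bin/sh
# Sketch: slurp the input, then extract every <context-root>...</context-root>.
# context_roots is a hypothetical helper name used for illustration only.
context_roots() {
    awk '
        { buf = buf $0 "\n" }                    # accumulate every input line
        END {
            while (match(buf, /<context-root>[^<]*<\/context-root>/)) {
                t = substr(buf, RSTART, RLENGTH)       # one whole element
                buf = substr(buf, RSTART + RLENGTH)    # continue after it
                gsub(/<\/?context-root>/, "", t)       # drop the two tags
                gsub(/[ \t\r\n]+/, " ", t)             # collapse whitespace runs
                gsub(/^ | $/, "", t)                   # trim both ends
                if (length(t)) print t
            }
        }
    ' "$@"
}

# Demo on the sample input from the answer.
context_roots <<'EOF'
    <context-root>
        A B
    </context-root>
    <context-root>C D</context-root><context-root>E F</context-root><context-root>G H
    I J</context-root>
EOF
```

Like the scripts above, this treats the file as text and assumes the element has no attributes, nested markup, comments, or CDATA sections; for anything beyond a simple descriptor like this one, a real XML parser is the safer choice.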