Question

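Using Perl or Python (whichever is fastest) on the XML below, I want a way to get all nodes/node names where attribute1 is set to "characters" and attribute2 is either not set to "chr" or is missing altogether. Keep in mind that my XML can have 500 nodes, so please suggest a fast way to get all matching nodes.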

<NODE attribute1="characters" attribute2="chr" name="node1">
  <content>
    value1
  </content>
</NODE>

<NODE attribute1="camera"  name="node2">
  <content>
    value2
  </content>
</NODE>

<NODE attribute1="camera" attribute2="car" name="node3">
  <content>
    value2
  </content>
</NODE>

Answer 1

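What you are looking for is an XPath expression. As written, the predicate below keeps nodes whose attribute2 is missing or equal to "chr"; to exclude "chr" as the question asks, change @attribute2="chr" to @attribute2!="chr":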

//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]

A quick test with xmllint:

kent$  cat f.xml
<root>
<NODE attribute1="characters" attribute2="chr" name="node1">
  <content>
    value1
  </content>
</NODE>

<NODE attribute1="camera"  name="node2">
  <content>
    value2
  </content>
</NODE>

<NODE attribute1="camera" attribute2="car" name="node3">
  <content>
    value2
  </content>
</NODE>
</root>

kent$  xmllint --xpath '//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]' f.xml
<NODE attribute1="characters" attribute2="chr" name="node1">
  <content>
    value1
  </content>
</NODE>

Update

If you only want to extract the value of the name attribute, you can use this XPath:

//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]/@name

or

string(//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]/@name)

Again, testing with xmllint:

kent$  xmllint --xpath '//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]/@name' f.xml
 name="node1"

kent$  xmllint --xpath 'string(//NODE[@attribute1="characters" and ( not(@attribute2) or @attribute2="chr")]/@name)' f.xml
node1
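
For comparison, here is a minimal Python sketch of the same predicate as written above, using only the standard library. As far as I know, the XPath subset in xml.etree.ElementTree has no not(), so the filtering is done in plain Python instead. The function name matching_names and its allowed_attr2 parameter are made up for illustration; the sample XML is f.xml from above with whitespace trimmed.

```python
import xml.etree.ElementTree as ET

# Same three nodes as f.xml above.
XML = """<root>
<NODE attribute1="characters" attribute2="chr" name="node1"><content>value1</content></NODE>
<NODE attribute1="camera" name="node2"><content>value2</content></NODE>
<NODE attribute1="camera" attribute2="car" name="node3"><content>value2</content></NODE>
</root>"""


def matching_names(xml_text, allowed_attr2="chr"):
    """Return names of NODE elements with attribute1="characters" whose
    attribute2 is either absent or equal to allowed_attr2 (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    names = []
    for node in root.iter("NODE"):
        if node.get("attribute1") != "characters":
            continue
        attr2 = node.get("attribute2")  # None when the attribute is absent
        if attr2 is None or attr2 == allowed_attr2:
            names.append(node.get("name"))
    return names


print(matching_names(XML))  # ['node1']
```

Changing the comparison from == to != inside the loop would give the "attribute2 is not chr" variant instead.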

Answer 2

Since you tagged this perl / python, I'll offer a perlish approach.

Perl has a nice library, XML::Twig, which I really like for parsing XML.

#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;

my $parser = XML::Twig->new();

#would probably use parsefile instead.
#e.g.:
# my $parser = XML::Twig -> new -> parsefile ( 'your_file.xml' );
{
    local $/;
    $parser->parse(<DATA>);
}


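#iterate all the elements in the file.
foreach my $element ( $parser->root()->children() ) {

    #treat a missing attribute1 as an empty string to avoid undef warnings
    my $attr1 = $element->att('attribute1') // '';
    my $attr2 = $element->att('attribute2');

    #keep nodes where attribute2 is absent or is anything other than "chr"
    #(with the sample data below, node1 is skipped because its attribute2 is "chr")
    if ( $attr1 eq 'characters'
        and ( not defined $attr2 or $attr2 ne 'chr' ) )
    {
        #extract name if condition matches
        print $element->att('name'), "\n";
    }
}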


__DATA__
<DATA>
  <NODE attribute1="characters" attribute2="chr" name="node1">
    <content>
      value1
    </content>
  </NODE>

  <NODE attribute1="camera"  name="node2">
    <content>
      value2
    </content>  
  </NODE>

  <NODE attribute1="camera" attribute2="car" name="node3">
    <content>
      value2
    </content>
  </NODE>
</DATA>

Answer 3

Use the lxml module.

content = """
<body>
<NODE attribute1="characters" attribute2="chr" name="node1">
  <content>
    value1
  </content>
</NODE>

<NODE attribute1="camera"  name="node2">
  <content>
    value2
  </content>
</NODE>

<NODE attribute1="camera" attribute2="car" name="node3">
  <content>
    value2
  </content>
</NODE>

<NODE attribute1="characters" attribute2="car" name="node3">
  <content>
    value2
  </content>
</NODE>

<NODE attribute1="characters" name="node3">
  <content>
    value2
  </content>
</NODE>

</body>
"""

from lxml import etree

root = etree.fromstring(content)
# attribute1 must be "characters"; attribute2 must be absent or not "chr"
matches = root.xpath('//*[@attribute1="characters" and ( not(@attribute2) or @attribute2!="chr") ]')
for node in matches:
    print(node.tag, dict(node.attrib))

Output:

$ python test.py
NODE {'attribute1': 'characters', 'attribute2': 'car', 'name': 'node3'}
NODE {'attribute1': 'characters', 'name': 'node3'}
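
If lxml is not available, or the file grows well beyond 500 nodes, a streaming pass with the standard library's xml.etree.ElementTree.iterparse is one possible alternative. The sketch below applies the condition from the question (attribute2 absent or not "chr"). The function name stream_names, its excluded_attr2 parameter, and the node4/node5 sample names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_names(source, excluded_attr2="chr"):
    """Yield the name of each NODE with attribute1="characters" whose
    attribute2 is missing or different from excluded_attr2 (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "NODE":
            continue
        # elem.get() returns None for a missing attribute, and None != "chr"
        if (elem.get("attribute1") == "characters"
                and elem.get("attribute2") != excluded_attr2):
            yield elem.get("name")
        elem.clear()  # drop the NODE's children and attributes once inspected


# Illustrative data only; iterparse also accepts a filename.
SAMPLE = b"""<body>
<NODE attribute1="characters" attribute2="chr" name="node1"/>
<NODE attribute1="camera" name="node2"/>
<NODE attribute1="characters" attribute2="car" name="node4"/>
<NODE attribute1="characters" name="node5"/>
</body>"""

print(list(stream_names(io.BytesIO(SAMPLE))))  # ['node4', 'node5']
```

For 500 nodes, parsing the whole document at once is already fast; streaming mainly helps keep memory low on much larger inputs.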

How to parse XML to get specific nodes with specific attribute values