Question

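I just started learning Python and have to write a program that parses XML files. I have to find a certain tag called OrganisationReference in 2 different files and return it. There are actually multiple tags with this name, but only one of them (the one I want to return) has the tag OrganisationType with the value DEALER as a parent tag (not sure if that is the right term). I tried to use ElementTree for this. Here is the code: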

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    tree2 = ET.parse('Master2.xml')
    root2 = tree2.getroot()

    for OrganisationReference in root1.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

    for OrganisationReference in root2.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

But this returns nothing (and no error either). Can somebody help me?

My file looks like this:

  <MessageOrganisationCount>a</MessageOrganisationCount>
  <MessageVehicleCount>x</MessageVehicleCount>
  <MessageCreditLineCount>y</MessageCreditLineCount>
  <MessagePlanCount>z</MessagePlanCount>
  <OrganisationData>
      <Organisation>
          <OrganisationId>
              <OrganisationType>DEALER</OrganisationType>
              <OrganisationReference>WHATINEED</OrganisationReference>
          </OrganisationId>
          <OrganisationName>XYZ.</OrganisationName>
 ....

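Since OrganisationReference appears several times in this file, with different text between the start and end tags, I want to get exactly the one you can see in line 9: it has OrganisationId as its parent tag, and the OrganisationType with the value DEALER is also a child of that OrganisationId.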

Answer 1

You were very close with your original attempt. You just need to make a couple of changes to your xpath and one small change to your Python.

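The first part of your xpath starts with ./Organisation. Since you are evaluating the xpath from the root element, it expects Organisation to be a child of the root. It isn't; it's a descendant.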

Try changing ./Organisation to .//Organisation. (// is short for /descendant-or-self::node()/ in XPath's abbreviated syntax.)
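
To make the child/descendant difference concrete, here is a minimal sketch. It assumes a cut-down inline document with the same nesting as Master1.xml (the `XML` string and the variable names are made up for illustration) and just counts the matches for each path.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline document mirroring the nesting in the question.
XML = """<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
        </Organisation>
    </OrganisationData>
</doc>"""

root = ET.fromstring(XML)

# "./Organisation" only looks at direct children of the root element.
direct_children = root.findall("./Organisation")

# ".//Organisation" searches every level beneath the root element.
descendants = root.findall(".//Organisation")

print(len(direct_children))  # 0
print(len(descendants))      # 1
```

Because Organisation sits inside OrganisationData, only the `.//` form finds it.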

The second issue is OrganisationId/[@OrganisationType='DEALER']. That is not valid xpath. The / between OrganisationId and the predicate should be removed.

Also, @ is the abbreviated syntax for the attribute:: axis, and OrganisationType is an element, not an attribute.

Try changing OrganisationId/[@OrganisationType='DEALER'] to OrganisationId[OrganisationType='DEALER'].
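
As a rough illustration of the attribute predicate versus the child-element predicate, the sketch below runs both against a small inline document. The second Organisation (with the values BANK and OTHER) is an assumption added only so there is something to filter out; it is not from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the BANK/OTHER entry is invented for the demo.
XML = """<doc>
    <Organisation>
        <OrganisationId>
            <OrganisationType>DEALER</OrganisationType>
            <OrganisationReference>WHATINEED</OrganisationReference>
        </OrganisationId>
    </Organisation>
    <Organisation>
        <OrganisationId>
            <OrganisationType>BANK</OrganisationType>
            <OrganisationReference>OTHER</OrganisationReference>
        </OrganisationId>
    </Organisation>
</doc>"""

root = ET.fromstring(XML)

# [@name='value'] tests an attribute; OrganisationId has none, so no match.
by_attribute = root.findall(".//OrganisationId[@OrganisationType='DEALER']")

# [name='value'] tests the text of a child element with that tag.
by_child_text = root.findall(".//OrganisationId[OrganisationType='DEALER']")

references = [el.findtext("OrganisationReference") for el in by_child_text]

print(by_attribute)  # []
print(references)    # ['WHATINEED']
```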

print(OrganisationReference.attrib) is the Python issue. OrganisationReference doesn't have any attributes; it only has text.

Try changing print(OrganisationReference.attrib) to print(OrganisationReference.text).
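
A small sketch of the difference between .attrib and .text, in case it helps. The `scheme="internal"` attribute in the second element is purely hypothetical, added so that .attrib has something to show.

```python
import xml.etree.ElementTree as ET

# Element as it appears in the question: text only, no attributes.
plain = ET.fromstring("<OrganisationReference>WHATINEED</OrganisationReference>")
print(plain.attrib)  # {}
print(plain.text)    # WHATINEED

# Hypothetical variant carrying an attribute (not in the original file).
with_attr = ET.fromstring(
    '<OrganisationReference scheme="internal">WHATINEED</OrganisationReference>'
)
print(with_attr.attrib)  # {'scheme': 'internal'}
print(with_attr.text)    # WHATINEED
```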

Here's an example using just one XML file for demonstration purposes...

XML input (Master1.xml; a doc element was added to make it well-formed)

    <doc>
        <MessageOrganisationCount>a</MessageOrganisationCount>
        <MessageVehicleCount>x</MessageVehicleCount>
        <MessageCreditLineCount>y</MessageCreditLineCount>
        <MessagePlanCount>z</MessagePlanCount>
        <OrganisationData>
            <Organisation>
                <OrganisationId>
                    <OrganisationType>DEALER</OrganisationType>
                    <OrganisationReference>WHATINEED</OrganisationReference>
                </OrganisationId>
                <OrganisationName>XYZ.</OrganisationName>
            </Organisation>
        </OrganisationData>
    </doc>

Python

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    for OrganisationReference in root1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.text)

Printed output

    WHATINEED

Also note that you don't appear to need getroot() at all. You can use findall() directly on the tree...

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')

    for OrganisationReference in tree1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.text)
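
Since the question involves two files, one possible way to avoid repeating the loop is to wrap the lookup in a small function. This is only a sketch: `dealer_references`, `DEALER_PATH`, `TEMPLATE`, and the REF1/REF2 values are names and data I made up, and the in-memory `io.StringIO` objects stand in for Master1.xml and Master2.xml so the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as ET

DEALER_PATH = (
    ".//Organisation/OrganisationId[OrganisationType='DEALER']"
    "/OrganisationReference"
)

def dealer_references(source):
    """Return the text of every DEALER OrganisationReference in one XML source."""
    tree = ET.parse(source)  # accepts a filename or an open file object
    return [ref.text for ref in tree.findall(DEALER_PATH)]

# Hypothetical stand-ins for Master1.xml and Master2.xml.
TEMPLATE = """<doc><OrganisationData><Organisation><OrganisationId>
<OrganisationType>DEALER</OrganisationType>
<OrganisationReference>{ref}</OrganisationReference>
</OrganisationId></Organisation></OrganisationData></doc>"""

master1 = io.StringIO(TEMPLATE.format(ref="REF1"))
master2 = io.StringIO(TEMPLATE.format(ref="REF2"))

results = [dealer_references(src) for src in (master1, master2)]
print(results)  # [['REF1'], ['REF2']]
```

With real files you would pass the paths instead, for example dealer_references('Master1.xml').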

Answer 2

You can do this with a for loop and an if check. First you check whether the text of OrganisationType is DEALER, and then you get the text of the OrganisationReference you need.

If you want to learn more about parsing XML with Python, I strongly recommend the documentation of the xml.etree.ElementTree library.

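    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    tree2 = ET.parse('Master2.xml')
    root2 = tree2.getroot()

    # Find the OrganisationId whose first child marks it as a dealer.
    # ".//" is needed because Organisation is not a direct child of the root.
    for element in root1.findall('.//Organisation/OrganisationId'):
        # element[0] is OrganisationType, element[1] is OrganisationReference
        if element[0].text == "DEALER":
            print(element[1].text)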

This works as long as the first tag in your OrganisationId is OrganisationType :)
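
If the order of the children is not guaranteed, a variant that looks the children up by tag name rather than by position might be safer. This is a sketch, not part of the original answer: the inline `XML` deliberately swaps the two children (an assumption for the demo), and `found` and `org_id` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the children in the opposite order.
XML = """<doc><OrganisationData><Organisation>
    <OrganisationId>
        <OrganisationReference>WHATINEED</OrganisationReference>
        <OrganisationType>DEALER</OrganisationType>
    </OrganisationId>
</Organisation></OrganisationData></doc>"""

root = ET.fromstring(XML)

found = []
# iter() walks every OrganisationId element at any depth.
for org_id in root.iter("OrganisationId"):
    # findtext() looks a direct child up by tag, so position does not matter.
    if org_id.findtext("OrganisationType") == "DEALER":
        found.append(org_id.findtext("OrganisationReference"))

print(found)  # ['WHATINEED']
```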

How do I use ElementTree to search an XML file for a tag whose "parent" tag has a specific value? (Python)
