Element.findall with XPath in ElementTree raises 'invalid predicate'

Time: 2014-02-21 19:48:37

Tags: xml xpath elementtree

I am trying to parse a SOAP response with ElementTree. I fetch the response with Suds, and I get this error:

Traceback (most recent call last):
...
  File "C:\Python27\lib\xml\etree\ElementPath.py", line 263, in iterfind
    selector.append(ops[token[0]](next, token))
  File "C:\Python27\lib\xml\etree\ElementPath.py", line 224, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

The XML I am working with looks like this:

<sitesResponse>
    <queryInfo></queryInfo>
    <site>
        <siteInfo>
            <siteName>name</siteName>
        </siteInfo>
    </site>
    <site />
    <site />
    <site />
     ....
</sitesResponse>

... My goal is to read the text ("name") of every siteName node in the XML and collect the values in a list. My code looks like this:

from suds.client import Client
import xml.etree.ElementTree as ET
url="http://worldwater.byu.edu/interactive/dr/services/index.php/services/cuahsi_1_1.asmx?WSDL"
def getNames(url):
    client = Client(url,cache=None)
    response = client.service.GetSites()
    response_string=str(response)
    root=ET.fromstring(response_string)
    names=[]
    for i in root.findall(".//siteName[*]"):
        name=sites.find(".//siteName[i]/*").text
        names.append(name)
    return names

names_list= getNames(url)
names_list.sort()
for i in names_list:
    print names_list[i]

2 Answers:

Answer 0: (score: 0)

You can use the following:

for sitename in root.findall(".//siteName"):
    names.append(sitename.text)
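
To illustrate why the loop above works while the original path fails, here is a minimal sketch run against a trimmed, namespace-free copy of the XML from the question. The names SAMPLE_XML and error_message are made up for this example. ElementTree implements only a limited subset of XPath, and "[*]" is not one of the predicate forms it accepts.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumed to have no namespace).
SAMPLE_XML = """
<sitesResponse>
    <queryInfo></queryInfo>
    <site>
        <siteInfo>
            <siteName>name</siteName>
        </siteInfo>
    </site>
    <site />
</sitesResponse>
"""

root = ET.fromstring(SAMPLE_XML)

# ".//siteName" matches every siteName descendant; no predicate is needed.
names = [element.text for element in root.findall(".//siteName")]
print(names)  # ['name']

# "[*]" is rejected by ElementTree's path parser with a SyntaxError.
error_message = None
try:
    root.findall(".//siteName[*]")
except SyntaxError as exc:
    error_message = str(exc)
print(error_message)
```

If the real response declares a default namespace, the unqualified path ".//siteName" finds nothing, and the tag names have to be namespace-qualified.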

Answer 1: (score: 0)

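Thanks for the help! It turns out the problem was that I needed to account for the namespace. I also used the idea Eugene suggested to improve the code and turned it into a module.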

from suds.client import Client
import xml.etree.ElementTree as ET

def getNames(url,namespace):
    ###Suds Call###
    client = Client(url,cache=None)
    response = client.service.GetSites()
    ###         ###

    response_string=str(response)

    ###ElementTree Parsing###
    root=ET.fromstring(response_string)
    siteNameTags = root.findall("{0}site/{0}siteInfo/{0}siteName".format(namespace)) #must include {0} due to namespacing (this is where I need to add generality)
    ###                   ###

    siteNames=[]
    for i in siteNameTags:
        siteNames.append(i.text)
    siteNames.sort()
    return siteNames

###Example###    
url="http://worldwater.byu.edu/interactive/dr/services/index.php/services/cuahsi_1_1.asmx?WSDL"        
namespace="{http://www.cuahsi.org/waterML/1.1/}"
names_list= getNames(url,namespace)
for i in names_list:
    print(i)  # a plain print is enough; str.format() is not required by the namespacing
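
One possible way to add the generality mentioned in the comment above is to read the namespace off the root element instead of passing it in. The sketch below is only an illustration: SAMPLE_XML, namespace_of, get_site_names and get_site_names_with_prefix are hypothetical names, the sample document stands in for the Suds response, and it assumes every element shares the root's namespace. It also shows the alternative of passing a prefix map as the second argument of findall.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the SOAP response; the real text comes from Suds.
SAMPLE_XML = """
<sitesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
    <queryInfo></queryInfo>
    <site><siteInfo><siteName>Site B</siteName></siteInfo></site>
    <site><siteInfo><siteName>Site A</siteName></siteInfo></site>
</sitesResponse>
"""


def namespace_of(element):
    """Return the '{uri}' part of an element's tag, or '' if there is none."""
    tag = element.tag
    return tag[: tag.index("}") + 1] if tag.startswith("{") else ""


def get_site_names(xml_text):
    """Collect sorted siteName texts, deriving the namespace from the root."""
    root = ET.fromstring(xml_text)
    ns = namespace_of(root)  # assumption: children use the root's namespace
    path = "{0}site/{0}siteInfo/{0}siteName".format(ns)
    return sorted(element.text for element in root.findall(path))


def get_site_names_with_prefix(xml_text):
    """Same query, written with a prefix map passed to findall."""
    root = ET.fromstring(xml_text)
    prefixes = {"w": "http://www.cuahsi.org/waterML/1.1/"}
    found = root.findall("w:site/w:siteInfo/w:siteName", prefixes)
    return sorted(element.text for element in found)


print(get_site_names(SAMPLE_XML))
print(get_site_names_with_prefix(SAMPLE_XML))
```

The prefix map keeps the path readable but still hard-codes the URI. Deriving the namespace from the root avoids that, but it breaks if the document mixes several namespaces.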