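API response: http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051
Hello, and thank you for your help yesterday. However, when I try to get values from the elements, I always get an empty result. I was pointed to a link, but I am not sure I understand it. Where am I going wrong, and why do I get empty values?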
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import sys
import urllib
import urllib2
import re, pprint
from xml.etree.ElementTree import *
import csv
from xml.dom import minidom
import xml.etree.ElementTree as ET
import shelve
import subprocess
errorCheck = "0"
isbn = raw_input("Enter ISBN Number Please ")
isIsbn = len(isbn)
# ElementTree requires namespace definition to work with XML with namespaces correctly
# It is hardcoded at this point, but this should be constructed from response.
namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcndl': 'http://ndl.go.jp/dcndl/terms/',
}
# for prefix, uri in namespaces.iteritems():
#     ElementTree.register_namespace(prefix, uri)
if isIsbn == 10 or isIsbn == 13:
    errorCheck = 1
    url = "http://iss.ndl.go.jp/api/opensearch?isbn=%s" % isbn
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    tree = ET.parse(response)
    root = tree.getroot()
    # root = ET.fromstring(XmlData)
    print root.findall('dc:title', namespaces)
    print root.findall('dc:title')
    print root.findall('dc:identifier', namespaces)
    print root.findall('dc:identifier')
    print root.findall('identifier')
if errorCheck == "0":
    print "It is not ISBN"
# print(root.tag,root.attrib)
# for child in root.find('.//item'):
#     print child.text
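Answer 0 (score: 0)
Your code needs only a small change: add .// to the expressions in your findall calls. The root node is the rss node, and the dc:title elements are descendants of the rss node, not direct children, so you need to search the whole document: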
import xml.etree.ElementTree as ET
import requests
url = "http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051"
tree = ET.fromstring(requests.get(url).content)
namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcndl': 'http://ndl.go.jp/dcndl/terms/',
}
[t.text for t in tree.findall('.//dc:title', namespaces)]
[i.text for i in tree.findall('.//dc:identifier', namespaces)]
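To see the difference without hitting the network, here is a minimal sketch that runs both forms of the query against a small inline document. The XML below is a hypothetical, trimmed-down stand-in that only mirrors the rss/channel/item nesting; it is not the real API payload, and the title and identifier values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the API response (made-up values, same nesting).
SAMPLE = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:title>Sample Title</dc:title>
      <dc:identifier>0000000000</dc:identifier>
    </item>
  </channel>
</rss>"""

namespaces = {'dc': 'http://purl.org/dc/elements/1.1/'}
root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the root (<rss>),
# and <rss> has only <channel> as a child, so this list is empty.
direct = root.findall('dc:title', namespaces)

# The './/' prefix searches every descendant of the root instead.
nested = root.findall('.//dc:title', namespaces)

print(len(direct))               # 0
print([t.text for t in nested])  # ['Sample Title']
```

The empty list from the first query is exactly the "empty value" from the question: the namespace mapping was fine, only the search depth was wrong.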
You can also do this easily with lxml, which can map the namespaces for you and fetch the source:
In [1]: import lxml.etree as et
In [2]: url = "http://iss.ndl.go.jp/api/opensearch?isbn=9784334770051"
In [3]: tree = et.parse(url)
In [4]: nsmap = tree.getroot().nsmap
In [5]: print(tree.xpath("//dc:title/text()", namespaces=nsmap))
[u'\u9244\u8155\u30a2\u30c8\u30e0']
In [6]: print(tree.xpath("//dc:identifier/text()", namespaces=nsmap))
['4334770053', '95078560']
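If you would rather stay with the standard library, the comment in your code about building the namespace dict from the response can be handled with ElementTree's iterparse and its 'start-ns' events. This is only a sketch: collect_namespaces is a helper name I made up, and the inline XML is a hypothetical stand-in for the response, not the real payload.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the API response (made-up content).
SAMPLE = b"""<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcndl="http://ndl.go.jp/dcndl/terms/">
  <channel>
    <item>
      <dc:title>Sample Title</dc:title>
    </item>
  </channel>
</rss>"""


def collect_namespaces(source):
    """Return a {prefix: uri} dict of the namespace declarations in source.

    source is a file-like object; collect_namespaces is an illustrative
    helper, not part of ElementTree.
    """
    found = {}
    # 'start-ns' events yield (prefix, uri) pairs as declarations are read.
    for _event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        if prefix:  # skip a default (unprefixed) namespace, if any
            found[prefix] = uri
    return found


namespaces = collect_namespaces(io.BytesIO(SAMPLE))
root = ET.fromstring(SAMPLE)
titles = [t.text for t in root.findall('.//dc:title', namespaces)]

print(sorted(namespaces))  # ['dc', 'dcndl']
print(titles)              # ['Sample Title']
```

With a live response you would read the body into a bytes object once and feed it to both calls, since a response stream can only be consumed a single time.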
You can see the path to one of the dc:title elements:
In [55]: tree
Out[55]: <Element 'rss' at 0x7f996e8b66d0> # root
In [56]: tree.findall('channel') # child of root so don't need .//
Out[56]: [<Element 'channel' at 0x7f996e131990>]
In [57]: tree.findall('channel/item/dc:title', namespaces) # item is a descendant of rss, item is parent of the dc:title
Out[57]: [<Element '{http://purl.org/dc/elements/1.1/}title' at 0x7f996e131910>]
The same goes for the identifiers:
In [58]: tree.findall('channel//item//dc:identifier', namespaces)
Out[58]:
[<Element '{http://purl.org/dc/elements/1.1/}identifier' at 0x7f996e131c50>,
<Element '{http://purl.org/dc/elements/1.1/}identifier' at 0x7f996e131250>]
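Once the path is clear, you can also walk the items one at a time and query relative to each item, which keeps every title together with its own identifiers. A rough sketch, again on a hypothetical inline document with made-up values rather than the real response:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a response with two items (made-up values).
SAMPLE = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dc:title>First Title</dc:title>
      <dc:identifier>1111111111</dc:identifier>
      <dc:identifier>22222222</dc:identifier>
    </item>
    <item>
      <dc:title>Second Title</dc:title>
      <dc:identifier>3333333333</dc:identifier>
    </item>
  </channel>
</rss>"""

namespaces = {'dc': 'http://purl.org/dc/elements/1.1/'}
root = ET.fromstring(SAMPLE)

records = []
# 'channel/item' is an explicit path from the root, so no './/' is needed.
for item in root.findall('channel/item'):
    records.append({
        # dc:title is a direct child of <item>, so a bare name works here.
        'title': item.findtext('dc:title', default='', namespaces=namespaces),
        'identifiers': [i.text
                        for i in item.findall('dc:identifier', namespaces)],
    })

for record in records:
    print(record['title'], record['identifiers'])
```

The bare 'dc:title' works inside the loop for the same reason it failed on the root: the search always starts from the element you call it on.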