I'm trying to use lxml to get the text inside the URL tag in <ImageSet><LargeImage><URL>this text</URL></LargeImage></ImageSet>.
My code only returns None for the text under each tag.
Here is my code:
# I am trying to get the URL text using lxml
    for attr_list in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
        for image_list in tree.find(".//"+settings.AMAZON_NS+"LargeImage"):
            print(etree.tostring(image_list))
            print(image_list.findtext(".//"+settings.AMAZON_NS+"URL")) # This is only printing None.
Here is the output of the code:
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
None
<Width xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">349</Width>
None
<URL xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
None
<Height xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01" Units="pixels">500</Height>
The None printed right after each <URL> element should be the URL instead.
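The None comes from where findtext() is called. Iterating over the LargeImage element yields its direct children (URL, Height, Width), and findtext(".//...URL") on one of those children searches only that child's own descendants, which don't exist. The inner loop also searches tree rather than attr_list, so every ImageSet revisits the first LargeImage in the document. A minimal sketch of the difference, assuming settings.AMAZON_NS holds the namespace in Clark notation ("{...}") and using a hypothetical, trimmed-down response:

```python
from lxml import etree

# Assumption: stands in for settings.AMAZON_NS (namespace in Clark notation).
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical, trimmed-down response with the same nesting as in the question.
BODY = b"""<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
</ItemLookupResponse>"""

tree = etree.fromstring(BODY)

# What the original loop does: iterate over LargeImage's children and search
# *below* each child. URL, Height and Width have no children, so every result is None.
large_image = tree.find(".//" + AMAZON_NS + "LargeImage")
child_results = [child.findtext(".//" + AMAZON_NS + "URL") for child in large_image]
print(child_results)

# Searching relative to each ImageSet (not the whole tree) finds the URL text.
urls = []
for image_set in tree.iterfind(".//" + AMAZON_NS + "ImageSet"):
    urls.append(image_set.findtext(AMAZON_NS + "LargeImage/" + AMAZON_NS + "URL"))
print(urls)
```

The relative path LargeImage/URL expresses the nesting once, so there is no need for an inner loop over LargeImage's children at all.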
Edit 1: Let me try to clarify my question above...
This is the code I'm using:
    for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
        for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
            print(etree.tostring(image_set))
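This is the output I get: http://dpaste.com/289187/
How do I get only the content inside the URL tag?
I tried the following (none of it works, but maybe you can see the general idea of what I'm trying to do from my failed attempts):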
    for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
        for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
            for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"):
                print(etree.tostring(image_url_set))
This is the error I get:
    for image_url_set in image_set.find(".//"+settings.AMAZON_NS+"URL"):
TypeError: 'NoneType' object is not iterable
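find() returns the first matching element, or None when nothing matches. In this attempt image_set is already a child of LargeImage (for example the URL element itself), so image_set.find(".//...URL") finds nothing below it and the for loop ends up iterating over None. A sketch that looks up each level once and checks for None instead of looping over find(), assuming AMAZON_NS stands in for settings.AMAZON_NS and using a hypothetical response in which the second ImageSet is empty:

```python
from lxml import etree

# Assumption: stands in for settings.AMAZON_NS (namespace in Clark notation).
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical response: one ImageSet with a LargeImage, one without.
BODY = b"""<Items xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
    </LargeImage>
  </ImageSet>
  <ImageSet/>
</Items>"""

tree = etree.fromstring(BODY)

# find() on the URL element searches only below it, so it returns None;
# iterating over that None is what raised the TypeError.
url_element = tree.find(".//" + AMAZON_NS + "URL")
below_url = url_element.find(".//" + AMAZON_NS + "URL")
print(below_url)

large_urls = []
for item in tree.iterfind(".//" + AMAZON_NS + "ImageSet"):
    large_image = item.find(AMAZON_NS + "LargeImage")
    if large_image is None:  # this ImageSet has no LargeImage
        continue
    url = large_image.find(AMAZON_NS + "URL")
    if url is not None:
        large_urls.append(url.text)
print(large_urls)
```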
    for item in tree.iterfind(".//"+settings.AMAZON_NS+"ImageSet"):
        for image_set in item.find(".//"+settings.AMAZON_NS+"LargeImage"):
            for image_link in image_set.iter(".//"+settings.AMAZON_NS+"URL"):
                print(image_link.text)
This doesn't print anything at all.
Answer 0 (score: 1)
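iter() filters by tag name, not by path, so a string like ".//{ns}URL" is taken as a tag name that no element in the document has, and the loop body never runs. A sketch comparing the two, assuming AMAZON_NS stands in for settings.AMAZON_NS and using a hypothetical, trimmed-down response:

```python
from lxml import etree

# Assumption: stands in for settings.AMAZON_NS (namespace in Clark notation).
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical, trimmed-down response with the same nesting as in the question.
BODY = b"""<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
</ItemLookupResponse>"""

tree = etree.fromstring(BODY)
large_image = tree.find(".//" + AMAZON_NS + "LargeImage")

# A path passed to iter() is treated as a tag name, so nothing matches.
by_path = [el.text for el in large_image.iter(".//" + AMAZON_NS + "URL")]

# Just the namespaced tag: iter() walks LargeImage and all its descendants.
by_tag = [el.text for el in large_image.iter(AMAZON_NS + "URL")]

print(by_path)
print(by_tag)
```

If path syntax is what you want, iterfind() and findall() accept it; iter() only takes tag names.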
    from io import BytesIO
    from lxml import etree

    URL_TAG = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}URL"

    # body: the raw XML response as bytes (lxml rejects a str that has an XML declaration)
    tree = etree.fromstring(body)
    print(tree.findtext(".//%s" % (URL_TAG,)))  # 1st way: text of the first URL only

    for ev, el in etree.iterparse(BytesIO(body), tag=URL_TAG):  # 2nd approach: every URL
        print(el.text)
where body is your XML text. Output:
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg
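Instead of spelling out the full {namespace}tag in every path, lxml's find()/findtext()/findall() and xpath() also accept a namespaces mapping. A sketch, assuming the prefix "a" (an arbitrary choice; XPath needs some prefix for a default namespace) and a hypothetical, trimmed-down response:

```python
from lxml import etree

# Arbitrary prefix bound to the namespace shown in the question's output.
NSMAP = {"a": "http://webservices.amazon.com/AWSECommerceService/2009-10-01"}

# Hypothetical, trimmed-down response with the same nesting as in the question.
BODY = b"""<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
</ItemLookupResponse>"""

root = etree.fromstring(BODY)

# ElementPath with a prefix: text of the first URL element anywhere below root.
first_url = root.findtext(".//a:URL", namespaces=NSMAP)

# XPath: text of every URL directly under ImageSet/LargeImage.
large_urls = root.xpath("//a:ImageSet/a:LargeImage/a:URL/text()", namespaces=NSMAP)

print(first_url)
print(large_urls)
```

The XPath variant restricts the match to URLs under LargeImage, which matters if the response also contains other image sizes.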
Answer 1 (score: 0)
Try replacing
    print(image_list.findtext(".//"+settings.AMAZON_NS+"URL"))
with just
    print(image_list.text)
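Because image_list runs over every child of LargeImage, .text also prints the height and width values. A sketch that keeps only the URL child by comparing its namespaced tag, assuming AMAZON_NS stands in for settings.AMAZON_NS and using a hypothetical, trimmed-down response:

```python
from lxml import etree

# Assumption: stands in for settings.AMAZON_NS (namespace in Clark notation).
AMAZON_NS = "{http://webservices.amazon.com/AWSECommerceService/2009-10-01}"

# Hypothetical, trimmed-down response with the same nesting as in the question.
BODY = b"""<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2009-10-01">
  <ImageSet>
    <LargeImage>
      <URL>http://ecx.images-amazon.com/images/I/51dSYJcTaTL.jpg</URL>
      <Height Units="pixels">500</Height>
      <Width Units="pixels">349</Width>
    </LargeImage>
  </ImageSet>
</ItemLookupResponse>"""

tree = etree.fromstring(BODY)
large_image = tree.find(".//" + AMAZON_NS + "LargeImage")

# .text on each child yields the URL, the height and the width in turn.
all_text = [child.text for child in large_image]

# Compare the Clark-notation tag to keep only the URL child.
url_text = [child.text for child in large_image if child.tag == AMAZON_NS + "URL"]

print(all_text)
print(url_text)
```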