How do I extract reviews from the IFrameURL returned by the Amazon API in Python?

Date: 2014-01-27 19:11:31

Tags: python iframe xpath amazon-web-services

I'm trying to get the review text for a given product from the Amazon API, but I can't get it to work. Here's what I have:

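import requests
import xml.etree.ElementTree as ET

# `api` is an Amazon Product Advertising API client (its setup is not shown)
result = api.item_lookup('B00062B6QY', ResponseGroup='Reviews',
     TruncateReviewsAt=256, IncludeReviewsSummary=False)
# Grab the IFrameURL text, ignoring the response's XML namespace
iframeurl = result.xpath('//*[local-name()="IFrameURL"]/text()')[0].strip()
print iframeurl
reviews = requests.get(iframeurl)
reviews.raise_for_status()
#data = json.loads(reviews.text)
root = ET.fromstring(reviews.text)
print root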

The output is:

http://www.amazon.com/reviews/iframe?akid=helloworld&alinkCode=xm2&asin=B00062B6QY&atag=welcomehome-20&exp=2014-01-28T19%3A06%3A20Z&summary=0&truncate=256&v=2&sig=HIDDEN%3D
Traceback (most recent call last):
  File "amazon_api_new.py", line 36, in <module>
    root = ET.fromstring(reviews.text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 867, column 2
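The "mismatched tag" error is ElementTree refusing markup that is not well-formed XML. The iframe URL serves an HTML page, and HTML routinely leaves tags such as `<br>` unclosed. A minimal reproduction, using a made-up HTML fragment rather than the real Amazon page (the helper `parses_as_xml` is hypothetical, written only for this illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML fragment: <br> is legal HTML but is never closed,
# so the XML parser reaches </p> while <br> is still the open element.
HTML_FRAGMENT = "<html><body><p>Great product<br>Works well</p></body></html>"


def parses_as_xml(text):
    """Return True if ElementTree accepts the text as XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("ParseError:", err)
        return False
    return True


# The HTML fragment is rejected with a "mismatched tag" ParseError.
print(parses_as_xml(HTML_FRAGMENT))
# The same content written as well-formed XML (<br/>) parses fine.
print(parses_as_xml("<p>Great product<br/>Works well</p>"))
```

So the page itself is fine; it just needs an HTML-tolerant parser instead of an XML one.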

PS: I edited the printed iframeurl only to hide my API key details.

Edit: screenshot from Firebug (image not included).
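The `local-name()` XPath in the question is needed because the API response places every element in a default namespace, so a plain `//IFrameURL` matches nothing. A minimal stdlib-only sketch of the same lookup follows. The sample response is a hypothetical, heavily trimmed stand-in (the namespace URI and element nesting are assumptions modeled on an `ItemLookup` reply), `find_iframe_url` is a hypothetical helper, and the `{*}` namespace wildcard requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ItemLookup response; the real one has many more elements.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <Items>
    <Item>
      <ASIN>B00062B6QY</ASIN>
      <CustomerReviews>
        <IFrameURL>
          http://www.amazon.com/reviews/iframe?asin=B00062B6QY&amp;truncate=256
        </IFrameURL>
        <HasReviews>true</HasReviews>
      </CustomerReviews>
    </Item>
  </Items>
</ItemLookupResponse>"""


def find_iframe_url(xml_text):
    """Return the first IFrameURL in any namespace, or None if absent."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any namespace, playing the role of local-name() in XPath.
    node = root.find(".//{*}IFrameURL")
    if node is None or not node.text:
        return None
    return node.text.strip()


print(find_iframe_url(SAMPLE_RESPONSE))
```

Note that `&amp;` in the XML becomes a plain `&` in the returned URL, which is what `requests.get` expects.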

1 answer:

Answer 0 (score: 1)

Instead of using ElementTree, try loading reviews.text into lxml with its HTML parser, like this:

>>> from io import StringIO
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> tree   = etree.parse(StringIO(reviews.text), parser)

>>> result = etree.tostring(tree.getroot(), pretty_print=True,
...                         method="html", encoding="unicode")
>>> print(result)
...

You can then parse the tree further with lxml's XPath support.
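A minimal sketch of that XPath step. The real review iframe's markup is not shown here, so the `div` elements and the `reviewText` class name below are hypothetical placeholders to replace after inspecting the page (for example in Firebug); `extract_review_texts` is likewise a hypothetical helper. lxml's HTML parser tolerates the unclosed `<br>` that trips up ElementTree.

```python
from io import StringIO

from lxml import etree

# Hypothetical review markup; inspect the real iframe page for the actual structure.
SAMPLE_HTML = """
<html><body>
  <div class="review"><div class="reviewText">Great product<br>Works well</div></div>
  <div class="review"><div class="reviewText">Broke after a week</div></div>
</body></html>
"""


def extract_review_texts(html_text):
    """Parse HTML leniently and return the text of each (hypothetical) reviewText div."""
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(html_text), parser)
    reviews = []
    for div in tree.xpath('//div[@class="reviewText"]'):
        # Collect every non-blank text node inside the div, joined by single spaces.
        parts = [t.strip() for t in div.xpath(".//text()") if t.strip()]
        reviews.append(" ".join(parts))
    return reviews


for text in extract_review_texts(SAMPLE_HTML):
    print(text)
```

With the real page, only the XPath expression should need to change; the lenient parse via `etree.HTMLParser()` is the part that avoids the ParseError.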