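I've gone through tutorials and other questions on Stack Overflow and in the docs, but I still can't work this out. Argh!
I'm trying to make an API request and parse the response (assigning the values to variables would be a bonus beyond this question). Why can't I list the title and link of each item?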
#!/usr/bin/python
# Screen Scraper for Subs
import urllib
from xml.etree import ElementTree as ET
show = 'heroes'
season = '4'
language = 'en'
limit = '1'
requestURL = 'http://api.allsubs.org/index.php?' \
+ 'search=' + show \
+ '+season+' + season \
+ '&language=' + language \
+ '&limit=' + limit
root = ET.parse(urllib.urlopen(requestURL)).getroot()
print root
print '\n'
items = root.findall('items')
for item in items:
    item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
    item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/
XML response:
<AllSubsAPI>
<title>AllSubs API: Subtitles Search</title>
<link>http://www.allsubs.org</link>
<description><![CDATA[Subtitles Search for Heroes Season 4]]></description>
<language>en-us</language>
<results>1</results>
<found_results>24</found_results>
<items>
<item>
<title><![CDATA[Heroes Season 4 Subtitles]]></title>
<link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
<filename>heroes-season-4-english-heroes-season-4-en.zip</filename>
<files_in_archive>Heroes - 4x01-02 - Orientation.HDTV.FQM.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.720p HDTV.DIMENSION.en.srt|Heroes - 4x05 - Hysterical Blindness.720p HDTV.X264.en.srt|Heroes - 4x09 - Shadowboxing.HDTV.LOL.en.srt|Heroes - 4x16 - Pass Fail.HDTV.LOL.en.srt|Heroes - 4x04 - Acceptance.HDTV.en.srt|Heroes - 4x01-02 - Orientation.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.HDTV.NoTV.en.srt|Heroes - 4x10 - Brother's Keeper.HDTV.FQM.en.srt|Heroes - 4x04 - Acceptance.HDTV.FQM.en.srt|Heroes - 4x14 - Let It Bleed.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.720p HDTV.SiTV.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.NoTV.en.srt|Heroes - 4x12 - The Fifth Stage.HDTV.LOL.en.srt|Heroes - 4x19 - Brave New World.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.720p HDTV.DIMENSION.en.srt|Heroes - 4x03 - Ink.720p HDTV.DIMENSION.en.srt|Heroes - 4x11 - Thanksgiving.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.HDTV.LOL.en.srt|Heroes - 4x14 - Let It Bleed.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.HDTV.LOL.en.srt|Heroes - 4x12 - The Fifth Stage.720p HDTV.DIMENSION.en.srt|Heroes - 4x18 - The Wall.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.720p HDTV.CTU.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.CTU.en.srt|Heroes - 4x09 - Shadowboxing.720p HDTV.DIMENSION.en.srt|Heroes - 4x10 - Brother's Keeper.720p HDTV.DIMENSION.en.srt|Heroes - 4x04 - Acceptance.720p HDTV.CTU.en.srt|Heroes - 4x11 - Thanksgiving.HDTV.FQM.en.srt|Heroes - 4x03 - Ink.HDTV.FQM.en.srt|Heroes - 4x05 - Hysterical Blindness.HDTV.XII.en.srt|</files_in_archive>
<languages>en</languages>
<added_on>2010-02-16</added_on>
</item>
</items>
</AllSubsAPI>
Update:
This works. Thanks for the help and for pointing out my typo:
items = root.findall('items/item')
for item in items:
    print item.find('title').text
    print item.find('link').text
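For the bonus part (assigning the values to variables), here is a minimal sketch. It assumes Python 3 syntax rather than the Python 2 used above, and it parses a trimmed copy of the response from an inline string instead of calling the API. `findtext` is a standard ElementTree method; the names `SAMPLE` and `results` are made up for illustration.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the response shown above (hypothetical inline sample,
# so the sketch runs without network access).
SAMPLE = """<AllSubsAPI>
<title>AllSubs API: Subtitles Search</title>
<items>
<item>
<title><![CDATA[Heroes Season 4 Subtitles]]></title>
<link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
</item>
</items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

# Collect (title, link) pairs so they can be reused instead of only printed.
results = []
for item in root.findall('items/item'):
    title = item.findtext('title')  # text of the <title> child, CDATA unwrapped
    link = item.findtext('link')
    results.append((title, link))

for title, link in results:
    print(title, link)
```

Keeping the pairs in a list means later code can use them (for example to download each link) without parsing the response again.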
Answer 0 (score: 4)
items = root.findall('items')
should be
items = root.findall('items/item')
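To see why the path matters, the following sketch compares the two calls on a small inline document. It assumes Python 3 syntax, and the sample XML with its "First" and "Second" titles is invented for illustration; only `fromstring`, `findall` and `find` from the standard library are used.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample with the same nesting as the API response.
SAMPLE = """<AllSubsAPI>
<title>AllSubs API: Subtitles Search</title>
<items>
<item><title>First</title></item>
<item><title>Second</title></item>
</items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

containers = root.findall('items')    # direct children of root named <items>
entries = root.findall('items/item')  # <item> children of those containers

print(len(containers), len(entries))

# The <items> container has no <title> child of its own, so find() returns
# None here; calling .text on that None is what fails in the original loop.
container_title = containers[0].find('title')
print(container_title)

titles = [entry.find('title').text for entry in entries]
print(titles)
```

With `'items'` the loop runs once over the container; with `'items/item'` it runs once per result.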
Answer 1 (score: 3)
This works for me. Note that I'm using urllib2 so that the request goes through a proxy:
import urllib2
from xml.etree import ElementTree as ET
show = 'heroes'
season = '4'
language = 'en'
limit = '1'
requestURL = 'http://api.allsubs.org/index.php?' \
+ 'search=' + show \
+ '+season+' + season \
+ '&language=' + language \
+ '&limit=' + limit
root = ET.parse(urllib2.urlopen(requestURL)).getroot()
print root
print '\n'
items = root.findall('items')[0].findall('item')
for item in items:
    print item.find('title').text # prints: Heroes Season 4 Subtitles (the parser unwraps the CDATA section)
    print item.find('link').text # prints: http://www.allsubs.org/subs-download/heroes+season+4/1223435/
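Note that findall('items') finds the "items" tag. What you want to loop over (I think) is the "item" tags inside it, so we call findall('item') on that element. Also, a bare expression in a script produces no output; you need print to get anything out of Python.
Also, if I use limit = 2, I get: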
Traceback (most recent call last):
File "heros.py", line 18, in <module>
root = ET.parse(urllib2.urlopen(requestURL)).getroot()
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
tree.parse(source, parser)
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
parser.feed(data)
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 24, column 95
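I'm not sure the XML this API returns is well-formed. There is no XML declaration at the start (that is optional, so it is not an error by itself), but the ExpatError above points to an invalid token at line 24, column 95 of the response. I don't trust it...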
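If the feed cannot be trusted, one option is to catch the parse failure and report where it happened. This is only a sketch under assumptions: it uses Python 3, where ElementTree raises `ET.ParseError` (the Python 2.6 traceback above shows the older `ExpatError`), the helper name `parse_or_report` is made up, and the unescaped `&` in the second sample is just one common cause of "invalid token", not necessarily what this API returned.

```python
from xml.etree import ElementTree as ET


def parse_or_report(xml_text):
    """Return the root element, or None after printing where parsing failed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple for the failure location.
        line, column = err.position
        print('Malformed XML at line %d, column %d: %s' % (line, column, err))
        return None


# Parses fine even though there is no XML declaration.
good = parse_or_report('<AllSubsAPI><results>1</results></AllSubsAPI>')

# A bare "&" is not allowed in XML text, so this input is rejected.
bad = parse_or_report('<AllSubsAPI><title>Tom & Jerry</title></AllSubsAPI>')
```

The caller can then decide whether to skip the result, retry with a smaller limit, or log the raw response for inspection.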
Answer 2 (score: 2)
You aren't iterating over the 'item' elements; you're actually iterating over the 'items' element.
I think it should be:
# find() returns the single <items> element; findall() would return a list, which has no findall() method
items = root.find('items')
childItems = items.findall('item')
for childItem in childItems:
    print childItem.find('title').text # prints: Heroes Season 4 Subtitles
    print childItem.find('link').text # prints: http://www.allsubs.org/subs-download/heroes+season+4/1223435/