If I fetch a web page, for example this one, how can I get the part that starts with <root response="True"> and ends with </root>?
How can I do this in Python?
Answer 0 (score: 2)
import xml.etree.ElementTree as et
import requests

URL = "http://www.omdbapi.com/?t=True%20Grit&r=XML"

def main():
    # Download the raw XML bytes and parse them into an element tree
    pg = requests.get(URL).content
    root = et.fromstring(pg)
    # root[0] is the <movie> element; its data is stored as attributes
    for attr, value in root[0].items():
        print("{:>10}: {}".format(attr, value))

if __name__ == "__main__":
    main()
Result:
poster: http://ia.media-imdb.com/images/M/MV5BMjIxNjAzODQ0N15BMl5BanBnXkFtZTcwODY2MjMyNA@@._V1_SX300.jpg
metascore: 80
director: Ethan Coen, Joel Coen
released: 22 Dec 2010
awards: Nominated for 10 Oscars. Another 30 wins & 85 nominations.
year: 2010
genre: Adventure, Drama, Western
imdbVotes: 184,711
plot: A tough U.S. Marshal helps a stubborn young woman track down her father's murderer.
rated: PG-13
language: English
title: True Grit
country: USA
writer: Joel Coen (screenplay), Ethan Coen (screenplay), Charles Portis (novel)
actors: Jeff Bridges, Hailee Steinfeld, Matt Damon, Josh Brolin
imdbID: tt1403865
runtime: 110 min
type: movie
imdbRating: 7.7
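The same parse works on any XML string, which makes it easy to try without a network call. The following is a minimal sketch, not the answer's own code: SAMPLE is a shortened, made-up stand-in for the response body, and movie_attributes is a hypothetical helper name. It also assumes the response attribute on <root> is "True" for a successful lookup, as in the question.

```python
import xml.etree.ElementTree as et

# Hypothetical, shortened stand-in for the body returned by the service.
SAMPLE = (
    '<root response="True">'
    '<movie title="True Grit" year="2010" rated="PG-13"/>'
    '</root>'
)

def movie_attributes(xml_text):
    """Return the attributes of the first <movie> element as a dict."""
    root = et.fromstring(xml_text)
    # Assumption: response="True" marks a successful lookup.
    if root.get("response") != "True":
        raise ValueError("the service reported no result")
    movie = root.find("movie")
    if movie is None:
        raise ValueError("no <movie> element found")
    return dict(movie.attrib)

attrs = movie_attributes(SAMPLE)
for name, value in attrs.items():
    print("{:>10}: {}".format(name, value))
```

Looking the element up by name with find("movie") rather than by position with root[0] is a small design choice that fails loudly if the document shape changes.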
Answer 1 (score: 1)
I would use requests and BeautifulSoup:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://www.omdbapi.com/?t=True%20Grit&r=XML')
>>> soup = BeautifulSoup(r.text)
>>> list(soup('root')[0].children)
[<movie actors="Jeff Bridges, Hailee Steinfeld, Matt Damon, Josh Brolin" awards="Nominated for 10 Oscars. Another 30 wins & 85 nominations." country="USA" director="Ethan Coen, Joel Coen" genre="Adventure, Drama, Western" imdbid="tt1403865" imdbrating="7.7" imdbvotes="184,711" language="English" metascore="80" plot="A tough U.S. Marshal helps a stubborn young woman track down her father's murderer." poster="http://ia.media-imdb.com/images/M/MV5BMjIxNjAzODQ0N15BMl5BanBnXkFtZTcwODY2MjMyNA@@._V1_SX300.jpg" rated="PG-13" released="22 Dec 2010" runtime="110 min" title="True Grit" type="movie" writer="Joel Coen (screenplay), Ethan Coen (screenplay), Charles Portis (novel)" year="2010"></movie>]
Answer 2 (score: 0)
Use urllib2 to download the document: http://docs.python.org/2/howto/urllib2.html
A good parser for simple, well-formed XML is minidom. Here is how to parse it:
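A minimal sketch of the minidom approach, under a few assumptions: it targets Python 3 (where urllib2 became urllib.request), SAMPLE is a shortened, made-up stand-in for the downloaded document, and parse_root is a hypothetical helper name.

```python
from xml.dom import minidom

# Hypothetical, shortened stand-in for the downloaded document. In Python 3
# the download itself would be urllib.request.urlopen(url).read().
SAMPLE = (
    '<root response="True">'
    '<movie title="True Grit" year="2010" rated="PG-13"/>'
    '</root>'
)

def parse_root(xml_text):
    """Return (root element as text, list of attribute dicts per <movie>)."""
    dom = minidom.parseString(xml_text)
    # documentElement is <root>; toxml() serializes it from the opening
    # <root ...> tag through the closing </root> tag.
    root_xml = dom.documentElement.toxml()
    movies = []
    for node in dom.getElementsByTagName("movie"):
        # node.attributes is a NamedNodeMap; items() yields (name, value) pairs
        movies.append(dict(node.attributes.items()))
    return root_xml, movies

root_xml, movies = parse_root(SAMPLE)
print(root_xml)
print(movies[0]["title"])
```

Serializing documentElement gives the text from the opening root tag to the closing one, which is what the question asks for, while the attribute dicts give the movie data itself.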