BeautifulSoup returns the wrong response

Asked: 2014-07-15 21:54:17

Tags: python html beautifulsoup html-parsing

I'm trying to get my feet wet with BS. I'm working through the documentation, but I ran into a problem at the very first step.

Here is my code:

from bs4 import BeautifulSoup
soup = BeautifulSoup('https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description')

print(soup.prettify())

This is the response I get:

Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/bs4/__init__.py", line 189
'"%s" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an     
HTTP client to get the document behind the URL, and feed that document to Beautiful Soup.' % markup)
UserWarning: "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description" 
looks like a URL. Beautiful Soup is not an HTTP client. You should 
probably use an HTTP client to get the document behind the URL, and feed that document    
to Beautiful Soup.
https://api.flickr.com/services/rest/?method=flickr.photos.search&api;_key=5...b&per;_page=250&accuracy;=1&has;_geo=1&extras;=geo,tags,views,description

Is it because I'm trying to call an https URL, or is it a different problem? Thanks for your help!

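2 answers:

Answer 0: (score: 12)

You are passing the URL itself as a string, so Beautiful Soup parses the URL text rather than the page behind it. Instead, you need to fetch the page source first, for example with urllib2 or requests: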

from urllib2 import urlopen  # for Python 3: from urllib.request import urlopen
from bs4 import BeautifulSoup

URL = 'https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5....1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description'
soup = BeautifulSoup(urlopen(URL))

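Note that you don't need to call read() on the result of urlopen(). BeautifulSoup allows its first argument to be a file-like object, and urlopen() returns a file-like object.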
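To illustrate the file-like behaviour described above without touching the network, here is a minimal sketch. It assumes Python 3 and bs4 are installed; the io.BytesIO object and the sample markup are invented stand-ins for a real urlopen() response, and "html.parser" is just one possible parser choice.

```python
import io

from bs4 import BeautifulSoup

# Invented sample markup; a real response body would come from urlopen().
SAMPLE = b'<html><body><p class="greeting">hello</p></body></html>'

# io.BytesIO exposes read(), just like the object urlopen() returns,
# so it can be handed to BeautifulSoup directly without calling read().
fake_response = io.BytesIO(SAMPLE)
soup = BeautifulSoup(fake_response, "html.parser")

print(soup.p.string)
```

Passing the parser name explicitly is optional in the answer's code, but newer bs4 releases warn when it is left out.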

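Answer 1: (score: 2)

The error says it all: you are passing a URL to Beautiful Soup. You need to fetch the site's content first, and only then pass that content to BS.

To download the content, you can use urllib2: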

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

And then:

soup = BeautifulSoup(html)
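
Since the traceback in the question shows Python 3.3, where urllib2 no longer exists, the same two steps might look roughly like the sketch below. The helper names fetch_markup and soup_from_url, the 10-second timeout, and the choice of "html.parser" are assumptions made for illustration, not part of either answer. The demo uses a data: URL (which urllib.request supports from Python 3.4 on) so that it runs offline; a real http or https address would be passed the same way.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_markup(url, timeout=10):
    """Step 1: download the document behind `url` and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def soup_from_url(url):
    """Step 2: hand the downloaded markup (not the URL) to Beautiful Soup."""
    markup = fetch_markup(url)
    return BeautifulSoup(markup, "html.parser")


# Offline stand-in for a real address: a percent-encoded
# "<title>Example</title>" document carried inside a data: URL.
DEMO_URL = "data:text/html,%3Ctitle%3EExample%3C%2Ftitle%3E"

soup = soup_from_url(DEMO_URL)
print(soup.title.string)
```

Keeping the download and the parsing in separate functions makes it easy to swap the HTTP client (for instance for requests) without changing the parsing code.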