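I've been trying to learn BeautifulSoup by writing myself a proxy scraper, and I've run into a problem. BeautifulSoup doesn't seem to find anything, and when I print what it parsed, it shows me: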
<html>
<head>
</head>
<body>
<bound 0x7f977c9121d0="" <http.client.httpresponse="" at="" httpresponse.read="" method="" object="" of="">
>
</bound>
</body>
</html>
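I've tried changing the site I parse and the parser itself (lxml, html.parser, html5lib), but nothing changes; whatever I do, I get exactly the same result. Here is my code. Can anyone explain what's wrong?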
from bs4 import BeautifulSoup
import urllib
import html5lib

class Websites:
    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")
        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req).read
        soup = BeautifulSoup(str(content), "html5lib")
        print("Connected. Loading the page ...")
        print("Print page")
        print("")
        print(soup.prettify())
Answer 0 (score: 0)
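You are writing urllib.request.urlopen(req).read without parentheses, which refers to the method itself instead of calling it. The correct syntax is urllib.request.urlopen(req).read(). Because the method was never called, str(content) produced the text "<bound method HTTPResponse.read of ...>", and that text is what BeautifulSoup parsed.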
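To see why the parsed output contains a "bound method" tag, here is a minimal sketch. FakeResponse is a hypothetical stand-in for the HTTP response object (it is not part of urllib), used only so the example runs without a network connection.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; not part of urllib."""

    def __init__(self, body):
        self._body = body

    def read(self):
        return self._body


response = FakeResponse(b"<html><body>hello</body></html>")

# Without parentheses you get the method object itself, not the page.
as_method = str(response.read)
print(as_method)

# With parentheses the method runs and returns the body as bytes.
body = response.read()
print(body)

# str() on bytes yields the literal "b'...'" text, so decode instead.
wrapped = str(body)
decoded = body.decode("utf-8")
print(wrapped)
print(decoded)
```

The same reasoning suggests passing the bytes (or a decoded string) to BeautifulSoup rather than wrapping them in str().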
You also weren't closing the connection; the code further down takes care of that.
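A better way to open the connection is the "with urllib.request.urlopen(url) as req:" syntax, because it closes the connection for you, even if an error occurs while reading.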
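The with-statement approach could look roughly like the sketch below. The helper name fetch_html and the data: URL used for the demo are assumptions made for illustration (the data: URL lets the example run offline); in the scraper you would pass "https://free-proxy-list.net" and the User-Agent header instead.

```python
import urllib.request
from urllib.parse import quote


def fetch_html(url, headers=None):
    """Hypothetical helper: fetch a URL and return its body as text."""
    req = urllib.request.Request(url, headers=headers or {})
    # The with block closes the connection automatically on exit.
    with urllib.request.urlopen(req) as response:
        raw = response.read()  # note the parentheses: this returns bytes
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Offline demo: a data: URL stands in for a real website.
demo_url = "data:text/html;charset=utf-8," + quote(
    "<html><body><p>hi</p></body></html>"
)
page = fetch_html(demo_url)
print(page)
```

The returned string can then be handed to BeautifulSoup(page, "html5lib") in place of str(content).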
from bs4 import BeautifulSoup
import urllib.request  # "import urllib" alone does not reliably expose urllib.request
import html5lib

class Websites:
    def __init__(self):
        self.header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}

    def free_proxy_list(self):
        print("Connecting to free-proxy-list.net ...")
        url = "https://free-proxy-list.net"
        req = urllib.request.Request(url, None, self.header)
        content = urllib.request.urlopen(req)
        html = content.read()  # call read() to get the response body as bytes
        # Pass the bytes directly; str(html) would produce a "b'...'" literal.
        soup = BeautifulSoup(html, "html5lib")
        print("Connected. Loading the page ...")
        print("Print page")
        print("")
        print(soup.prettify())
        content.close()  # Important to close the connection
For more details, see: https://docs.python.org/3.0/library/urllib.request.html#examples